WIVET—Benchmarking Coverage Qualities of Web Crawlers

1. MOTIVATION

Web applications with serious vulnerabilities (e.g. SQL injection, Cross-Site Scripting, insufficient session management and Cross-Site Request Forgery) [1] have become one of the most critical cyberthreat issues on the Internet. According to the findings of several security penetration tests performed in 2014, 9 out of 10 tested websites have one or more serious security vulnerabilities [2]. These statistics show that web applications are still insecure and that web application security must be taken very seriously. Since it is essential to perform security tests and find the security vulnerabilities of web applications before they are discovered and abused by black-hat hackers, organizations and enterprises today integrate security testing into their development life cycles [3].

There are different approaches to security testing [4]. In black-box security testing, security testers are given only a web page URL or IP address and act as an external hacker. Security testers performing a white-box security test, by contrast, additionally have access to the source code of the web application, credentials, architecture details and documentation. There is also a hybrid type of security testing called gray-box testing, in which security testers can access more details than just the target URL or IP address, such as credentials and business flow documentation, but not the source code.

Security tests may be categorized as static or dynamic tests from the perspective of automatic tool usage. During static (in other words manual) black-box tests, security testers try to detect vulnerabilities manually without using an automated scanning tool. For example, user input parameters can be manually manipulated to find SQL injection or XSS vulnerabilities.

During static white-box tests, known as code review, security testers analyze critical methods and especially input processing points in the source code. During dynamic black-box tests, known as vulnerability scanning, automated web application vulnerability scanner (WAVS) tools are used to detect vulnerabilities. In dynamic white-box tests, known as source code scanning, detection tools automatically analyze the source code and find vulnerabilities directly in it.

Each type of security test makes a positive contribution to securing web applications. Automated dynamic tests are very fast and can cover more areas of an application than a person can cover manually. However, human intelligence is the key element in static tests, and a person can reveal serious security weaknesses (e.g. business logic flaws) that cannot be detected by automated scanning tools. Black-box security tests are simple to deploy and therefore appropriate for projects with tight schedules, whereas white-box tests take longer but can reveal more critical vulnerabilities. WAVS tools are used for dynamic black-box tests, since they can be deployed easily, produce valuable results very fast and are capable of testing several applications in parallel. There exist both closed-source (e.g. IBM AppScan [5], Netsparker [6] and Acunetix [7]) and open-source (e.g. BurpSuite [8], OWASP ZAP [9], w3af [10], nikto [11], Skipfish [12] and NTOSpider [13]) WAVS tools. Some commercial WAVS vendors (e.g. Netsparker and Acunetix) also provide their scanning services as SaaS (Security as a Service) over the cloud.

WAVS tools follow a similar approach to vulnerability scanning. Each one holds a predefined list of common vulnerabilities and tries to find out which vulnerabilities from that list exist in the tested web application. Before they start vulnerability scanning, however, they need to identify entry points (also known as attack points). Both static files (e.g. *.html, *.js) and dynamic script files (e.g. *.jsp, *.php and *.aspx) located in the main directory of a web application are used as attack points. WAVS tools identify attack points, generate GET and POST requests, manipulate all possible parameters, send HTTP requests to the back-end server and finally check the HTTP responses for possible weaknesses and vulnerable parameters. The more attack points a WAVS tool identifies, the more manipulated HTTP requests it can generate and therefore the more vulnerabilities it can detect.
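The request-generation step described above can be illustrated with a minimal sketch. It is not taken from any particular WAVS tool: the `AttackPoint` type, the `PAYLOADS` list, the `mutate` function and the example URL are all hypothetical names chosen for illustration, and the sketch only builds requests without sending them.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# Hypothetical probe strings; real scanners ship far larger, curated lists.
PAYLOADS = ["'", "<script>alert(1)</script>"]


@dataclass(frozen=True)
class AttackPoint:
    """One entry point found by the crawler (hypothetical structure)."""
    method: str    # "GET" or "POST"
    url: str       # resource discovered by the crawler
    params: tuple  # ((name, default_value), ...)


def mutate(point):
    """Yield (method, url, body) triples, changing one parameter at a time."""
    for i in range(len(point.params)):
        for payload in PAYLOADS:
            # Keep every default value except the parameter under test.
            fields = [(name, payload if j == i else value)
                      for j, (name, value) in enumerate(point.params)]
            encoded = urlencode(fields)
            if point.method == "GET":
                yield ("GET", f"{point.url}?{encoded}", None)
            else:
                yield ("POST", point.url, encoded)


point = AttackPoint("GET", "http://target.example/search.php",
                    (("q", "shoes"), ("page", "1")))
requests_to_send = list(mutate(point))
print(len(requests_to_send))  # 2 parameters x 2 payloads
```

Each additional attack point multiplies the number of manipulated requests in this way, which is why the number of attack points a crawler finds bounds what the scanner can test.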

Each WAVS tool has a crawler component that is used to identify as many attack points as possible. The crawler discovers attack points and then the scanning module searches for possible vulnerabilities. The coverage capability and quality of its crawler play a very critical role in the success of a WAVS tool. If a resource file cannot be discovered as an attack point by a WAVS crawler, vulnerabilities within this resource cannot be checked; existing vulnerabilities are therefore missed and false-negative results are generated.

In general, WAVS tools are compared based on how many vulnerabilities they detect and how many false positives they generate. However, the number of vulnerabilities they detect depends strongly on the coverage quality of their crawling components. Therefore, it is important to compare the crawling quality of WAVS tools. To the best of our knowledge, there has so far been no tool that can be used for benchmarking WAVS crawlers. To enable crawler benchmarking, we developed a novel method and implemented WIVET (Web Input Vector Extractor Teaser).

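In this paper, we propose a novel method for benchmarking WAVS crawlers from the perspective of coverage quality. The rest of the paper is organized as follows: Section 2 explains the evolution of crawling mechanisms based on Web 1.0 and 2.0 technologies. The architecture of WIVET is explained in Section 3. Several WIVET test cases and target links are explained in Section 4. Benchmarking results obtained with WIVET for several commercial and open-source WAVS crawlers are given in Section 5. Related work is discussed in Section 6, and Section 7 concludes the paper. In addition, Appendix A explains an example of integrating WIVET into a WAVS development environment, and Appendix B lists all implemented WIVET test cases.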



2. CRAWLING THROUGH WEB 1.0 AND WEB 2.0

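Techniques for extracting links from web applications with low JavaScript and DOM (Document Object Model) interactivity, which represent Web 1.0 applications, fall into two main groups. In the first approach, links are extracted from the HTML source code using regular expressions. This technique has limited capability, because a crawler using it only recognizes links that match the regular expressions it applies. For example, links stored in a web page in non-standard ways (e.g. links inside HTML comments) can be extracted with specially crafted regular expressions. In the second approach, the HTML document is first converted into an XML-based DOM tree, which is then queried with XPath to search for nodes that contain possible links. This is faster than the first approach, and it is the more suitable one when DOM conversion of non-standard HTML pages is supported. In addition, all nodes of the page, including JavaScript event handlers, should be traversed and executed in order to extract any content.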

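Three example links serving as basic input vectors for Web 1.0 applications are shown in Table 1. The first input vector is a basic link that uses the "a href" HTML element attribute and can be clicked by the user. The second input vector uses the "iframe src" HTML element attribute, which loads content from a domain without any user interaction. The third is a plain form element whose action contains the target link that is followed when the form is submitted. Extracting the links in the examples of Table 1 is a straightforward task for WAVS crawlers.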
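The two extraction approaches can be contrasted with a minimal sketch. The `PAGE` markup below is a hypothetical example built around the three input vectors named above and does not reproduce the actual entries of Table 1. The second function uses the standard-library `html.parser` module as a simple stand-in for a full DOM tree queried with XPath, which is an assumption made to keep the example dependency-free.

```python
import re
from html.parser import HTMLParser

# Hypothetical page; these are not the actual entries of Table 1.
PAGE = """
<html><body>
  <a href="1_target1.php">click me</a>
  <iframe src="1_target2.php"></iframe>
  <form action="1_target3.php" method="post"><input type="submit"></form>
  <!-- <a href="1_hidden.php">old link</a> -->
</body></html>
"""

# Approach 1: a regular expression over the raw source. It also matches
# text inside HTML comments, because it never interprets the markup.
LINK_RE = re.compile(r'(?:href|src|action)\s*=\s*["\']([^"\']+)["\']', re.I)


def links_by_regex(html):
    return LINK_RE.findall(html)


# Approach 2: parse the markup and read attributes from element nodes.
LINK_ATTRS = {"a": "href", "iframe": "src", "form": "action"}


class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def links_by_parser(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


print(links_by_regex(PAGE))
print(links_by_parser(PAGE))
```

In this sketch the regular expression also reports the link hidden in the comment, while the parser-based version reports only the three element attributes, since comments are not element nodes.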

3. WIVET ARCHITECTURE

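The overall architecture of WIVET, together with the WAVS crawler and the WIVET user, is shown in Figure 1. The WIVET user configures the WAVS crawler to visit the web-based WIVET URL and starts the crawling process. The crawler sends HTTP requests, analyzes the HTTP responses and tries to discover every accessible link in the WIVET web application. As a standalone web-based application, WIVET provides links in several different HTML elements, including form action URLs typical of Web 1.0 applications and links generated by JavaScript. Relatively complex JavaScript frameworks and HTML forms with logical and time-based structures are used to create the more complex links that represent Web 2.0 applications. Once it has been installed under a given URL, WIVET can be scanned directly by a WAVS crawler. The WAVS crawler is expected to find and visit all static and dynamic target links embedded in the WIVET test cases. At the end of the crawling process, WIVET displays all links that were successfully visited by the particular WAVS crawler, together with a score for its coverage quality. Looking at the internal structure in more detail, the WIVET architecture consists of three main components: the test case generator, the target link visit monitor and the historical visit analyzer. Figure 2 shows these internal components and the interactions between them.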

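The test case generator component is responsible for providing all test cases and target links. Each test case contains one or more target links that are created either statically or dynamically using JavaScript. Each test case is represented as a separate script file. The WAVS crawler is expected to visit each test case file and then extract the related target links. Target link names have the form 'TestCaseNumber_RandomValue.php' (e.g. 1_25e2a.php). Target links are grouped into different test cases, and TestCaseNumber indicates the specific test case to which a target link belongs. RandomValue is used to prevent crawlers from predicting target links. For simplicity, target links are written with static values (e.g. 2_target1.php) throughout this paper.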

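If a WAVS crawler extracts a target link from a test case, it sends an HTTP request to that specific link. The target link visit monitor component is embedded in every target link script file and is triggered by each HTTP request made to a target link. With this functionality, the component can track all links visited by a WAVS crawler as it processes the different WIVET test cases. The component also uses persistent storage, keeping track of the visited links in XML files or in a MySQL database. Details such as the visited link, the visiting user agent, the visit time, the crawler's session ID, the hit count, the visit description and the source IP address are stored for further analysis.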
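The bookkeeping performed by the visit monitor can be sketched as follows. This is an illustration only: the `VisitRecord` and `VisitMonitor` names are hypothetical, the field set follows the details listed above, and an in-memory dictionary stands in for the XML or MySQL storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class VisitRecord:
    link: str
    user_agent: str
    session_id: str
    source_ip: str
    description: str
    first_seen: datetime
    hits: int = 1


class VisitMonitor:
    """Tracks target link visits per crawler session (hypothetical sketch)."""

    def __init__(self):
        # In-memory stand-in for the persistent XML/MySQL storage.
        self._records = {}

    def record(self, link, user_agent, session_id, source_ip, description=""):
        """Called once for every HTTP request that reaches a target link."""
        key = (session_id, link)
        rec = self._records.get(key)
        if rec is None:
            rec = VisitRecord(link, user_agent, session_id, source_ip,
                              description, datetime.now(timezone.utc))
            self._records[key] = rec
        else:
            rec.hits += 1  # repeated request to the same target
        return rec

    def visited_links(self, session_id):
        return {link for (sid, link) in self._records if sid == session_id}


monitor = VisitMonitor()
monitor.record("2_target1.php", "crawler/1.0", "s1", "10.0.0.5")
repeat = monitor.record("2_target1.php", "crawler/1.0", "s1", "10.0.0.5")
print(repeat.hits, monitor.visited_links("s1"))
```

Keying records by session ID and link keeps the results of different crawler runs separate, so the same installation can be used to compare several crawlers.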

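The historical visit analyzer component provides access to the historical data of all recorded crawl analyses. As shown in Figure 1, WIVET users can open the statistics section and review the historical coverage results of their WAVS crawlers.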

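WIVET provides the user interface shown in Figure 1 so that WIVET users can analyze the coverage quality results of their crawlers. WIVET generates a statistical report with detailed test results, showing which links the crawler visited successfully and which links it failed to discover. With this information, WAVS developers can analyze the results and find the root causes of these failures, which helps them improve the effectiveness of their crawlers.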
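A report of this kind can be sketched in a few lines. The scoring formula is not given in this part of the text, so the sketch assumes the simplest reading, namely the percentage of target links that were visited; the function name `coverage_report` and the sample link names are hypothetical.

```python
def coverage_report(all_targets, visited):
    """Split targets into found and missed, and compute a percentage score.

    Assumes the score is the share of target links visited; the actual
    WIVET scoring may differ.
    """
    targets = set(all_targets)
    seen = set(visited)
    found = sorted(targets & seen)
    missed = sorted(targets - seen)
    score = 100.0 * len(found) / len(targets) if targets else 0.0
    return {"found": found, "missed": missed, "score": score}


report = coverage_report(
    ["1_target1.php", "2_target1.php", "2_target2.php", "3_target1.php"],
    ["1_target1.php", "2_target2.php", "unrelated.php"],
)
print(report["score"], report["missed"])
```

Requests to pages that are not target links, such as `unrelated.php` here, do not affect the score; the list of missed links is what a developer would inspect to find the root cause of a failure.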

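WAVS tool developers are expected to integrate WIVET into their development environments and analyze the quality of their crawlers. To the best of our knowledge, the developers of several WAVS tools (such as Acunetix, w3af, Netsparker, HP WebInspect and NTOSpider) have so far integrated WIVET into their development life cycles and use it regularly to check and improve their crawling quality. For details, see Appendix A, which explains all the steps needed to integrate WIVET into a WAVS development environment.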
