Web agents are software robots that locate and extract targeted data buried deep within a Web site (e.g., behind form-based interfaces), where it is not accessible via traditional keyword-based search engines. They navigate to relevant sites, locate the correct Web pages (by traversing appropriate links or filling out HTML forms), and extract and organize the data of interest from these pages into a database or XML files.
XRover® Web agents are XSB's primary technology for data acquisition. XRover® technology can be used to deliver precise and timely information, such as price updates on a regular basis, or to enrich poor legacy data with content from vendor or manufacturer Web sites.
There are two types of agents: harvest agents and query agents. Harvest agents are used to collect information about all products from a Web site or a section of a Web site, whereas query agents use a user-provided list of search criteria (such as product IDs) to harvest desired information specifically for those items.
The XRover® Web Agent Technology:
The XRover® Web Agent Technology simplifies the creation of automated agent maps by providing the user with an interactive interface.
XRover® Key Features:
- Example driven pattern discovery
- Data extractions are performed using regular expressions
- Editors allow for the modification of regular expressions
- Contains an internal HTML Web browser:
  - Supports both HTTP and HTTPS protocols
  - Supports use of proxy servers
  - Syntax highlighting for the HTML source
- Agents are capable of working with Realm Authentication Web sites
- Builds deep web agents that are capable of filling out forms with different supplied values
- Form Capturing Wizard allows for the integration of form navigation into the agent
- Table extraction enables capturing multiple rows of information without requiring advance knowledge of the total number of rows
- Link extraction enables the user to extract and add navigation information about a link to the agent
- Ability to structure extracted results
- Ability to change the look and feel
- Ability to supply form arguments, such as a username and password at agent run-time
- Online help feature for building regular expressions
- Agent Validation Tool allows user to validate accuracy of the agent map
- Java Web Start enabled execution
- Approximate agent build time: ~1 hour, depending on complexity
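As a rough illustration of the regular-expression and table-extraction features listed above, the following Python sketch applies one row pattern repeatedly, so the number of rows does not need to be known in advance. It is not XRover® code: the sample HTML, the pattern, and the names `SAMPLE_HTML`, `ROW_PATTERN`, and `extract_rows` are assumptions made for this example.

```python
import re

# Hypothetical HTML fragment standing in for a fetched product page.
SAMPLE_HTML = """
<table>
  <tr><td>A-100</td><td>Hex bolt</td><td>$0.25</td></tr>
  <tr><td>B-200</td><td>Lock washer</td><td>$0.10</td></tr>
  <tr><td>C-300</td><td>Wing nut</td><td>$0.40</td></tr>
</table>
"""

# The pattern describes a single row; finditer() repeats it for
# however many rows the page happens to contain.
ROW_PATTERN = re.compile(
    r"<tr>\s*<td>(?P<product_id>[^<]+)</td>\s*"
    r"<td>(?P<name>[^<]+)</td>\s*"
    r"<td>\$(?P<price>[\d.]+)</td>\s*</tr>"
)


def extract_rows(html_text):
    """Return one dict of named fields per table row matched by ROW_PATTERN."""
    return [match.groupdict() for match in ROW_PATTERN.finditer(html_text)]


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row["product_id"], row["name"], row["price"])
```

Named groups give each extracted value a field name, which is one simple way to structure extracted results before writing them to a database or XML file.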
The XRover® Agent Manager:
The Agent Manager is a powerful desktop application that allows users to manage and execute Web agent tasks. Users have the ability to add, delete, and schedule agents to run at specified times with a given regularity. This tool also enables users to specify the input and output of the agent tasks and have them presented and stored in a structured and coherent fashion on the user’s desktop for future use.
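To make the idea of running an agent "at specified times with a given regularity" concrete, here is a minimal Python sketch of how recurring run times can be derived from a start time and an interval. The function `next_runs` and the example values are hypothetical and do not reflect how the Agent Manager is implemented.

```python
from datetime import datetime, timedelta


def next_runs(first_run, interval, count):
    """Return `count` run times starting at `first_run`, spaced by `interval`."""
    return [first_run + i * interval for i in range(count)]


# Example: an agent first run at 09:00 on 1 January 2024, repeated every 6 hours.
start = datetime(2024, 1, 1, 9, 0)
runs = next_runs(start, timedelta(hours=6), 4)
for run_time in runs:
    print(run_time.isoformat())
```

The same function covers intervals of weeks, days, hours, or minutes, since `timedelta` accepts each of those units.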
Agent Manager Key Features:
- Executes agents built using the XRover® technology
- Capable of executing agents that use XPath expressions and/or regular expressions
- Reads input data from an ODBC database or an Excel file
- Supports masking of input that could contain sensitive information such as passwords
- Provides support for scheduling agent tasks:
  - Execution of an agent can be specified to run immediately or at a later time
  - Recurring execution can be scheduled at intervals of weeks, days, hours, or minutes
  - Scheduled agent information is stored and retained, even if the application is closed and restarted
- Performance:
  - Multithreaded
  - Ability to follow multiple URLs concurrently for a single agent execution
- Supports:
  - Use of proxy servers
  - HTTP and HTTPS protocols
  - Realm Authentication
  - Cookies
  - Redirection handling
- Output formatting of the results:
  - Stripping out HTML tags
  - Removal of white space
  - Translation of character entities (e.g. &quot; -> ")
- Online help feature
- Writes results to ODBC Database or an XML file
- Java Web Start enabled execution
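The output-formatting steps listed above (stripping HTML tags, removing white space, translating character entities) can be sketched in a few lines of Python using only the standard library. This is an illustrative assumption about what such cleanup looks like, not the Agent Manager's own code; the name `clean_field` is made up for the example.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")      # any HTML tag
WHITESPACE_RE = re.compile(r"\s+")   # runs of spaces, tabs, newlines


def clean_field(raw):
    """Strip tags, translate character entities, and collapse white space."""
    no_tags = TAG_RE.sub(" ", raw)
    # Decode entities only after the tags are gone, so that text such as
    # "&lt;b&gt;" is not mistaken for a real tag and removed.
    decoded = html.unescape(no_tags)
    return WHITESPACE_RE.sub(" ", decoded).strip()


print(clean_field('<td> 3/4&quot;  <b>Hex</b>\n bolt &amp; nut</td>'))
```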