Create a search appliance crawler


Create a web crawler


To create a new web crawler, please follow the steps below:

  1. Log in to the WordFrame Integra Core Administration
  2. Click on the “Builder” tab in the upper left corner
  3. Click on the “Content components” menu in the main navigation bar
  4. Click on the “Crawler” link in the “Search appliance” section on the left of the screen
  5. Click on the “create new web crawler” link in the upper right corner of the page
Step 1
  1. Enter the crawler name
  2. Fill in the “Start crawl URLs” field, for example with the URL of your site’s home page. Each URL should start with http://
  3. Enter the crawler’s general settings
  4. Choose an authentication type if you want the crawler to access secured content. The following types are available:
    • forms – the most common type, usually implemented as a login page with username (or email) and password fields.
    • credentials – supplies credentials for password-based authentication schemes such as basic, digest, NTLM and Kerberos.
    • mixed – a combination of forms-based and credentials-based authentication.
    • cookie – mostly used on sites that ask you to confirm that you are above a certain age before you can view their content. When the correct answer is clicked, the site redirects the user to a certain URL.
  5. Click on the “Next” button
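
To illustrate the difference between the forms and credentials types from item 4, here is a minimal Python sketch of the two kinds of request a crawler might build. It is not WordFrame code. The function names, the form field names `username` and `password`, and the example.com URLs are assumptions made for illustration only. The requests are constructed but never sent.

```python
import base64
import urllib.parse
import urllib.request


def forms_login_request(login_url, username, password):
    """Forms authentication: credentials travel in the body of a POST to the login page."""
    # The field names below are hypothetical; a real login form defines its own.
    body = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("utf-8")
    return urllib.request.Request(login_url, data=body, method="POST")


def basic_credentials_request(url, username, password):
    """Credentials authentication (basic scheme): credentials travel in a request header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    form_req = forms_login_request("http://example.com/login", "alice", "secret")
    print(form_req.get_method(), form_req.data)

    basic_req = basic_credentials_request("http://example.com/private", "alice", "secret")
    print(basic_req.get_header("Authorization"))
```

Digest, NTLM and Kerberos use the same kind of header but negotiate its value with the server, so they are not shown here. The mixed type would combine both request styles.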
Step 2
  1. Depending on the chosen authentication type, enter the required authentication details
  2. Click on the “Next” button
Step 3
  1. Set the details of the crawler’s execution schedule and press the “Next” button.
Step 4
  1. Subscribe one or more search indexes by using the “subscribe another index” link in the upper right corner of the grid
  2. Click on the “Create Crawler” button

Create a file system crawler

Creating a file system crawler is very similar to creating a web crawler. The difference is that you provide a file system path, not a URL, as the crawler’s starting point.
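
As a rough illustration of what starting from a file system path means, the following Python sketch walks a directory tree and collects every file under it. It is a generic example, not the product's implementation. The function name `discover_files` is hypothetical.

```python
import os


def discover_files(start_path):
    """Return the sorted paths of all files found under start_path."""
    found = []
    # os.walk visits the starting directory and every subdirectory below it.
    for directory, _subdirs, filenames in os.walk(start_path):
        for name in filenames:
            found.append(os.path.join(directory, name))
    return sorted(found)


if __name__ == "__main__":
    # "." stands in for the crawler's starting path.
    for path in discover_files("."):
        print(path)
```

Here the starting directory plays the role that the start crawl URL plays for a web crawler: everything reachable beneath it is a candidate for indexing.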

Create an XML crawler


The XML sitemap crawler discovers only the documents listed in the sitemap files submitted as start locations. This is why, in the initial creation step, you specify the sitemap location URLs and there are no link-following rules. Only files listed in the sitemap are crawled, and links inside them are not followed.
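
The behaviour described above can be sketched in Python as follows, assuming the sitemap uses the standard sitemaps.org 0.9 format. This is only an illustration of the idea, not the crawler's actual code. The function name `urls_from_sitemap` and the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(sitemap_xml):
    """Return the URLs listed in the <loc> elements of a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    # Only the listed locations are returned; nothing is fetched and no links are followed.
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://example.com/</loc></url>"
        "<url><loc>http://example.com/about</loc></url>"
        "</urlset>"
    )
    for url in urls_from_sitemap(sample):
        print(url)
```

The list returned by the function is the complete crawl set, which is why this crawler type needs no link-following rules.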

Last edited by Boz Zashev on 26 Oct 2010 | Rev. 2 | Filed under: Content components