Browse the existing search appliance crawlers

The storage and format of the content that needs to be crawled and indexed can vary. For this reason, Integra provides three different types of crawlers, described in detail below.

Web crawlers

Web crawlers discover web content and feed it into the search indexes so that it becomes searchable. One crawler can feed content into multiple search indexes. Crawlers do not crawl web content outside the domains of their start URLs. To review the existing crawlers, follow the steps below:

  1. Log in to the WordFrame Integra Core Administration
  2. Click on the "Builder" tab in the upper left corner
  3. Click on the "Content components" menu in the main navigation bar
  4. Click on the "Crawler" link in the "Search appliance" section on the left of the screen
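
The domain restriction described for Web crawlers can be illustrated with a minimal Python sketch. This is not Integra's implementation: the function names, the example URLs, and the choice to compare exact host names (rather than treating subdomains as part of the same domain) are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def allowed_domains(start_urls):
    """Collect the host names of the crawler's start URLs (hypothetical helper)."""
    hosts = set()
    for url in start_urls:
        host = urlparse(url).hostname  # lowercased by urlparse, None if absent
        if host:
            hosts.add(host)
    return hosts


def in_scope(url, domains):
    """Return True when the URL's host matches one of the start URL hosts."""
    host = urlparse(url).hostname
    return host is not None and host in domains


# Example start URLs (assumed values, not taken from a real crawler).
start_urls = ["http://www.example.com/", "http://docs.example.com/start"]
domains = allowed_domains(start_urls)

for candidate in ["http://www.example.com/page", "http://other.example.org/"]:
    print(candidate, "crawl" if in_scope(candidate, domains) else "skip")
```

Under these assumptions, a link is followed only when its host is one of the start URL hosts; everything else is skipped.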

File system crawlers

File system crawlers discover documents on the file system and feed their content into the search indexes. To index the files' content, dedicated iFilters must be installed on the web server. To view the File system crawlers:

  1. Log in to the WordFrame Integra Core Administration
  2. Click on the "Builder" tab in the upper left corner
  3. Click on the "Content components" menu in the main navigation bar
  4. Click on the "Crawler" link in the "Search appliance" section on the left of the screen
  5. Click on the "File system crawlers" tab
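
As a rough illustration of how a File system crawler might discover documents, the Python sketch below walks a folder tree and keeps only files whose type can be indexed. The `INDEXABLE_EXTENSIONS` set is a hypothetical stand-in for "file types that have an iFilter installed"; the real crawler relies on the iFilters registered on the web server, which this sketch does not query.

```python
import os

# Hypothetical stand-in for the file types that have an iFilter installed.
INDEXABLE_EXTENSIONS = {".pdf", ".doc", ".txt"}


def discover_files(root):
    """Yield paths under root whose extension is in INDEXABLE_EXTENSIONS."""
    for folder, _subfolders, names in os.walk(root):
        for name in sorted(names):
            extension = os.path.splitext(name)[1].lower()
            if extension in INDEXABLE_EXTENSIONS:
                yield os.path.join(folder, name)
```

Calling `list(discover_files(some_folder))` would return the documents whose content could then be extracted and fed into a search index.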

Sitemap crawlers

The sitemap crawler indexes content from formatted XML files. The files must be in the format specified by sitemaps.org. To view the sitemap crawlers:

  1. Log in to the WordFrame Integra Core Administration
  2. Click on the "Builder" tab in the upper left corner
  3. Click on the "Content components" menu in the main navigation bar
  4. Click on the "Crawler" link in the "Search appliance" section on the left of the screen
  5. Click on the "Sitemap crawlers" tab
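
To make the expected file format concrete, the Python sketch below parses a small sitemap in the sitemaps.org format and lists the page URLs it contains. The sample URLs and the `sitemap_urls` helper are assumptions for illustration; how Integra itself reads the file is not described here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Minimal sample sitemap with assumed example URLs.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2010-10-26</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/about</loc>
  </url>
</urlset>"""


def sitemap_urls(xml_text):
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


print(sitemap_urls(SAMPLE))
```

Each `<url>` entry needs a `<loc>` element; `<lastmod>` and the other child elements are optional in the sitemaps.org format.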

Last edited by Boz Zashev on 26 Oct 2010 | Rev. 2 | This page is public | Filed under: Content components