Monday, May 4, 2009

FAST search engine for SharePoint PART 3 (Content Sources and Connectors)

Each “content source” is represented as a “Collection” within FAST ESP. Data is fed into document processing pipelines for refinement by “Connectors” that are defined for a specific collection.

There are three types of connectors for the FAST search engine:

  1. FAST OOTB (out-of-the-box) connectors
  2. Third-party (proprietary) connectors
  3. Custom connectors built with the FAST API

YES! FAST lets you program against its APIs.

FAST OOTB connectors

Enterprise Crawler: feeds content from web pages. Content sources and many other settings for this connector are easily configurable through the Admin UI, including the content request rate, start URIs, include and exclude host name filters, and the content crawl interval. The Enterprise Crawler lets you crawl an unlimited number of start URIs, detects deleted content and removes it from the index, and retrieves both static and dynamic content.
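To make the include and exclude host name filters a bit more concrete, here is a minimal Python sketch of how such a filter could decide whether a URI gets crawled. This is not FAST's configuration format or code. The names `is_allowed` and `host_matches`, the suffix-matching rule, and the "exclude wins over include" ordering are all my assumptions for illustration.

```python
from urllib.parse import urlparse


def host_matches(host, patterns):
    """True if host equals a pattern or is a subdomain of it (assumed rule)."""
    return any(host == p or host.endswith("." + p) for p in patterns)


def is_allowed(url, include_hosts, exclude_hosts):
    """Decide whether a URL passes hypothetical include/exclude host filters."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False
    # Assumption: an exclude match always wins over an include match.
    if host_matches(host, exclude_hosts):
        return False
    # Assumption: an empty include list means "allow every host".
    return not include_hosts or host_matches(host, include_hosts)


include = ["example.com"]
exclude = ["archive.example.com"]
for url in ("http://intranet.example.com/page",
            "http://archive.example.com/old",
            "http://other.org/"):
    print(url, is_allowed(url, include, exclude))
```

The real crawler may well resolve conflicts between the two lists differently, so check the product documentation before relying on this ordering.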

ESP File Traverser: traverses the file system and submits files to content pipelines in batches via the Content API. The files this connector serves can be in any binary or text format, as long as the processing pipeline can handle that format (PDF, TXT, XML, DOC, and many more).
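The "traverse, then submit in batches" idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not the File Traverser itself and not the FAST Content API. `iter_files`, `batches`, and the extension set are hypothetical names I made up, and the actual submit call is deliberately left out.

```python
import os
from itertools import islice


def iter_files(root, extensions):
    """Walk the tree under root and yield paths with an accepted extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in extensions:
                yield os.path.join(dirpath, name)


def batches(items, size):
    """Group any iterable into lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Assumption: the pipeline can handle these formats.
ACCEPTED = {".pdf", ".txt", ".xml", ".doc"}

for batch in batches(iter_files(".", ACCEPTED), 100):
    # A real connector would hand the batch to the content API here.
    print("would submit", len(batch), "files")
```

Batching keeps the number of round trips down, and the generator means the whole file list never has to sit in memory at once.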

JDBC Connector (also called FAST Smart Connector for JDBC): feeds database or other structured data into pipelines (Oracle, SQL, MySQL, DB2, etc.). It uses a JDBC driver that must be registered on the server before a connection to the database can be established. It extracts the data to be indexed at the column level through a SQL query that you supply. The connector is managed through the command line as well as through the web interface.
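Here is a rough Python sketch of what "column-level extraction through a SQL query" amounts to: each row of the query result becomes one document, and each selected column becomes a field. I am using Python's built-in `sqlite3` as a stand-in for a JDBC connection. The `articles` table, the `rows_as_documents` helper, and the document shape are all invented for the example and are not how the connector represents documents internally.

```python
import sqlite3


def rows_as_documents(conn, query, id_column):
    """Run the query and yield one document (dict) per row."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]  # column names from the query
    for row in cur:
        fields = dict(zip(columns, row))
        yield {"id": str(fields[id_column]), "fields": fields}


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, "Intro", "Hello"), (2, "Part 2", "World")],
)

query = "SELECT article_id, title, body FROM articles ORDER BY article_id"
docs = list(rows_as_documents(conn, query, "article_id"))
for doc in docs:
    print(doc)
```

Only the columns named in the SELECT end up in the document, which is why the query you supply effectively defines what gets indexed.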

The connectors mentioned above support content modification detection through checksums as well as timestamps, which are kept either in FAST's built-in database or in some other MySQL database.
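Checksum-based change detection is easy to picture with a small Python sketch: keep a checksum per document from the last run, compare it against the current content, and sort documents into added, changed, and deleted. This is my own illustration, not FAST's implementation. The choice of SHA-256, the plain dict used as the state store, and the function names are all assumptions.

```python
import hashlib


def checksum(content):
    """Hex digest of the document bytes (SHA-256 chosen for illustration)."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous, current):
    """Compare stored checksums with current content.

    previous: dict of doc id -> checksum from the last run
    current:  dict of doc id -> raw bytes seen in this run
    Returns (added, changed, deleted, new_state).
    """
    new_state = {doc_id: checksum(data) for doc_id, data in current.items()}
    added = sorted(k for k in new_state if k not in previous)
    changed = sorted(
        k for k in new_state if k in previous and previous[k] != new_state[k]
    )
    deleted = sorted(k for k in previous if k not in new_state)
    return added, changed, deleted, new_state


last_run = {"a": checksum(b"one"), "b": checksum(b"two"), "d": checksum(b"four")}
this_run = {"a": b"one", "b": b"two, edited", "c": b"three"}

added, changed, deleted, state = detect_changes(last_run, this_run)
print("added:", added, "changed:", changed, "deleted:", deleted)
```

Only the added and changed documents need to go back through the pipeline, and the deleted ones can be removed from the index, which is what keeps incremental runs cheap.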

Third Party Connectors 

There is not much I can say about these connectors except to list some of them: Lotus Notes, WebSphere, Exchange, Documentum, Hummingbird, and of course SHAREPOINT. So even if you are not looking to upgrade to SharePoint 2010 when it becomes available with FAST, you can still integrate with SharePoint without reinventing the wheel :-)

Enjoy :-)
