Monday, July 27, 2009

Search Connectors

Microsoft Enterprise Search Products / Solutions

  • Search Server 2008 Express
  • Search Server 2008
  • Office SharePoint Server 2007 (MOSS)
  • FAST ESP

Search Connectors (more info)

  • Indexed Search Connectors – allow the search server to index the contents of a target system. Indexing builds an internal representation of the content that can be used to return search results quickly. This is the most powerful way to connect to a target system. There are two main categories of Indexed Search Connectors.
    • Microsoft Indexing Connectors - many are available as free downloads from Microsoft, and more are available from Microsoft partners.
      • Protocol Handlers (more info) - for unstructured data sources. They open content sources in their native protocols and expose documents and other items to be filtered. They are implemented as COM objects that expose the ISearchProtocol interface. The crawl process works much like the Google Mini's: it requires a starting URL and crawls pages by following the links on each page. The protocol of each URL determines which protocol handler is used to crawl it.
      • Business Data Catalog Application Definition Files – for structured data sources
    • FAST Indexing Connectors – for both structured and unstructured data sources. They are available for purchase for any FAST ESP system.
  • Federated Search Connectors (more info) – The actual search is “out-sourced” to another search engine. Content is not crawled in this approach; instead, the search server can display results for content that it has not crawled itself. With federation, a query can be run against the local content index, or it can be forwarded to an external content repository and processed by that repository's search engine, which returns its results to the search server. The search server then formats and renders those results on the same search results page as the results from its own content index. If the external search engine does not support a standard interface, a custom interface that does may need to be developed. Triggers determine when a search returns federated results; trigger types include keyword, always, and pattern matching. Anonymous, Common, and Per-User credentials are supported, and credentials can be passed using almost any common authentication method, including Basic, Digest, NTLM, Forms, Cookies, and Kerberos. There is also an API for creating custom federated search Web Parts. All of this applies to MOSS as well, and the SharePoint documentation covers it in detail.
  • iFilters – allow indexing of a wide variety of documents and file types through a common interface across Search Server 2008, Office SharePoint Server 2007, Windows SharePoint Services 3.0, Windows Desktop Search, Windows Vista, and SQL Server. An iFilter opens documents and other content source items in their native formats and filters them into chunks of text and properties. It can be part of the protocol handler component, or it can be a separate component.
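
The protocol handler crawl described above (start from a URL, follow links, pick a handler by the URL's protocol) can be modeled in a few lines of Python. This is a toy sketch, not the ISearchProtocol COM interface: `ProtocolHandler`, `InMemoryHandler`, and `crawl` are hypothetical names, and pages come from a dictionary instead of real HTTP or file access.

```python
from collections import deque
from typing import Dict, List, Protocol, Tuple
from urllib.parse import urlparse


class ProtocolHandler(Protocol):
    """Hypothetical stand-in for a protocol handler: fetches one item
    and returns its text plus the links it contains."""

    def fetch(self, url: str) -> Tuple[str, List[str]]:
        ...


class InMemoryHandler:
    """Hypothetical handler backed by a dict instead of a real content source."""

    def __init__(self, pages: Dict[str, Tuple[str, List[str]]]):
        self.pages = pages

    def fetch(self, url: str) -> Tuple[str, List[str]]:
        return self.pages.get(url, ("", []))


def crawl(start_url: str, handlers: Dict[str, ProtocolHandler],
          max_items: int = 100) -> Dict[str, str]:
    """Breadth-first crawl from start_url; each URL's scheme selects its handler."""
    index: Dict[str, str] = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(index) < max_items:
        url = queue.popleft()
        handler = handlers.get(urlparse(url).scheme)
        if handler is None:
            continue  # no handler registered for this protocol, so the item is skipped
        text, links = handler.fetch(url)
        index[url] = text
        for link in links:  # follow links found on the page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index
```

For example, with only an `http` handler registered, an `ftp://` link found on a page is simply skipped:

```python
pages = {
    "http://intranet/a": ("Page A", ["http://intranet/b", "ftp://files/x"]),
    "http://intranet/b": ("Page B", ["http://intranet/a"]),
}
crawl("http://intranet/a", {"http": InMemoryHandler(pages)})
```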
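
Federated search triggers decide, per query, whether a federated location is queried at all. The sketch below models the three trigger kinds mentioned above (keyword, always, pattern matching). The `Trigger` class, the kind names, and the matching rules (a keyword trigger fires when the query's first word equals the keyword and forwards the rest; a pattern trigger uses a Python regular expression) are assumptions for illustration, not the actual Search Server configuration schema.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trigger:
    """Hypothetical trigger definition for one federated location."""
    kind: str        # "always", "keyword", or "pattern" (assumed names)
    value: str = ""  # keyword text or regular expression, depending on kind


def federated_query(trigger: Trigger, query: str) -> Optional[str]:
    """Return the query to forward to the external engine, or None if the trigger does not fire."""
    query = query.strip()
    if trigger.kind == "always":
        return query
    if trigger.kind == "keyword":
        # Assumed rule: fires when the first word matches the keyword;
        # the remaining words are forwarded as the federated query.
        first, _, rest = query.partition(" ")
        if first.lower() == trigger.value.lower() and rest.strip():
            return rest.strip()
        return None
    if trigger.kind == "pattern":
        # Assumed rule: fires when the regular expression matches the query.
        return query if re.search(trigger.value, query) else None
    raise ValueError(f"unknown trigger kind: {trigger.kind}")
```

For instance, `federated_query(Trigger("keyword", "news"), "news swine flu")` forwards `"swine flu"`, while a query of `"weather"` does not fire that trigger. Returning the forwarded query rather than a boolean keeps the "should we federate?" decision and the "what do we send?" decision in one place.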

For a more in-depth review of whether to use a Federated or Crawled approach, click here.
