org.carrot2.source.opensearch
Class OpenSearchDocumentSource

java.lang.Object
  extended by org.carrot2.core.ProcessingComponentBase
      extended by org.carrot2.source.SearchEngineBase
          extended by org.carrot2.source.MultipageSearchEngine
              extended by org.carrot2.source.opensearch.OpenSearchDocumentSource
All Implemented Interfaces:
IDocumentSource, IProcessingComponent

public class OpenSearchDocumentSource
extends MultipageSearchEngine

An IDocumentSource that fetches Documents (search results) from an OpenSearch feed.

Based on code donated by Julien Nioche.

See Also:
OpenSearch.org

Nested Class Summary
 
Nested classes/interfaces inherited from class org.carrot2.source.MultipageSearchEngine
MultipageSearchEngine.SearchEngineResponseCallable, MultipageSearchEngine.SearchMode, MultipageSearchEngine.SearchRange
 
Field Summary
 Map<String,String> feedUrlParams
          Additional parameters to be appended to feedUrlTemplate on each request.
 String feedUrlTemplate
          URL to fetch the search feed from.
 int maximumResults
          Maximum number of results.
 int resultsPerPage
          Results per page.
 String userAgent
          User agent header.
 
Fields inherited from class org.carrot2.source.MultipageSearchEngine
searchMode
 
Fields inherited from class org.carrot2.source.SearchEngineBase
compressed, documents, query, results, resultsTotal, start, statistics
 
Constructor Summary
OpenSearchDocumentSource()
           
 
Method Summary
 void beforeProcessing()
          Invoked after the attributes marked with Processing and Input annotations have been bound, but before a call to IProcessingComponent.process().
protected  Callable<SearchEngineResponse> createFetcher(MultipageSearchEngine.SearchRange bucket)
          Subclasses should override this method and return a MultipageSearchEngine.SearchEngineResponseCallable instance that fetches search results in the given range.
 void process()
          Performs the processing required to fulfill the request.
 
Methods inherited from class org.carrot2.source.MultipageSearchEngine
collectDocuments, process, runQuery
 
Methods inherited from class org.carrot2.source.SearchEngineBase
afterFetch, clean, urlEncode
 
Methods inherited from class org.carrot2.core.ProcessingComponentBase
afterProcessing, dispose, getContext, getSharedExecutor, init
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.carrot2.core.IProcessingComponent
afterProcessing, dispose, init
 

Field Detail

feedUrlTemplate

public String feedUrlTemplate
URL to fetch the search feed from. The URL template can contain variable placeholders, as defined by the OpenSearch specification, that are replaced at runtime. The format of a placeholder is ${variable}. Supported variables include those used in the example templates below: searchTerms, startIndex and count.

Example URL feed templates for public services:

nature.com
http://www.nature.com/opensearch/request?interface=opensearch&operation=searchRetrieve&query=${searchTerms}&startRecord=${startIndex}&maximumRecords=${count}&httpAccept=application/rss%2Bxml
indeed.com
http://www.indeed.com/opensearch?q=${searchTerms}&start=${startIndex}&limit=${count}

Attribute label:
Feed URL template
Attribute level:
Basic
Attribute group:
Service
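
To illustrate the ${variable} substitution described for feedUrlTemplate, here is a minimal sketch that expands a template by plain string replacement. It is not the library's implementation: the class FeedUrlTemplateSketch and its methods expand and encode are hypothetical, and the variable values are illustrative only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Carrot2: expands ${variable} placeholders.
final class FeedUrlTemplateSketch {

    // Replaces every ${name} occurrence with the value mapped to that name.
    static String expand(String template, Map<String, String> variables) {
        String result = template;
        for (Map.Entry<String, String> e : variables.entrySet()) {
            // String.replace performs a literal (non-regex) replacement.
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    // URL-encodes a value so it can be placed safely in a query string.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> variables = new LinkedHashMap<>();
        variables.put("searchTerms", encode("data mining")); // illustrative query
        variables.put("startIndex", "0");                    // illustrative offset
        variables.put("count", "20");                        // illustrative page size
        System.out.println(expand(
            "http://www.indeed.com/opensearch?q=${searchTerms}&start=${startIndex}&limit=${count}",
            variables));
    }
}
```

Whether the first result index is 0 or 1 depends on the feed, so the startIndex value above should be treated as a placeholder.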

resultsPerPage

public int resultsPerPage
Results per page. The number of results per page the document source will expect the feed to return.

Attribute label:
Results per page
Attribute level:
Basic
Attribute group:
Service

maximumResults

public int maximumResults
Maximum number of results. The maximum number of results the document source can deliver.

Attribute label:
Maximum results
Attribute level:
Basic
Attribute group:
Service
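
How resultsPerPage and maximumResults combine into individual page requests is not spelled out on this page. The sketch below shows one plausible reading, assuming the requested number of results is capped at maximumResults and then split into pages of at most resultsPerPage. PagingSketch, Range and ranges are hypothetical names, not Carrot2 API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of splitting a request into page-sized ranges.
final class PagingSketch {

    // A contiguous block of results: first index and number of results.
    static final class Range {
        final int start;
        final int results;

        Range(int start, int results) {
            this.start = start;
            this.results = results;
        }
    }

    // Assumption: the request is capped at maximumResults, then paged.
    static List<Range> ranges(int start, int requested, int maximumResults, int resultsPerPage) {
        if (resultsPerPage <= 0) {
            throw new IllegalArgumentException("resultsPerPage must be positive");
        }
        int wanted = Math.min(requested, maximumResults);
        int end = start + wanted;
        List<Range> out = new ArrayList<>();
        for (int s = start; s < end; s += resultsPerPage) {
            // The last page may be shorter than resultsPerPage.
            out.add(new Range(s, Math.min(resultsPerPage, end - s)));
        }
        return out;
    }
}
```

With 50 results requested, maximumResults of 45 and resultsPerPage of 20, this sketch yields three ranges of 20, 20 and 5 results.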

feedUrlParams

public Map<String,String> feedUrlParams
Additional parameters to be appended to feedUrlTemplate on each request.

Attribute label:
Feed URL parameters
Attribute level:
Advanced
Attribute group:
Service
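
The exact way feedUrlParams are appended is not described here. A minimal sketch, assuming they are added as URL-encoded key=value pairs in the query string, could look as follows. FeedUrlParamsSketch and appendParams are hypothetical names, and example.com is a placeholder host.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper: appends extra parameters to an already expanded feed URL.
final class FeedUrlParamsSketch {

    static String appendParams(String url, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(url);
        // Start a query string with '?' unless the URL already has one.
        char separator = url.indexOf('?') >= 0 ? '&' : '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Passing a LinkedHashMap keeps the parameters in insertion order, which makes the resulting URL predictable.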

userAgent

public String userAgent
User agent header. The contents of the User-Agent HTTP header to use when making requests to the feed URL. If an empty or null value is provided, the following User-Agent will be sent: Rome Client (http://tinyurl.com/64t5n) Ver: UNKNOWN.

Attribute label:
User agent
Attribute level:
Advanced
Attribute group:
Service
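
The fallback rule for userAgent can be modelled in a few lines. This is only a sketch of the documented behaviour, not the code that actually sends the header. UserAgentSketch and effectiveUserAgent are hypothetical names, and treating only null and the zero-length string as "empty" is an assumption.

```java
// Hypothetical model of the documented User-Agent fallback.
final class UserAgentSketch {

    // Default value quoted in the userAgent field description.
    static final String DEFAULT_USER_AGENT =
        "Rome Client (http://tinyurl.com/64t5n) Ver: UNKNOWN";

    // Returns the configured value, or the default when it is null or empty.
    static String effectiveUserAgent(String configured) {
        return (configured == null || configured.isEmpty())
            ? DEFAULT_USER_AGENT
            : configured;
    }
}
```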
Constructor Detail

OpenSearchDocumentSource

public OpenSearchDocumentSource()
Method Detail

beforeProcessing

public void beforeProcessing()
Description copied from interface: IProcessingComponent
Invoked after the attributes marked with Processing and Input annotations have been bound, but before a call to IProcessingComponent.process(). In this method, the processing component should perform any initializations based on the runtime attributes. This method is called once per request.

Specified by:
beforeProcessing in interface IProcessingComponent
Overrides:
beforeProcessing in class ProcessingComponentBase

process

public void process()
             throws ProcessingException
Description copied from interface: IProcessingComponent
Performs the processing required to fulfill the request. This method is called once per request.

Specified by:
process in interface IProcessingComponent
Overrides:
process in class ProcessingComponentBase
Throws:
ProcessingException - when processing failed. If thrown, the IProcessingComponent.afterProcessing() method will be called and the component will be ready to accept further requests or to be disposed of. Finally, the exception will be rethrown from the controller method that caused the component to perform processing.
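
The call order described above (beforeProcessing, then process, with afterProcessing still invoked when process fails and the exception rethrown to the caller) can be sketched with a try/finally block. Everything in this example is a stand-in: the Component interface, runRequest and FailingComponent are hypothetical, and IllegalStateException takes the place of ProcessingException. Calling afterProcessing on success as well is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the per-request lifecycle of a processing component.
final class LifecycleSketch {

    interface Component {
        void beforeProcessing();
        void process();
        void afterProcessing();
    }

    // Runs one request; afterProcessing runs even if process throws,
    // and the exception then propagates to the caller.
    static void runRequest(Component component) {
        component.beforeProcessing();
        try {
            component.process();
        } finally {
            component.afterProcessing();
        }
    }

    // Records the order of calls and fails during process.
    static final class FailingComponent implements Component {
        final List<String> calls = new ArrayList<>();

        public void beforeProcessing() { calls.add("beforeProcessing"); }

        public void process() {
            calls.add("process");
            throw new IllegalStateException("feed unavailable");
        }

        public void afterProcessing() { calls.add("afterProcessing"); }
    }
}
```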

createFetcher

protected Callable<SearchEngineResponse> createFetcher(MultipageSearchEngine.SearchRange bucket)
Description copied from class: MultipageSearchEngine
Subclasses should override this method and return a MultipageSearchEngine.SearchEngineResponseCallable instance that fetches search results in the given range.

Note that the query (if one is required) should be passed at the concrete class level; this method is not concerned with it.

Specified by:
createFetcher in class MultipageSearchEngine
Parameters:
bucket - The search range to fetch.
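
The pattern behind createFetcher, one Callable per search range whose results are later merged, can be sketched with plain java.util.concurrent types. This is a stand-alone illustration under stated assumptions: FetcherSketch, its createFetcher(int, int) and fetchAll are hypothetical, results are faked as strings rather than fetched from a feed, and the real method takes a MultipageSearchEngine.SearchRange and returns a Callable<SearchEngineResponse>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of the "one Callable per range" pattern.
final class FetcherSketch {

    // Returns a task that produces fake results for [start, start + results).
    static Callable<List<String>> createFetcher(final int start, final int results) {
        return () -> {
            List<String> page = new ArrayList<>();
            for (int i = start; i < start + results; i++) {
                page.add("result-" + i);
            }
            return page;
        };
    }

    // Submits one fetcher per {start, results} pair and merges pages in range order.
    static List<String> fetchAll(int[][] ranges) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int[] range : ranges) {
                futures.add(executor.submit(createFetcher(range[0], range[1])));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                all.addAll(future.get()); // blocks until that page is ready
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

Collecting the futures in submission order keeps the merged list in range order even when pages complete out of order.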


Copyright (c) Dawid Weiss, Stanislaw Osinski