org.carrot2.source
Class MultipageSearchEngine

java.lang.Object
  extended by org.carrot2.core.ProcessingComponentBase
      extended by org.carrot2.source.SearchEngineBase
          extended by org.carrot2.source.MultipageSearchEngine
All Implemented Interfaces:
IDocumentSource, IProcessingComponent
Direct Known Subclasses:
Bing2DocumentSource, GoogleDocumentSource, IdolDocumentSource, OpenSearchDocumentSource

public abstract class MultipageSearchEngine
extends SearchEngineBase

A base class that facilitates the implementation of IDocumentSources wrapping external search engines with remote, network-based interfaces. This class implements helper methods for concurrent querying of search services that limit the number of search results returned in one request.

See Also:
SimpleSearchEngine

Nested Class Summary
protected  class MultipageSearchEngine.SearchEngineResponseCallable
          An implementation of Callable that increments page request count statistics before the actual search is made.
static class MultipageSearchEngine.SearchMode
          Search mode for data source components that issue parallel requests to a search service.
protected static class MultipageSearchEngine.SearchRange
          A single result window to fetch.
 
Field Summary
 MultipageSearchEngine.SearchMode searchMode
          Search mode defines how fetchers returned from createFetcher(org.carrot2.source.MultipageSearchEngine.SearchRange) are called.
 
Fields inherited from class org.carrot2.source.SearchEngineBase
compressed, documents, query, results, resultsTotal, start, statistics
 
Constructor Summary
MultipageSearchEngine()
           
 
Method Summary
protected  void collectDocuments(Collection<Document> collector, SearchEngineResponse[] responses)
          Collects documents from an array of search engine responses.
protected abstract  Callable<SearchEngineResponse> createFetcher(MultipageSearchEngine.SearchRange bucket)
          Subclasses should override this method and return a MultipageSearchEngine.SearchEngineResponseCallable instance that fetches search results in the given range.
protected  void process(MultipageSearchEngineMetadata metadata, ExecutorService executor)
          Runs a request against the search engine's API, setting documents to the set of returned documents.
protected  SearchEngineResponse[] runQuery(String query, int start, int results, MultipageSearchEngineMetadata metadata, ExecutorService executor)
          This method implements the logic of querying a typical search engine.
 
Methods inherited from class org.carrot2.source.SearchEngineBase
afterFetch, clean, urlEncode
 
Methods inherited from class org.carrot2.core.ProcessingComponentBase
afterProcessing, beforeProcessing, dispose, getContext, getSharedExecutor, init, process
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.carrot2.core.IProcessingComponent
afterProcessing, beforeProcessing, dispose, init, process
 

Field Detail

searchMode

public MultipageSearchEngine.SearchMode searchMode
Search mode defines how fetchers returned from createFetcher(org.carrot2.source.MultipageSearchEngine.SearchRange) are called.

See Also:
MultipageSearchEngine.SearchMode
Attribute label:
Search Mode
Attribute level:
Advanced
Attribute group:
Results paging
Constructor Detail

MultipageSearchEngine

public MultipageSearchEngine()
Method Detail

process

protected void process(MultipageSearchEngineMetadata metadata,
                       ExecutorService executor)
                throws ProcessingException
Runs a request against the search engine's API, setting documents to the set of returned documents.

Throws:
ProcessingException

createFetcher

protected abstract Callable<SearchEngineResponse> createFetcher(MultipageSearchEngine.SearchRange bucket)
Subclasses should override this method and return a MultipageSearchEngine.SearchEngineResponseCallable instance that fetches search results in the given range.

Note that the query (if one is required) should be passed at the concrete class level; this method is concerned only with the range of results to fetch.

Parameters:
bucket - The search range to fetch.
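
The pattern described above (a callable that updates page request statistics before searching, with the query held by the concrete class) might look roughly like the following self-contained sketch. It does not use the Carrot2 types: FetcherSketch, CountingCallable, fetch and pageRequestCount are hypothetical stand-ins, and a plain String takes the place of SearchEngineResponse.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a concrete document source; not part of Carrot2.
final class FetcherSketch {
    // Stand-in for the page request count statistic.
    private final AtomicInteger pageRequests = new AtomicInteger();

    // The query lives at the concrete class level, not in the search range.
    private final String query;

    FetcherSketch(String query) {
        this.query = query;
    }

    // Plays the role described for SearchEngineResponseCallable:
    // count the page request first, then run the actual search.
    abstract class CountingCallable implements Callable<String> {
        @Override
        public final String call() throws Exception {
            pageRequests.incrementAndGet();
            return search();
        }

        protected abstract String search() throws Exception;
    }

    // Analogue of createFetcher: receives only the window to fetch.
    Callable<String> createFetcher(final int start, final int results) {
        return new CountingCallable() {
            @Override
            protected String search() {
                // A real source would call the remote service here.
                return query + " [" + start + ", " + (start + results) + ")";
            }
        };
    }

    // Convenience wrapper that runs one fetcher on the calling thread.
    String fetch(int start, int results) {
        try {
            return createFetcher(start, results).call();
        } catch (Exception e) {
            throw new IllegalStateException("Fetch failed", e);
        }
    }

    int pageRequestCount() {
        return pageRequests.get();
    }

    public static void main(String[] args) {
        FetcherSketch source = new FetcherSketch("carrot2");
        System.out.println(source.fetch(0, 100));
        System.out.println("page requests: " + source.pageRequestCount());
    }
}
```

Making call() final in the stand-in keeps the counting step out of reach of subclasses, which only supply search().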

collectDocuments

protected final void collectDocuments(Collection<Document> collector,
                                      SearchEngineResponse[] responses)
Collects documents from an array of search engine's responses.


runQuery

protected final SearchEngineResponse[] runQuery(String query,
                                                int start,
                                                int results,
                                                MultipageSearchEngineMetadata metadata,
                                                ExecutorService executor)
                                         throws ProcessingException
This method implements the logic of querying a typical search engine. If the number of requested results is higher than the number of results on one response page, then multiple (possibly concurrent) requests are issued via the provided ExecutorService.

Throws:
ProcessingException
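
As an illustration of the paging logic described for runQuery, the sketch below splits a requested range into page-sized windows and fetches them through an ExecutorService. It is a simplified model under stated assumptions, not the Carrot2 implementation: PagedQuerySketch, Range, split and the pageSize parameter are hypothetical, results are fabricated strings rather than SearchEngineResponse objects, and the page size is passed directly instead of being read from MultipageSearchEngineMetadata.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of multi-page querying; not part of Carrot2.
final class PagedQuerySketch {
    // Stand-in for a single result window to fetch.
    static final class Range {
        final int start;
        final int results;

        Range(int start, int results) {
            this.start = start;
            this.results = results;
        }
    }

    // Splits [start, start + results) into windows of at most pageSize results.
    static List<Range> split(int start, int results, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        int end = start + results;
        for (int offset = start; offset < end; offset += pageSize) {
            ranges.add(new Range(offset, Math.min(pageSize, end - offset)));
        }
        return ranges;
    }

    // Fabricates one page of results instead of calling a remote service.
    static Callable<List<String>> createFetcher(String query, Range range) {
        return () -> {
            List<String> page = new ArrayList<>();
            for (int i = 0; i < range.results; i++) {
                page.add(query + "#" + (range.start + i));
            }
            return page;
        };
    }

    // Submits one fetcher per window, then gathers pages in request order.
    static List<String> runQuery(String query, int start, int results,
                                 int pageSize, ExecutorService executor) {
        List<Future<List<String>>> futures = new ArrayList<>();
        for (Range range : split(start, results, pageSize)) {
            futures.add(executor.submit(createFetcher(query, range)));
        }
        List<String> all = new ArrayList<>();
        try {
            for (Future<List<String>> future : futures) {
                all.addAll(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Fetch failed", e.getCause());
        }
        return all;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<String> docs = runQuery("carrot2", 0, 250, 100, executor);
            System.out.println(docs.size() + " results fetched");
        } finally {
            executor.shutdown();
        }
    }
}
```

Collecting the futures in submission order keeps the merged results in page order even when the pages complete out of order.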


Copyright (c) Dawid Weiss, Stanislaw Osinski