org.carrot2.source
Class SearchEngineBase

java.lang.Object
  extended by org.carrot2.core.ProcessingComponentBase
      extended by org.carrot2.source.SearchEngineBase
All Implemented Interfaces:
IDocumentSource, IProcessingComponent
Direct Known Subclasses:
MultipageSearchEngine, SimpleSearchEngine

public abstract class SearchEngineBase
extends ProcessingComponentBase
implements IDocumentSource

A base class that facilitates the implementation of IDocumentSources wrapping external search engines with remote/network-based interfaces. It defines the common attribute fields used by more specific base classes and by concrete implementations.

See Also:
SimpleSearchEngine, MultipageSearchEngine

Field Summary
 boolean compressed
          Indicates whether the search engine returned a compressed result stream.
 Collection<Document> documents
           
 String query
           
 int results
           
 long resultsTotal
           
 int start
           
 SearchEngineStats statistics
          This component's usage statistics.
 
Constructor Summary
SearchEngineBase()
           
 
Method Summary
protected  void afterFetch(SearchEngineResponse response)
          Called after a single search engine response has been fetched.
protected static void clean(SearchEngineResponse response, boolean keepHighlights, String... fields)
          Unescapes HTML entities and removes HTML tags in a given set of fields of all documents in the provided response.
protected static String urlEncode(String string)
          URL-encodes a string using UTF-8.
 
Methods inherited from class org.carrot2.core.ProcessingComponentBase
afterProcessing, beforeProcessing, dispose, getContext, getSharedExecutor, init, process
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.carrot2.core.IProcessingComponent
afterProcessing, beforeProcessing, dispose, init, process
 

Field Detail

start

public int start

results

public int results

query

public String query

resultsTotal

public long resultsTotal

documents

public Collection<Document> documents

compressed

public boolean compressed
Indicates whether the search engine returned a compressed result stream.

Attribute label:
Compression used
Attribute group:
Data source status

statistics

public SearchEngineStats statistics
This component's usage statistics.

Constructor Detail

SearchEngineBase

public SearchEngineBase()

Method Detail

clean

protected static void clean(SearchEngineResponse response,
                            boolean keepHighlights,
                            String... fields)
Unescapes HTML entities and removes HTML tags in a given set of fields of all documents in the provided response.

Parameters:
response - the search engine response to clean
keepHighlights - set to true to keep query term highlights
fields - names of fields to clean
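
The following is a minimal, self-contained sketch of the kind of cleaning this method describes. It is not the library's implementation. It assumes that documents can be modelled as plain field-name-to-value maps, that tags are stripped outright, that highlights are marked with <b> tags, and that only four common entities need unescaping. The names CleanSketch and cleanText are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CleanSketch {
    /** Strips tags (optionally keeping <b> highlights), then unescapes a few entities. */
    static String cleanText(String text, boolean keepHighlights) {
        // Assumption: highlights are <b>...</b>; the lookahead spares only those tags.
        String tagPattern = keepHighlights ? "<(?!/?b>)[^>]+>" : "<[^>]+>";
        String stripped = text.replaceAll(tagPattern, "");
        // &amp; is handled last so that "&amp;lt;" becomes "&lt;", not "<".
        return stripped
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&amp;", "&");
    }

    /** Cleans the named fields of every document in place; blank or missing fields are skipped. */
    static void clean(List<Map<String, String>> documents, boolean keepHighlights, String... fields) {
        for (Map<String, String> document : documents) {
            for (String field : fields) {
                String value = document.get(field);
                if (value != null && !value.trim().isEmpty()) {
                    document.put(field, cleanText(value, keepHighlights));
                }
            }
        }
    }

    public static void main(String[] args) {
        String raw = "<i>Data</i> <b>mining</b> &amp; <br>clustering";
        System.out.println(cleanText(raw, false)); // Data mining & clustering
        System.out.println(cleanText(raw, true));  // Data <b>mining</b> & clustering

        Map<String, String> doc = new HashMap<>();
        doc.put("title", raw);
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc);
        clean(docs, false, "title");
        System.out.println(doc.get("title"));      // Data mining & clustering
    }
}
```

Tags are stripped before entities are unescaped so that escaped markup such as &lt;b&gt; in the original text survives as literal characters rather than being removed as a tag.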

afterFetch

protected void afterFetch(SearchEngineResponse response)
Called after a single search engine response has been fetched. The default implementation is empty; concrete implementations may override it, for example to clean or otherwise post-process the returned results.
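
The override pattern described here can be sketched as follows. This is an illustration only, not the library's code. Response, EngineBase, TrimmingEngine, fetch and doFetch are hypothetical stand-ins, introduced so that the example compiles on its own. It also assumes that the base class calls the hook once per fetched response.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AfterFetchSketch {
    /** Stand-in for a search engine response: each result maps field names to values. */
    static class Response {
        final List<Map<String, String>> results = new ArrayList<>();
    }

    /** Stand-in for the base class; afterFetch is an empty hook by default. */
    static abstract class EngineBase {
        protected void afterFetch(Response response) {
            // Intentionally empty: subclasses override this to post-process results.
        }

        /** Hypothetical fetch step that passes each fetched response to the hook. */
        Response fetch() {
            Response response = doFetch();
            afterFetch(response);
            return response;
        }

        protected abstract Response doFetch();
    }

    /** Hypothetical concrete engine that trims whitespace from titles after each fetch. */
    static class TrimmingEngine extends EngineBase {
        @Override
        protected Response doFetch() {
            Response response = new Response();
            Map<String, String> doc = new HashMap<>();
            doc.put("title", "  Carrot2 clustering  ");
            response.results.add(doc);
            return response;
        }

        @Override
        protected void afterFetch(Response response) {
            for (Map<String, String> doc : response.results) {
                String title = doc.get("title");
                if (title != null) {
                    doc.put("title", title.trim());
                }
            }
        }
    }

    public static void main(String[] args) {
        Response response = new TrimmingEngine().fetch();
        System.out.println("[" + response.results.get(0).get("title") + "]"); // [Carrot2 clustering]
    }
}
```

Keeping the hook empty in the base class means that subclasses that need no post-processing do not have to override anything.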


urlEncode

protected static final String urlEncode(String string)
URL-encodes a string using UTF-8.
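
A helper with this behaviour can be sketched with the standard java.net.URLEncoder. Whether the library delegates to URLEncoder is an assumption, and the class name UrlEncodeSketch is hypothetical. Note that URLEncoder produces application/x-www-form-urlencoded output, so spaces become + rather than %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlEncodeSketch {
    /** URL-encodes a string using UTF-8, wrapping the checked exception. */
    static String urlEncode(String string) {
        try {
            return URLEncoder.encode(string, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform, so this should not happen.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(urlEncode("data mining & clustering")); // data+mining+%26+clustering
        System.out.println(urlEncode("caf\u00e9"));                // caf%C3%A9
    }
}
```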



Copyright (c) Dawid Weiss, Stanislaw Osinski