org.carrot2.text.preprocessing
Class PreprocessingContext

java.lang.Object
  extended by org.carrot2.text.preprocessing.PreprocessingContext

public final class PreprocessingContext
extends Object

Document preprocessing context provides low-level (usually integer-coded) data structures useful for further processing.

[Figure: Internals of PreprocessingContext]
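To illustrate what "integer-coded" means, here is a hypothetical, self-contained sketch. It is not Carrot2's implementation. It gives each unique word and each unique stem an integer index, and for each token it stores only the index of its word. The class and field names, the lowercasing, and the trailing-"s" stemmer are assumptions made for the example. The real structures are the nested AllTokens, AllWords and AllStems classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical illustration of integer coding: each token points to a unique
 * word, and each word points to a unique stem. Names and the naive stemmer are
 * assumptions; this is not Carrot2's actual data layout.
 */
final class IntegerCodedSketch {
    final List<String> wordImages = new ArrayList<>(); // unique words, by word index
    final List<String> stemImages = new ArrayList<>(); // unique stems, by stem index
    final int[] tokenWordIndex;                        // token position -> word index
    final int[] wordStemIndex;                         // word index -> stem index

    IntegerCodedSketch(String[] tokens) {
        Map<String, Integer> wordIds = new HashMap<>();
        Map<String, Integer> stemIds = new HashMap<>();
        List<Integer> wordToStem = new ArrayList<>();
        tokenWordIndex = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            String word = tokens[i].toLowerCase(Locale.ROOT); // assumed case folding
            Integer wordId = wordIds.get(word);
            if (wordId == null) {
                // First occurrence of this word: assign the next free index.
                wordId = wordImages.size();
                wordIds.put(word, wordId);
                wordImages.add(word);
                // Naive "stemmer" for illustration only: strip one trailing 's'.
                String stem = word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
                Integer stemId = stemIds.get(stem);
                if (stemId == null) {
                    stemId = stemImages.size();
                    stemIds.put(stem, stemId);
                    stemImages.add(stem);
                }
                wordToStem.add(stemId);
            }
            tokenWordIndex[i] = wordId;
        }
        wordStemIndex = wordToStem.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

For the tokens Cluster, clusters, data, cluster this yields token-to-word indexes {0, 1, 2, 0} and word-to-stem indexes {0, 0, 1}. Later stages can then compare words or stems by comparing plain integers.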


Nested Class Summary
static class PreprocessingContext.AllFields
          Information about all fields processed for the input documents.
 class PreprocessingContext.AllLabels
          Information about words and phrases that might be good cluster label candidates.
 class PreprocessingContext.AllPhrases
          Information about all frequently appearing sequences of words found in the input documents.
 class PreprocessingContext.AllStems
          Information about all unique stems found in the input documents.
 class PreprocessingContext.AllTokens
          Information about all tokens of the input documents.
 class PreprocessingContext.AllWords
          Information about all unique words found in the input documents.
 
Field Summary
 PreprocessingContext.AllFields allFields
          Information about all fields processed for the input documents.
 PreprocessingContext.AllLabels allLabels
          Information about words and phrases that might be good cluster label candidates.
 PreprocessingContext.AllPhrases allPhrases
          Information about all frequently appearing sequences of words found in the input documents.
 PreprocessingContext.AllStems allStems
          Information about all unique stems found in the input documents.
 PreprocessingContext.AllTokens allTokens
          Information about all tokens of the input documents.
 PreprocessingContext.AllWords allWords
          Information about all unique words found in the input documents.
 List<Document> documents
          A list of documents to process.
 LanguageModel language
          Language model to be used.
 String query
          Query used to perform processing; may be null.
 
Constructor Summary
PreprocessingContext(LanguageModel languageModel, List<Document> documents, String query)
          Creates a preprocessing context for the provided documents, using the provided languageModel.
 
Method Summary
 boolean hasLabels()
          Returns true if this context contains any label candidates.
 boolean hasWords()
          Returns true if this context contains any words.
 char[] intern(MutableCharArray chs)
          Returns a unique char buffer representing the given character sequence.
 void preprocessingFinished()
          Releases temporary data structures; should be invoked after all preprocessing contributors have been executed.
static int[] toFieldIndexes(byte b)
          Converts the bits set in a byte to an array of their indexes.
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

query

public final String query
Query used to perform processing; may be null.


documents

public final List<Document> documents
A list of documents to process.


language

public final LanguageModel language
Language model to be used.


allTokens

public final PreprocessingContext.AllTokens allTokens
Information about all tokens of the input documents.


allFields

public final PreprocessingContext.AllFields allFields
Information about all fields processed for the input documents.


allWords

public final PreprocessingContext.AllWords allWords
Information about all unique words found in the input documents.


allStems

public final PreprocessingContext.AllStems allStems
Information about all unique stems found in the input documents.


allPhrases

public PreprocessingContext.AllPhrases allPhrases
Information about all frequently appearing sequences of words found in the input documents.


allLabels

public final PreprocessingContext.AllLabels allLabels
Information about words and phrases that might be good cluster label candidates.


Constructor Detail

PreprocessingContext

public PreprocessingContext(LanguageModel languageModel,
                            List<Document> documents,
                            String query)
Creates a preprocessing context for the provided documents, using the provided languageModel.

Method Detail

hasWords

public boolean hasWords()
Returns true if this context contains any words.


hasLabels

public boolean hasLabels()
Returns true if this context contains any label candidates.
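hasWords() and hasLabels() can serve as guards before word- or label-dependent processing. The following hypothetical sketch assumes that both checks reduce to testing whether an integer-coded array is empty. The field names and the describe() helper are illustrative and are not taken from Carrot2.

```java
/**
 * Hypothetical sketch: emptiness checks over integer-coded arrays, used as a
 * guard before label-dependent work. Field names are assumptions, not
 * Carrot2's API.
 */
final class EmptinessGuardSketch {
    final char[][] wordImages;    // assumed: one entry per unique word
    final int[] labelCandidates;  // assumed: one entry per label candidate

    EmptinessGuardSketch(char[][] wordImages, int[] labelCandidates) {
        this.wordImages = wordImages;
        this.labelCandidates = labelCandidates;
    }

    boolean hasWords() {
        return wordImages.length > 0;
    }

    boolean hasLabels() {
        return labelCandidates.length > 0;
    }

    /** Example guard: decide how far processing can go with the available data. */
    String describe() {
        if (!hasWords()) {
            return "no words";
        }
        if (!hasLabels()) {
            return "words but no label candidates";
        }
        return "ready for labeling";
    }
}
```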


toString

public String toString()
Overrides:
toString in class Object

toFieldIndexes

public static int[] toFieldIndexes(byte b)
Converts the bits set in a byte to an array of their indexes.
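A minimal sketch of this conversion, assuming that "selected" bits are bits set to 1, that bit 0 is the least significant bit, and that indexes are returned in ascending order. Within this class, such bytes presumably record which fields (see AllFields) something occurred in. That reading is an assumption, as is this implementation.

```java
/**
 * Minimal sketch: converts the set bits of a byte into an array of their
 * positions (bit 0 = least significant), in ascending order.
 */
final class FieldIndexesSketch {
    static int[] toFieldIndexes(byte b) {
        int bits = b & 0xFF;                       // mask to avoid sign extension for negative bytes
        int[] indexes = new int[Integer.bitCount(bits)];
        int j = 0;
        for (int i = 0; i < 8; i++) {
            if ((bits & (1 << i)) != 0) {
                indexes[j++] = i;                  // record the position of each set bit
            }
        }
        return indexes;
    }
}
```

For example, the byte 0b101 (bits 0 and 2 set) yields {0, 2}, and the byte 0 yields an empty array.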


preprocessingFinished

public void preprocessingFinished()
Releases temporary data structures. This method should be invoked after all preprocessing contributors have been executed.
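A hypothetical sketch of the ordering this method implies. Every contributor first runs against the shared context. Only then are the temporary structures released, here by dropping the reference so the garbage collector can reclaim it. The Consumer-based contributors and the field names are assumptions for illustration and do not reflect Carrot2's contributor API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical lifecycle sketch: contributors fill temporary buffers, then a
 * single "finished" call drops them. Names are illustrative only.
 */
final class LifecycleSketch {
    List<String> temporaryBuffer = new ArrayList<>(); // scratch data shared by contributors
    int result;                                       // data kept after preprocessing

    void preprocessingFinished() {
        temporaryBuffer = null; // release scratch data; it must not be used afterwards
    }

    static LifecycleSketch run(List<Consumer<LifecycleSketch>> contributors) {
        LifecycleSketch context = new LifecycleSketch();
        for (Consumer<LifecycleSketch> contributor : contributors) {
            contributor.accept(context);  // run every contributor first...
        }
        context.preprocessingFinished(); // ...and only then release temporary data
        return context;
    }
}
```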


intern

public char[] intern(MutableCharArray chs)
Returns a unique char buffer representing the given character sequence, so that equal sequences share a single char[] instance.
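A minimal sketch of char-buffer interning, in which equal character sequences map to one shared char[] instance. For simplicity it takes a plain char[] slice and uses a String as the lookup key. The real method takes a MutableCharArray, whose hashing and equality are not shown here. The class and method shapes are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal interning sketch: equal character sequences return the same char[]
 * instance. Uses String keys for simplicity; hypothetical, not Carrot2's code.
 */
final class CharInternSketch {
    private final Map<String, char[]> pool = new HashMap<>();

    char[] intern(char[] buffer, int start, int length) {
        String key = new String(buffer, start, length);
        // Allocate a new char[] only the first time this sequence is seen.
        return pool.computeIfAbsent(key, String::toCharArray);
    }
}
```

Interning this way means repeated words share storage, and two interned buffers can be compared by reference (==) instead of character by character.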



Copyright (c) Dawid Weiss, Stanislaw Osinski