org.carrot2.text.preprocessing.pipeline
Class CompletePreprocessingPipeline

java.lang.Object
  extended by org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipeline
      extended by org.carrot2.text.preprocessing.pipeline.CompletePreprocessingPipeline
All Implemented Interfaces:
IPreprocessingPipeline

public class CompletePreprocessingPipeline
extends BasicPreprocessingPipeline

Performs complete preprocessing of the provided documents. Preprocessing consists of the following steps, applied in order:

  1. Tokenizer.tokenize(PreprocessingContext)
  2. CaseNormalizer.normalize(PreprocessingContext)
  3. LanguageModelStemmer.stem(PreprocessingContext)
  4. StopListMarker.mark(PreprocessingContext)
  5. PhraseExtractor.extractPhrases(PreprocessingContext)
  6. LabelFilterProcessor.process(PreprocessingContext)
  7. DocumentAssigner.assign(PreprocessingContext)
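
The ordering of the steps above can be illustrated with a small, self-contained sketch. It does not use the Carrot2 classes: SketchContext and PipelineSketch are hypothetical stand-ins invented for illustration, and each step only records its name instead of doing real text processing. The sketch assumes that every step receives the same context object and that steps run in the listed order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for PreprocessingContext: it only records which steps ran.
final class SketchContext {
    final List<String> completedSteps = new ArrayList<>();
}

// Hypothetical model of a pipeline that applies its steps in a fixed order
// to one shared context. Not the Carrot2 implementation.
final class PipelineSketch {
    private final List<Consumer<SketchContext>> steps = new ArrayList<>();

    PipelineSketch() {
        // Same order as the documented steps 1 through 7.
        steps.add(step("Tokenizer.tokenize"));
        steps.add(step("CaseNormalizer.normalize"));
        steps.add(step("LanguageModelStemmer.stem"));
        steps.add(step("StopListMarker.mark"));
        steps.add(step("PhraseExtractor.extractPhrases"));
        steps.add(step("LabelFilterProcessor.process"));
        steps.add(step("DocumentAssigner.assign"));
    }

    // Builds a placeholder step that appends its own name to the context.
    private static Consumer<SketchContext> step(String name) {
        return context -> context.completedSteps.add(name);
    }

    // Runs every step, in order, against the same context instance.
    void preprocess(SketchContext context) {
        for (Consumer<SketchContext> s : steps) {
            s.accept(context);
        }
    }

    public static void main(String[] args) {
        SketchContext context = new SketchContext();
        new PipelineSketch().preprocess(context);
        context.completedSteps.forEach(System.out::println);
    }
}
```

In this model, later steps can rely on what earlier steps left in the shared context, which is why the order matters.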


Field Summary
 DocumentAssigner documentAssigner
          Document assigner used by the algorithm; contains bindable attributes.
 LabelFilterProcessor labelFilterProcessor
          Label filter processor used by the algorithm; contains bindable attributes.
 PhraseExtractor phraseExtractor
          Phrase extractor used by the algorithm; contains bindable attributes.
 
Fields inherited from class org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipeline
caseNormalizer, languageModelStemmer, lexicalDataFactory, stemmerFactory, stopListMarker, tokenizer, tokenizerFactory
 
Constructor Summary
CompletePreprocessingPipeline()
           
 
Method Summary
 void preprocess(PreprocessingContext context)
          Performs preprocessing on the provided PreprocessingContext.
 
Methods inherited from class org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipeline
preprocess
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

phraseExtractor

public final PhraseExtractor phraseExtractor
Phrase extractor used by the algorithm; contains bindable attributes.


labelFilterProcessor

public final LabelFilterProcessor labelFilterProcessor
Label filter processor used by the algorithm; contains bindable attributes.


documentAssigner

public final DocumentAssigner documentAssigner
Document assigner used by the algorithm; contains bindable attributes.

Constructor Detail

CompletePreprocessingPipeline

public CompletePreprocessingPipeline()

Method Detail

preprocess

public void preprocess(PreprocessingContext context)
Performs preprocessing on the provided PreprocessingContext.

Specified by:
preprocess in interface IPreprocessingPipeline
Overrides:
preprocess in class BasicPreprocessingPipeline


Copyright (c) Dawid Weiss, Stanislaw Osinski