org.carrot2.text.preprocessing
Class PreprocessedDocumentScanner

java.lang.Object
  extended by org.carrot2.text.preprocessing.PreprocessedDocumentScanner

public class PreprocessedDocumentScanner
extends Object

Iterates over tokenized documents in PreprocessingContext.


Field Summary
static com.carrotsearch.hppc.predicates.ShortPredicate ON_DOCUMENT_SEPARATOR
          Predicate for splitting on document separator.
static com.carrotsearch.hppc.predicates.ShortPredicate ON_FIELD_SEPARATOR
          Predicate for splitting on field separator.
static com.carrotsearch.hppc.predicates.ShortPredicate ON_SENTENCE_SEPARATOR
          Predicate for splitting on sentence separator.
 
Constructor Summary
PreprocessedDocumentScanner()
           
 
Method Summary
protected  void document(PreprocessingContext context, int start, int length)
          Invoked for each document.
static com.carrotsearch.hppc.predicates.ShortPredicate equalTo(short t)
          Returns a new ShortPredicate that evaluates to true when its argument equals the given value t.
protected  void field(PreprocessingContext context, int start, int length)
          Invoked for each document's field.
 void iterate(PreprocessingContext context)
          Iterate over all documents, fields and sentences in PreprocessingContext.allTokens.
protected  void sentence(PreprocessingContext context, int start, int length)
          Invoked for each document's sentence.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ON_DOCUMENT_SEPARATOR

public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_DOCUMENT_SEPARATOR
Predicate for splitting on document separator.


ON_FIELD_SEPARATOR

public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_FIELD_SEPARATOR
Predicate for splitting on field separator.


ON_SENTENCE_SEPARATOR

public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_SENTENCE_SEPARATOR
Predicate for splitting on sentence separator.

Constructor Detail

PreprocessedDocumentScanner

public PreprocessedDocumentScanner()

Method Detail

equalTo

public static final com.carrotsearch.hppc.predicates.ShortPredicate equalTo(short t)
Returns a new ShortPredicate that evaluates to true when its argument equals the given value t.
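
The behaviour described for equalTo can be illustrated with a minimal, self-contained sketch. This is not the Carrot2 source: the nested ShortPredicate interface is a local stand-in for the HPPC type so the example compiles without dependencies, its single apply(short) method is an assumption about that interface's shape, and the class name EqualToSketch is hypothetical.

```java
final class EqualToSketch {
    /** Local stand-in for HPPC's ShortPredicate (assumed shape: one apply(short) method). */
    interface ShortPredicate {
        boolean apply(short value);
    }

    /** Returns a predicate that is true only when its argument equals t. */
    static ShortPredicate equalTo(final short t) {
        // The returned predicate captures t and compares each argument against it.
        return value -> value == t;
    }
}
```

A predicate built this way can be passed to any code that needs to decide, token by token, whether a separator has been reached.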


iterate

public final void iterate(PreprocessingContext context)
Iterate over all documents, fields and sentences in PreprocessingContext.allTokens.
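
Splitting a token-type array on a separator, which is the basic step behind this kind of iteration, might look like the sketch below. It is an illustration under assumptions, not the library's implementation: the class SeparatorSplitSketch, the split helper and the separator value used in the usage note are all hypothetical, and skipping empty runs is a choice made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class SeparatorSplitSketch {
    /**
     * Splits types[start .. start + length) on every element equal to separator
     * and returns each non-empty run as a {start, length} pair.
     * Separator elements themselves are not part of any run.
     */
    static List<int[]> split(short[] types, int start, int length, short separator) {
        List<int[]> ranges = new ArrayList<int[]>();
        int end = start + length;
        int runStart = start;
        for (int i = start; i <= end; i++) {
            // The end of the range closes the last run, just like a separator does.
            if (i == end || types[i] == separator) {
                if (i > runStart) {
                    ranges.add(new int[] { runStart, i - runStart });
                }
                runStart = i + 1;
            }
        }
        return ranges;
    }
}
```

For instance, calling split(new short[] {1, 1, -1, 1, -1}, 0, 5, (short) -1) with a hypothetical separator code of -1 yields two runs: one starting at 0 with length 2 and one starting at 3 with length 1. The (start, length) pairs mirror the parameters of the document, field and sentence callbacks.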


document

protected void document(PreprocessingContext context,
                        int start,
                        int length)
Invoked for each document. The default implementation splits the document further into fields.


field

protected void field(PreprocessingContext context,
                     int start,
                     int length)
Invoked for each document's field. The default implementation splits the field further into sentences.


sentence

protected void sentence(PreprocessingContext context,
                        int start,
                        int length)
Invoked for each document's sentence.
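
The nesting of the three callbacks can be sketched as a small template-method class. This is a simplified stand-in, not the Carrot2 code: MiniScanner and SentenceCounter are hypothetical names, the callbacks take a plain short[] instead of a PreprocessingContext, and the separator codes are made up for the example.

```java
class MiniScanner {
    // Hypothetical separator codes; the real values are not shown on this page.
    static final short DOC_SEP = -1, FIELD_SEP = -2, SENTENCE_SEP = -3;

    /** Entry point: splits the whole array into documents. */
    public final void iterate(short[] types) {
        split(types, 0, types.length, DOC_SEP, 0);
    }

    /** Called per document; by default splits further into fields. */
    protected void document(short[] types, int start, int length) {
        split(types, start, length, FIELD_SEP, 1);
    }

    /** Called per field; by default splits further into sentences. */
    protected void field(short[] types, int start, int length) {
        split(types, start, length, SENTENCE_SEP, 2);
    }

    /** Called per sentence; does nothing by default. */
    protected void sentence(short[] types, int start, int length) {
    }

    // Reports every non-empty run between separators to the callback for the given level.
    private void split(short[] types, int start, int length, short separator, int level) {
        int end = start + length;
        int runStart = start;
        for (int i = start; i <= end; i++) {
            if (i == end || types[i] == separator) {
                if (i > runStart) {
                    dispatch(level, types, runStart, i - runStart);
                }
                runStart = i + 1;
            }
        }
    }

    private void dispatch(int level, short[] types, int start, int length) {
        switch (level) {
            case 0: document(types, start, length); break;
            case 1: field(types, start, length); break;
            default: sentence(types, start, length); break;
        }
    }
}

class SentenceCounter extends MiniScanner {
    int documents;
    int sentences;

    @Override
    protected void document(short[] types, int start, int length) {
        documents++;
        // Delegate to the superclass so the document is still split into fields.
        super.document(types, start, length);
    }

    @Override
    protected void sentence(short[] types, int start, int length) {
        sentences++;
    }
}
```

In this sketch, an override of document or field that does not call the superclass method stops the descent at that level, so sentence would never be invoked for that range. A subclass that only needs sentences can override sentence alone and leave the other two callbacks untouched.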



Copyright (c) Dawid Weiss, Stanislaw Osinski