org.carrot2.text.preprocessing
Class DocumentAssigner

java.lang.Object
  extended by org.carrot2.text.preprocessing.DocumentAssigner

public class DocumentAssigner
extends Object

Assigns documents to label candidates. For each label candidate from PreprocessingContext.AllLabels.featureIndex, a BitSet with the assigned documents is constructed. The assignment algorithm is rather simple: to be assigned to a label, a document must contain at least one occurrence of each non-stop word from the label.
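
The rule described above can be illustrated with a minimal, self-contained sketch. It is not the Carrot2 implementation: the class name AssignmentSketch, the method assign, whitespace tokenization and the plain string inputs are all assumptions made for illustration. Only the use of java.util.BitSet and the "every non-stop word must occur" rule come from the description.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Carrot2.
final class AssignmentSketch {
    /**
     * Returns a BitSet in which bit i is set when document i contains
     * at least one occurrence of every non-stop word of the label.
     * Label words and stop words are expected in lower case.
     */
    static BitSet assign(List<String> labelWords, Set<String> stopWords, List<String> documents) {
        BitSet assigned = new BitSet(documents.size());
        for (int i = 0; i < documents.size(); i++) {
            // Assumption: naive lower-casing and whitespace tokenization.
            Set<String> docWords =
                new HashSet<>(Arrays.asList(documents.get(i).toLowerCase().split("\\s+")));
            boolean containsAll = true;
            for (String word : labelWords) {
                // Stop words in the label are ignored; all other words are required.
                if (!stopWords.contains(word) && !docWords.contains(word)) {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll) {
                assigned.set(i);
            }
        }
        return assigned;
    }
}
```

In this sketch, the label "mining of data" with the stop word "of" is assigned to "data mining tools" but not to "mining equipment", because the latter lacks the word "data".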

This class saves the following results to the PreprocessingContext:

This class requires that Tokenizer, CaseNormalizer, StopListMarker, PhraseExtractor and LabelFilterProcessor be invoked first.


Field Summary
 boolean exactPhraseAssignment
          Only exact phrase assignments.
 int minClusterSize
          Determines the minimum number of documents in each cluster.
 
Constructor Summary
DocumentAssigner()
           
 
Method Summary
 void assign(PreprocessingContext context)
          Assigns documents to label candidates.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

exactPhraseAssignment

public boolean exactPhraseAssignment
Only exact phrase assignments. Assign only documents that contain the label in its original form, including the order of words. Enabling this option causes fewer documents to be put in clusters, which results in higher assignment precision but also a larger "Other Topics" group. Disabling this option causes more documents to be put in clusters, which makes the "Other Topics" group smaller but also lowers the precision of cluster-document assignments.

Attribute label:
Exact phrase assignment
Attribute level:
Medium
Attribute group:
Preprocessing
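
The difference between the two modes can be sketched as follows. This is an illustration under stated assumptions, not the Carrot2 code: PhraseMatchSketch and its two methods are hypothetical names, documents and labels are given as pre-tokenized word lists, and stop-word handling is left out for brevity.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Carrot2.
final class PhraseMatchSketch {
    /** Relaxed mode: every label word occurs somewhere in the document, in any order. */
    static boolean containsAllWords(List<String> documentWords, List<String> labelWords) {
        return documentWords.containsAll(labelWords);
    }

    /** Exact mode: the label words occur contiguously and in their original order. */
    static boolean containsExactPhrase(List<String> documentWords, List<String> labelWords) {
        return Collections.indexOfSubList(documentWords, labelWords) >= 0;
    }
}
```

For the label "data mining", the document "tools for mining data" matches only in the relaxed mode, while "data mining tools" matches in both. This is why exact matching tends to leave more documents unassigned.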

minClusterSize

public int minClusterSize
Determines the minimum number of documents in each cluster.

Attribute label:
Minimum cluster size
Attribute level:
Medium
Attribute group:
Preprocessing
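
One plausible way to apply such a threshold is to keep only the labels whose document BitSet contains at least minClusterSize documents. The sketch below assumes that reading; the page itself does not say how the field is applied, and MinClusterSizeSketch and labelsWithEnoughDocuments are hypothetical names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of Carrot2.
final class MinClusterSizeSketch {
    /**
     * Returns the indices of labels whose assigned-document BitSet
     * has at least minClusterSize bits set.
     */
    static List<Integer> labelsWithEnoughDocuments(List<BitSet> documentsPerLabel, int minClusterSize) {
        List<Integer> kept = new ArrayList<>();
        for (int label = 0; label < documentsPerLabel.size(); label++) {
            // cardinality() is the number of documents assigned to this label.
            if (documentsPerLabel.get(label).cardinality() >= minClusterSize) {
                kept.add(label);
            }
        }
        return kept;
    }
}
```
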
Constructor Detail

DocumentAssigner

public DocumentAssigner()
Method Detail

assign

public void assign(PreprocessingContext context)
Assigns documents to label candidates.



Copyright (c) Dawid Weiss, Stanislaw Osinski