org.carrot2.text.preprocessing.filter
Class CompleteLabelFilter

java.lang.Object
  extended by org.carrot2.text.preprocessing.filter.CompleteLabelFilter
All Implemented Interfaces:
ILabelFilter

public class CompleteLabelFilter
extends Object
implements ILabelFilter

A filter that removes "incomplete" labels.

See this document, page 31, for a definition of a complete phrase.


Field Summary
 boolean enabled
          Remove truncated phrases.
 double labelOverrideThreshold
          Truncated label threshold.
 
Constructor Summary
CompleteLabelFilter()
           
 
Method Summary
 void filter(PreprocessingContext context, boolean[] acceptedStems, boolean[] acceptedPhrases)
          Marks incomplete labels.
 boolean isEnabled()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

enabled

public boolean enabled
Remove truncated phrases. Tries to remove "incomplete" cluster labels. For example, in a collection of documents related to Data Mining, the phrase Conference on Data is incomplete in the sense that it should most likely be Conference on Data Mining or even Conference on Data Mining in Large Databases. When truncated phrase removal is enabled, the algorithm tries to remove incomplete phrases such as the first one and keep only the more informative variants.

Attribute label:
Remove truncated phrases
Attribute level:
Basic
Attribute group:
Label filtering
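
The idea of a truncated phrase can be illustrated with a small, self-contained sketch. It assumes, purely for illustration, that a phrase looks truncated when every occurrence of it is followed by the same word. The class TruncatedPhraseSketch and the method looksTruncated are hypothetical and not part of Carrot2; the actual filter may use a different criterion.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the Carrot2 implementation.
final class TruncatedPhraseSketch {
    /**
     * Returns true if the phrase occurs in the documents and every occurrence
     * is followed by the same word, which suggests it is a cut-off prefix of a
     * longer phrase.
     */
    static boolean looksTruncated(String phrase, List<String> documents) {
        String[] p = phrase.toLowerCase().split("\\s+");
        Set<String> followers = new HashSet<>();
        boolean found = false;
        for (String doc : documents) {
            String[] w = doc.toLowerCase().split("\\s+");
            for (int i = 0; i + p.length <= w.length; i++) {
                if (Arrays.equals(Arrays.copyOfRange(w, i, i + p.length), p)) {
                    found = true;
                    // The empty string marks an occurrence at the end of a document.
                    followers.add(i + p.length < w.length ? w[i + p.length] : "");
                }
            }
        }
        return found && followers.size() == 1 && !followers.contains("");
    }
}
```

With the documents "International Conference on Data Mining", "Conference on Data Mining in Large Databases" and "Workshop on Data Mining", the phrase Conference on Data is always followed by Mining and is flagged, while Data Mining is followed by different continuations and is kept.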

labelOverrideThreshold

public double labelOverrideThreshold
Truncated label threshold. Determines the strength of the truncated label filter. The lowest value gives the strongest elimination of truncated labels, which may lead to overlong cluster labels and many unclustered documents. The highest value effectively disables the filter, which may result in short or truncated labels.

Attribute label:
Truncated label threshold
Attribute level:
Advanced
Attribute group:
Phrase extraction
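
One way to picture how such a threshold could behave is sketched below. It assumes, as an illustration only, that a shorter phrase is dropped when some longer phrase starting with it occurs at least threshold times as often. The class OverrideThresholdSketch, the method isOverridden and the phrase counts are hypothetical; the comparison Carrot2 actually performs is not specified here.

```java
import java.util.Map;

// Hypothetical illustration only; not the Carrot2 implementation.
final class OverrideThresholdSketch {
    /**
     * Returns true if some longer phrase that starts with `shorter` occurs at
     * least `threshold` times as often as `shorter` itself (assumed rule).
     */
    static boolean isOverridden(String shorter, Map<String, Integer> phraseCounts, double threshold) {
        int shortCount = phraseCounts.getOrDefault(shorter, 0);
        if (shortCount == 0) {
            return false; // unknown phrase: nothing to override
        }
        for (Map.Entry<String, Integer> e : phraseCounts.entrySet()) {
            String longer = e.getKey();
            if (longer.startsWith(shorter + " ")) {
                double ratio = e.getValue() / (double) shortCount;
                if (ratio >= threshold) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Under this assumption a low threshold removes the shorter phrase almost whenever a longer variant exists, which matches the documented tendency toward overlong labels, while a high threshold keeps the shorter phrase unless the longer variant is nearly as frequent.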
Constructor Detail

CompleteLabelFilter

public CompleteLabelFilter()
Method Detail

filter

public void filter(PreprocessingContext context,
                   boolean[] acceptedStems,
                   boolean[] acceptedPhrases)
Marks incomplete labels.

Specified by:
filter in interface ILabelFilter
Parameters:
context - contains words and phrases to be filtered
acceptedStems - the filter should set to false those elements that correspond to the stems to be filtered out
acceptedPhrases - the filter should set to false those elements that correspond to the phrases to be filtered out

isEnabled

public boolean isEnabled()
Specified by:
isEnabled in interface ILabelFilter
Returns:
true if the filter is to be applied, false if the filter should be omitted by the LabelFilterProcessor.
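
The contract described by filter and isEnabled can be sketched with simplified stand-ins. Everything below is hypothetical: SimpleLabelFilter replaces ILabelFilter and takes a plain array of phrases instead of a PreprocessingContext, TrailingStopWordFilter is an invented example filter, and applyFilters only approximates what a processor such as LabelFilterProcessor might do.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Carrot2 classes.
final class LabelFilterChainSketch {
    /** Simplified stand-in for ILabelFilter (assumption: phrases only, no stems). */
    interface SimpleLabelFilter {
        void filter(String[] phrases, boolean[] acceptedPhrases);
        boolean isEnabled();
    }

    /** Example filter: rejects phrases ending in "on" or "of". */
    static final class TrailingStopWordFilter implements SimpleLabelFilter {
        private final boolean enabled;

        TrailingStopWordFilter(boolean enabled) {
            this.enabled = enabled;
        }

        @Override
        public void filter(String[] phrases, boolean[] acceptedPhrases) {
            for (int i = 0; i < phrases.length; i++) {
                if (phrases[i].endsWith(" on") || phrases[i].endsWith(" of")) {
                    acceptedPhrases[i] = false; // mark as filtered out
                }
            }
        }

        @Override
        public boolean isEnabled() {
            return enabled;
        }
    }

    /** Starts with every phrase accepted and applies only the enabled filters. */
    static boolean[] applyFilters(String[] phrases, List<? extends SimpleLabelFilter> filters) {
        boolean[] accepted = new boolean[phrases.length];
        Arrays.fill(accepted, true);
        for (SimpleLabelFilter f : filters) {
            if (f.isEnabled()) {
                f.filter(phrases, accepted);
            }
        }
        return accepted;
    }
}
```

In this sketch a filter communicates only by setting entries of the shared boolean array to false, so several filters can be applied in sequence to the same array, and a filter whose isEnabled() returns false leaves the array untouched.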


Copyright (c) Dawid Weiss, Stanislaw Osinski