org.carrot2.clustering.lingo
Class ClusterBuilder

java.lang.Object
  extended by org.carrot2.clustering.lingo.ClusterBuilder

public class ClusterBuilder
extends Object

Builds cluster labels based on the reduced term-document matrix and assigns documents to the labels.


Field Summary
 double clusterMergingThreshold
          Cluster merging threshold.
 IFeatureScorer featureScorer
          Optional feature scorer.
 ILabelAssigner labelAssigner
          Cluster label assignment method.
 double phraseLabelBoost
          Phrase label boost.
 int phraseLengthPenaltyStart
          Phrase length penalty start.
 int phraseLengthPenaltyStop
          Phrase length penalty stop.
 
Constructor Summary
ClusterBuilder()
           
 
Method Summary
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

phraseLabelBoost

public double phraseLabelBoost
Phrase label boost. The weight of multi-word labels relative to one-word labels. Low values will result in more one-word labels being produced; higher values will favor multi-word labels.

Attribute label:
Phrase label boost
Attribute level:
Medium
Attribute group:
Labels
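
The effect of the boost on label selection can be illustrated with a minimal sketch. It is not the Lingo implementation: the class PhraseLabelBoostSketch, its methods, and the sample scores are hypothetical, and the sketch assumes the boost is applied as a plain multiplier to the raw score of any label with more than one word.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of Carrot2. */
final class PhraseLabelBoostSketch {

    /** Counts whitespace-separated words in a label. */
    static int wordCount(String label) {
        return label.trim().split("\\s+").length;
    }

    /**
     * Assumption: the boost multiplies the raw score of every label
     * that has more than one word; one-word labels keep their raw score.
     */
    static double boostedScore(String label, double rawScore, double phraseLabelBoost) {
        return wordCount(label) > 1 ? rawScore * phraseLabelBoost : rawScore;
    }

    /** Returns the candidate label with the highest boosted score. */
    static String bestLabel(Map<String, Double> rawScores, double phraseLabelBoost) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : rawScores.entrySet()) {
            double score = boostedScore(e.getKey(), e.getValue(), phraseLabelBoost);
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up raw scores for two competing label candidates.
        Map<String, Double> raw = new LinkedHashMap<>();
        raw.put("mining", 0.60);
        raw.put("data mining", 0.50);

        System.out.println(bestLabel(raw, 1.0)); // boost of 1.0 leaves scores unchanged
        System.out.println(bestLabel(raw, 1.5)); // a higher boost favors the phrase
    }
}
```

With these sample scores, a boost of 1.0 selects the one-word label, while a boost of 1.5 lifts the two-word label above it.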

phraseLengthPenaltyStart

public int phraseLengthPenaltyStart
Phrase length penalty start. The phrase length at which overlong multi-word labels start to be penalized. Phrases shorter than phraseLengthPenaltyStart will not be penalized.

Attribute label:
Phrase length penalty start
Attribute level:
Advanced
Attribute group:
Labels

phraseLengthPenaltyStop

public int phraseLengthPenaltyStop
Phrase length penalty stop. The phrase length beyond which overlong multi-word labels are removed completely. Phrases longer than phraseLengthPenaltyStop will be removed.

Attribute label:
Phrase length penalty stop
Attribute level:
Advanced
Attribute group:
Labels
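
How phraseLengthPenaltyStart and phraseLengthPenaltyStop interact can be sketched as a score multiplier. This page does not specify the shape of the penalty between the two lengths, so the linear ramp below is an assumption, and the class PhraseLengthPenaltySketch and its factor method are hypothetical names, not Carrot2 API.

```java
/** Hypothetical illustration only; not part of Carrot2. */
final class PhraseLengthPenaltySketch {

    /**
     * Returns a multiplier in [0, 1] for a label of the given length (in words).
     * Documented behavior: lengths below start are not penalized,
     * lengths above stop are removed (multiplier 0).
     * Assumption: between start and stop the multiplier falls linearly.
     */
    static double factor(int length, int start, int stop) {
        if (stop < start) {
            throw new IllegalArgumentException("stop must not be smaller than start");
        }
        if (length < start) {
            return 1.0; // not penalized
        }
        if (length > stop) {
            return 0.0; // removed completely
        }
        // Linear ramp: already below 1.0 at start, still above 0.0 at stop.
        return 1.0 - (double) (length - start + 1) / (stop - start + 2);
    }

    public static void main(String[] args) {
        int start = 4;
        int stop = 6;
        for (int length = 3; length <= 7; length++) {
            System.out.println(length + " words -> " + factor(length, start, stop));
        }
    }
}
```

With start = 4 and stop = 6, a three-word phrase keeps its full score, phrases of four to six words are scaled down progressively, and a seven-word phrase is dropped.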

clusterMergingThreshold

public double clusterMergingThreshold
Cluster merging threshold. The percentage overlap between two clusters' documents required for the clusters to be merged into one cluster. Low values will result in more aggressive merging, which may lead to irrelevant documents in clusters. High values will result in fewer clusters being merged, which may lead to very similar or duplicated clusters.

Attribute label:
Cluster merging threshold
Attribute level:
Medium
Attribute group:
Labels
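
The merging decision can be sketched as a set-overlap test. This is an illustration under stated assumptions rather than the Lingo implementation: the class ClusterMergingSketch and its methods are hypothetical, the threshold is treated as a fraction between 0 and 1, and the overlap is measured relative to the smaller of the two clusters, a choice this page does not confirm.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; not part of Carrot2. */
final class ClusterMergingSketch {

    /**
     * Fraction of shared documents between two clusters.
     * Assumption: the overlap is normalized by the size of the smaller cluster.
     */
    static double overlap(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        Set<Integer> common = new HashSet<>(a);
        common.retainAll(b); // keep only documents present in both clusters
        return (double) common.size() / Math.min(a.size(), b.size());
    }

    /** Two clusters are merged when their overlap reaches the threshold. */
    static boolean shouldMerge(Set<Integer> a, Set<Integer> b, double clusterMergingThreshold) {
        return overlap(a, b) >= clusterMergingThreshold;
    }

    public static void main(String[] args) {
        // Clusters represented as sets of made-up document identifiers.
        Set<Integer> first = new HashSet<>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> second = new HashSet<>(Arrays.asList(3, 4, 5));

        System.out.println(overlap(first, second));          // 2 shared of min(4, 3) documents
        System.out.println(shouldMerge(first, second, 0.5)); // low threshold: merged
        System.out.println(shouldMerge(first, second, 0.9)); // high threshold: kept apart
    }
}
```

The two sample clusters share two of three documents, so a threshold of 0.5 merges them while a threshold of 0.9 keeps them separate, matching the trade-off described above.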

featureScorer

public IFeatureScorer featureScorer
Optional feature scorer. It is not exposed as an attribute for now because the core Lingo does not provide any implementations of this interface.


labelAssigner

public ILabelAssigner labelAssigner
Cluster label assignment method.

Attribute label:
Cluster label assignment method
Attribute level:
Advanced
Attribute group:
Labels

Constructor Detail

ClusterBuilder

public ClusterBuilder()


Copyright (c) Dawid Weiss, Stanislaw Osinski