org.carrot2.clustering.stc
Class STCClusteringAlgorithmDescriptor.AttributeBuilder

java.lang.Object
  extended by org.carrot2.clustering.stc.STCClusteringAlgorithmDescriptor.AttributeBuilder
Enclosing class:
STCClusteringAlgorithmDescriptor

public static class STCClusteringAlgorithmDescriptor.AttributeBuilder
extends Object

Attribute map builder for the STCClusteringAlgorithm component. You can use this builder as a type-safe alternative to populating the attribute map using attribute keys.
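
As a rough illustration of what "type-safe alternative" means here, the sketch below contrasts populating a map by string key with a fluent builder that writes to the same kind of map. SketchBuilder and its key strings are hypothetical stand-ins, not the Carrot2 classes or their real attribute keys; only the setter names maxClusters and mergeThreshold are taken from this page.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an attribute builder; not the Carrot2 class. */
final class SketchBuilder {
    // Assumed key names, for illustration only.
    static final String MAX_CLUSTERS = "sketch.maxClusters";
    static final String MERGE_THRESHOLD = "sketch.mergeThreshold";

    /** The attribute map populated by this builder. */
    public final Map<String, Object> map;

    SketchBuilder(Map<String, Object> map) {
        this.map = map;
    }

    /** Typed setter: the compiler rejects any argument that is not an int. */
    SketchBuilder maxClusters(int value) {
        map.put(MAX_CLUSTERS, value);
        return this; // returning this allows call chaining
    }

    /** Typed setter for a double-valued attribute. */
    SketchBuilder mergeThreshold(double value) {
        map.put(MERGE_THRESHOLD, value);
        return this;
    }

    /** Key-based population: would also compile with a wrongly typed value. */
    static Map<String, Object> byKeys() {
        Map<String, Object> attributes = new HashMap<>();
        attributes.put(MAX_CLUSTERS, 15);
        attributes.put(MERGE_THRESHOLD, 0.6);
        return attributes;
    }

    /** Builder-based population of an equivalent map. */
    static Map<String, Object> byBuilder() {
        Map<String, Object> attributes = new HashMap<>();
        new SketchBuilder(attributes).maxClusters(15).mergeThreshold(0.6);
        return attributes;
    }
}
```

Both helper methods produce equal maps; the difference is that a misspelled key or a value of the wrong type passes the compiler in the key-based variant, whereas the builder variant fails at compile time.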


Field Summary
 Map<String,Object> map
          The attribute map populated by this builder.
 
Constructor Summary
protected STCClusteringAlgorithmDescriptor.AttributeBuilder(Map<String,Object> map)
          Creates a builder backed by the provided map.
 
Method Summary
 List<Cluster> clusters()
          Clusters created by the algorithm.
 STCClusteringAlgorithmDescriptor.AttributeBuilder documentCountBoost(double value)
          Document count boost.
 STCClusteringAlgorithmDescriptor.AttributeBuilder documents(List<Document> value)
          Documents to cluster.
 STCClusteringAlgorithmDescriptor.AttributeBuilder ignoreWordIfInFewerDocs(int value)
          Minimum word-document recurrences.
 STCClusteringAlgorithmDescriptor.AttributeBuilder ignoreWordIfInHigherDocsPercent(double value)
          Maximum word-document ratio.
 STCClusteringAlgorithmDescriptor.AttributeBuilder maxBaseClusters(int value)
          Maximum base clusters count.
 STCClusteringAlgorithmDescriptor.AttributeBuilder maxClusters(int value)
          Maximum final clusters.
 STCClusteringAlgorithmDescriptor.AttributeBuilder maxDescPhraseLength(int value)
          Maximum words per label.
 STCClusteringAlgorithmDescriptor.AttributeBuilder maxPhraseOverlap(double value)
          Maximum cluster phrase overlap.
 STCClusteringAlgorithmDescriptor.AttributeBuilder maxPhrases(int value)
          Maximum phrases per label.
 STCClusteringAlgorithmDescriptor.AttributeBuilder mergeThreshold(double value)
          Base cluster merge threshold.
 STCClusteringAlgorithmDescriptor.AttributeBuilder minBaseClusterScore(double value)
          Minimum base cluster score.
 STCClusteringAlgorithmDescriptor.AttributeBuilder minBaseClusterSize(int value)
          Minimum documents per base cluster.
 STCClusteringAlgorithmDescriptor.AttributeBuilder mostGeneralPhraseCoverage(double value)
          Minimum general phrase coverage.
 MultilingualClusteringDescriptor.AttributeBuilder multilingualClustering()
          Returns an attribute builder for the nested MultilingualClustering component, backed by the same attribute map as the current builder.
 STCClusteringAlgorithmDescriptor.AttributeBuilder optimalPhraseLength(int value)
          Optimal label length.
 STCClusteringAlgorithmDescriptor.AttributeBuilder optimalPhraseLengthDev(double value)
          Phrase length tolerance.
 BasicPreprocessingPipelineDescriptor.AttributeBuilder preprocessingPipeline()
          Returns an attribute builder for the nested BasicPreprocessingPipeline component, backed by the same attribute map as the current builder.
 STCClusteringAlgorithmDescriptor.AttributeBuilder query(String value)
          Query that produced the documents.
 STCClusteringAlgorithmDescriptor.AttributeBuilder singleTermBoost(double value)
          Single term boost.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

map

public final Map<String,Object> map
The attribute map populated by this builder.

Constructor Detail

STCClusteringAlgorithmDescriptor.AttributeBuilder

protected STCClusteringAlgorithmDescriptor.AttributeBuilder(Map<String,Object> map)
Creates a builder backed by the provided map.

Method Detail

query

public STCClusteringAlgorithmDescriptor.AttributeBuilder query(String value)
Query that produced the documents. Providing the query is optional but recommended, because it helps the algorithm create better clusters.

See Also:
STCClusteringAlgorithm.query

documents

public STCClusteringAlgorithmDescriptor.AttributeBuilder documents(List<Document> value)
Documents to cluster.

See Also:
STCClusteringAlgorithm.documents

clusters

public List<Cluster> clusters()
Clusters created by the algorithm.

See Also:
STCClusteringAlgorithm.clusters

ignoreWordIfInFewerDocs

public STCClusteringAlgorithmDescriptor.AttributeBuilder ignoreWordIfInFewerDocs(int value)
Minimum word-document recurrences.

See Also:
STCClusteringAlgorithm.ignoreWordIfInFewerDocs

ignoreWordIfInHigherDocsPercent

public STCClusteringAlgorithmDescriptor.AttributeBuilder ignoreWordIfInHigherDocsPercent(double value)
Maximum word-document ratio. A number between 0 and 1. If a word occurs in a larger fraction of snippets than this ratio, it is ignored.

See Also:
STCClusteringAlgorithm.ignoreWordIfInHigherDocsPercent
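
The two word filters (ignoreWordIfInFewerDocs and ignoreWordIfInHigherDocsPercent) can be read as a lower bound on a word's document count and an upper bound on its document ratio. A minimal sketch of that reading follows; WordFilterSketch and isIgnored are hypothetical names, and whether the library treats the boundary values as strict or inclusive is an assumption here.

```java
/** Hypothetical helper illustrating the two word filters; not library code. */
final class WordFilterSketch {
    /**
     * Returns true if a word should be ignored.
     *
     * @param docsWithWord number of documents (snippets) containing the word
     * @param totalDocs    total number of documents being clustered
     * @param minDocs      assumed meaning of ignoreWordIfInFewerDocs
     * @param maxDocsRatio assumed meaning of ignoreWordIfInHigherDocsPercent, in [0, 1]
     */
    static boolean isIgnored(int docsWithWord, int totalDocs, int minDocs, double maxDocsRatio) {
        if (totalDocs <= 0) {
            throw new IllegalArgumentException("totalDocs must be positive");
        }
        // Too rare: the word occurs in fewer documents than the minimum.
        if (docsWithWord < minDocs) {
            return true;
        }
        // Too common: the word occurs in a larger share of documents than allowed.
        double ratio = (double) docsWithWord / totalDocs;
        return ratio > maxDocsRatio;
    }
}
```

With 100 documents, a minimum of 2 and a ratio of 0.9, a word found in a single document is dropped as too rare and a word found in 95 documents is dropped as too common.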

minBaseClusterScore

public STCClusteringAlgorithmDescriptor.AttributeBuilder minBaseClusterScore(double value)
Minimum base cluster score.

See Also:
STCClusteringAlgorithm.minBaseClusterScore

maxBaseClusters

public STCClusteringAlgorithmDescriptor.AttributeBuilder maxBaseClusters(int value)
Maximum base clusters count. Trims the base cluster array after the N-th position for the merging phase.

See Also:
STCClusteringAlgorithm.maxBaseClusters
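
Trimming after the N-th position presumably keeps only the first N base clusters of an already ordered collection. A small sketch, assuming the candidates are ordered by descending score before trimming; that ordering and the BaseClusterTrimSketch helper are assumptions, not taken from the library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper; base clusters are represented by their scores only. */
final class BaseClusterTrimSketch {
    /** Keeps at most maxBaseClusters of the highest scores, in descending order. */
    static List<Double> trim(List<Double> scores, int maxBaseClusters) {
        List<Double> sorted = new ArrayList<>(scores);
        sorted.sort(Comparator.reverseOrder());
        // Clamp the cut-off so that it never exceeds the list size or goes negative.
        int keep = Math.min(Math.max(maxBaseClusters, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, keep));
    }
}
```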

minBaseClusterSize

public STCClusteringAlgorithmDescriptor.AttributeBuilder minBaseClusterSize(int value)
Minimum documents per base cluster.

See Also:
STCClusteringAlgorithm.minBaseClusterSize

maxClusters

public STCClusteringAlgorithmDescriptor.AttributeBuilder maxClusters(int value)
Maximum final clusters.

See Also:
STCClusteringAlgorithm.maxClusters

mergeThreshold

public STCClusteringAlgorithmDescriptor.AttributeBuilder mergeThreshold(double value)
Base cluster merge threshold.

See Also:
STCClusteringAlgorithm.mergeThreshold

maxPhraseOverlap

public STCClusteringAlgorithmDescriptor.AttributeBuilder maxPhraseOverlap(double value)
Maximum cluster phrase overlap.

See Also:
STCClusteringAlgorithm.maxPhraseOverlap

mostGeneralPhraseCoverage

public STCClusteringAlgorithmDescriptor.AttributeBuilder mostGeneralPhraseCoverage(double value)
Minimum general phrase coverage. The minimum coverage a phrase must have to appear in the cluster description.

See Also:
STCClusteringAlgorithm.mostGeneralPhraseCoverage

maxDescPhraseLength

public STCClusteringAlgorithmDescriptor.AttributeBuilder maxDescPhraseLength(int value)
Maximum words per label. Base clusters formed by phrases with more words than this limit are trimmed.

See Also:
STCClusteringAlgorithm.maxDescPhraseLength

maxPhrases

public STCClusteringAlgorithmDescriptor.AttributeBuilder maxPhrases(int value)
Maximum phrases per label. Maximum number of phrases from base clusters promoted to the cluster's label.

See Also:
STCClusteringAlgorithm.maxPhrases

singleTermBoost

public STCClusteringAlgorithmDescriptor.AttributeBuilder singleTermBoost(double value)
Single term boost. A factor in the calculation of the base cluster score. If greater than zero, single-term base clusters are assigned this value regardless of the penalty function.

See Also:
STCClusteringAlgorithm.singleTermBoost

optimalPhraseLength

public STCClusteringAlgorithmDescriptor.AttributeBuilder optimalPhraseLength(int value)
Optimal label length. A factor in the calculation of the base cluster score.

See Also:
STCClusteringAlgorithm.optimalPhraseLength

optimalPhraseLengthDev

public STCClusteringAlgorithmDescriptor.AttributeBuilder optimalPhraseLengthDev(double value)
Phrase length tolerance. A factor in the calculation of the base cluster score.

See Also:
STCClusteringAlgorithm.optimalPhraseLengthDev

documentCountBoost

public STCClusteringAlgorithmDescriptor.AttributeBuilder documentCountBoost(double value)
Document count boost. A factor in the calculation of the base cluster score, boosting the score depending on the number of documents found in the base cluster.

See Also:
STCClusteringAlgorithm.documentCountBoost
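
The four scoring attributes (singleTermBoost, optimalPhraseLength, optimalPhraseLengthDev, documentCountBoost) are described only as factors of the base cluster score. One way they could combine is sketched below, assuming a Gaussian-shaped penalty around the optimal phrase length and a linear document-count factor. The formula and the BaseClusterScoreSketch class are illustrative assumptions, not the library's documented scoring.

```java
/** Hypothetical scoring sketch; the formula is an assumption, not library code. */
final class BaseClusterScoreSketch {
    /**
     * @param phraseLength           number of words in the base cluster's phrase
     * @param documentCount          number of documents in the base cluster
     * @param optimalPhraseLengthDev tolerance around the optimal length, assumed > 0
     */
    static double score(int phraseLength, int documentCount,
                        double singleTermBoost, int optimalPhraseLength,
                        double optimalPhraseLengthDev, double documentCountBoost) {
        final double lengthFactor;
        if (phraseLength == 1 && singleTermBoost > 0) {
            // Single-term base clusters bypass the penalty function.
            lengthFactor = singleTermBoost;
        } else {
            // Assumed penalty: 1.0 at the optimal length, decaying with distance.
            double d = phraseLength - optimalPhraseLength;
            lengthFactor = Math.exp(-(d * d)
                    / (2 * optimalPhraseLengthDev * optimalPhraseLengthDev));
        }
        // Assumed document factor: grows linearly with the document count.
        return lengthFactor * documentCount * documentCountBoost;
    }
}
```

Under these assumptions a phrase of exactly the optimal length receives no penalty, and a larger optimalPhraseLengthDev flattens the penalty so that longer or shorter phrases lose less score.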

preprocessingPipeline

public BasicPreprocessingPipelineDescriptor.AttributeBuilder preprocessingPipeline()
Returns an attribute builder for the nested BasicPreprocessingPipeline component, backed by the same attribute map as the current builder.


multilingualClustering

public MultilingualClusteringDescriptor.AttributeBuilder multilingualClustering()
Returns an attribute builder for the nested MultilingualClustering component, backed by the same attribute map as the current builder.
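
Because the nested builders are backed by the same attribute map, values set through them end up alongside the parent's values. The sketch below shows that sharing pattern with hypothetical stand-in classes (ParentSketch, NestedSketch) and made-up keys; the caseSensitive setter is invented for illustration and is not claimed to exist on the real nested builders.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical nested builder; writes into the map it is given. */
final class NestedSketch {
    final Map<String, Object> map;

    NestedSketch(Map<String, Object> map) {
        this.map = map;
    }

    // Invented setter and key, for illustration only.
    NestedSketch caseSensitive(boolean value) {
        map.put("sketch.nested.caseSensitive", value);
        return this;
    }
}

/** Hypothetical parent builder that hands its own map to the nested builder. */
final class ParentSketch {
    final Map<String, Object> map;

    ParentSketch(Map<String, Object> map) {
        this.map = map;
    }

    ParentSketch maxClusters(int value) {
        map.put("sketch.maxClusters", value);
        return this;
    }

    /** Returns a nested builder backed by the same map instance. */
    NestedSketch nested() {
        return new NestedSketch(map);
    }

    /** Sets one parent attribute and one nested attribute on a single map. */
    static Map<String, Object> demo() {
        Map<String, Object> attributes = new HashMap<>();
        ParentSketch parent = new ParentSketch(attributes);
        parent.maxClusters(10);
        parent.nested().caseSensitive(true);
        return attributes;
    }
}
```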



Copyright (c) Dawid Weiss, Stanislaw Osinski