org.carrot2.clustering.lingo
Class LingoClusteringAlgorithm

java.lang.Object
  extended by org.carrot2.core.ProcessingComponentBase
      extended by org.carrot2.clustering.lingo.LingoClusteringAlgorithm
All Implemented Interfaces:
IClusteringAlgorithm, IProcessingComponent

public class LingoClusteringAlgorithm
extends ProcessingComponentBase
implements IClusteringAlgorithm

Lingo clustering algorithm, implemented as described in: Stanisław Osiński, Dawid Weiss, "A Concept-Driven Algorithm for Clustering Search Results", IEEE Intelligent Systems, vol. 20, no. 3, May/June 2005, pp. 48-54.


Field Summary
 ClusterBuilder clusterBuilder
          Cluster label builder, contains bindable attributes.
 List<Cluster> clusters
           
 int desiredClusterCountBase
          Desired cluster count base.
 List<Document> documents
          Documents to cluster.
 LabelFormatter labelFormatter
          Cluster label formatter, contains bindable attributes.
 TermDocumentMatrixBuilder matrixBuilder
          Term-document matrix builder for the algorithm, contains bindable attributes.
 TermDocumentMatrixReducer matrixReducer
          Term-document matrix reducer for the algorithm, contains bindable attributes.
 MultilingualClustering multilingualClustering
          A helper for performing multilingual clustering.
 CompletePreprocessingPipeline preprocessingPipeline
          Common preprocessing tasks handler, contains bindable attributes.
 String query
          Query that produced the documents.
 double scoreWeight
          Balance between cluster score and size during cluster sorting.
 
Constructor Summary
LingoClusteringAlgorithm()
           
 
Method Summary
 void process()
          Performs Lingo clustering of documents.
 
Methods inherited from class org.carrot2.core.ProcessingComponentBase
afterProcessing, beforeProcessing, dispose, getContext, getSharedExecutor, init
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.carrot2.core.IProcessingComponent
afterProcessing, beforeProcessing, dispose, init
 

Field Detail

query

public String query
Query that produced the documents. Providing the query is optional but recommended, because it helps the algorithm create better clusters.


documents

public List<Document> documents
Documents to cluster.


clusters

public List<Cluster> clusters

scoreWeight

public double scoreWeight
Balance between cluster score and size during cluster sorting. A value of 0.0 makes Lingo sort clusters by size only. A value of 1.0 makes Lingo sort clusters by score only.

Attribute label:
Size-Score sorting ratio
Attribute level:
Medium
Attribute group:
Clusters
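
To illustrate the two documented endpoints of scoreWeight, here is a minimal, self-contained sketch. It does not use the Carrot2 API. The ClusterInfo class, the sortKey and sortedLabels methods, and the blending formula size^(1 - w) * score^w are all assumptions made for this example. The formula is only one blend that reduces to size-only ordering at 0.0 and score-only ordering at 1.0, and it should not be read as Lingo's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ScoreWeightSketch {
    /** Hypothetical stand-in for a cluster: a label, a document count and a score. */
    static final class ClusterInfo {
        final String label;
        final int size;
        final double score;

        ClusterInfo(String label, int size, double score) {
            this.label = label;
            this.size = size;
            this.score = score;
        }
    }

    /** Assumed blend (not taken from the Lingo sources): size^(1 - w) * score^w. */
    static double sortKey(ClusterInfo c, double scoreWeight) {
        return Math.pow(c.size, 1.0 - scoreWeight) * Math.pow(c.score, scoreWeight);
    }

    /** Returns cluster labels ordered from the highest to the lowest sort key. */
    static List<String> sortedLabels(List<ClusterInfo> clusters, double scoreWeight) {
        List<ClusterInfo> copy = new ArrayList<>(clusters);
        copy.sort(Comparator
            .comparingDouble((ClusterInfo c) -> sortKey(c, scoreWeight))
            .reversed());
        List<String> labels = new ArrayList<>();
        for (ClusterInfo c : copy) {
            labels.add(c.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<ClusterInfo> clusters = Arrays.asList(
            new ClusterInfo("large, low score", 20, 0.5),
            new ClusterInfo("small, high score", 5, 4.0));

        // scoreWeight = 0.0: only size matters, so the larger cluster comes first.
        System.out.println(sortedLabels(clusters, 0.0));
        // scoreWeight = 1.0: only score matters, so the higher-scoring cluster comes first.
        System.out.println(sortedLabels(clusters, 1.0));
    }
}
```

Intermediate values trade the two criteria off against each other. How the real algorithm interpolates between the endpoints is not specified on this page.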

desiredClusterCountBase

public int desiredClusterCountBase
Desired cluster count base. Base factor used to calculate the number of clusters from the number of input documents. The larger the value, the more clusters will be created. The number of clusters grows with the cluster count base, but the relationship is not linear.

Attribute label:
Cluster count base
Attribute level:
Basic
Attribute group:
Clusters

preprocessingPipeline

public final CompletePreprocessingPipeline preprocessingPipeline
Common preprocessing tasks handler, contains bindable attributes.


matrixBuilder

public final TermDocumentMatrixBuilder matrixBuilder
Term-document matrix builder for the algorithm, contains bindable attributes.


matrixReducer

public final TermDocumentMatrixReducer matrixReducer
Term-document matrix reducer for the algorithm, contains bindable attributes.


clusterBuilder

public final ClusterBuilder clusterBuilder
Cluster label builder, contains bindable attributes.


labelFormatter

public final LabelFormatter labelFormatter
Cluster label formatter, contains bindable attributes.


multilingualClustering

public final MultilingualClustering multilingualClustering
A helper for performing multilingual clustering.

Constructor Detail

LingoClusteringAlgorithm

public LingoClusteringAlgorithm()

Method Detail

process

public void process()
             throws ProcessingException
Performs Lingo clustering of documents.

Specified by:
process in interface IProcessingComponent
Overrides:
process in class ProcessingComponentBase
Throws:
ProcessingException - if processing fails. When this exception is thrown, IProcessingComponent.afterProcessing() is still called, leaving the component ready to accept further requests or to be disposed of. The exception is then rethrown from the controller method that triggered processing.
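
The failure contract described above can be sketched without the Carrot2 library. In the example below, SketchComponent, SketchProcessingException and runOnce are hypothetical stand-ins for the component interface, ProcessingException and a controller method. The sketch only shows the ordering implied by the text: process() fails, afterProcessing() still runs, and the exception then reaches the caller.

```java
import java.util.ArrayList;
import java.util.List;

class LifecycleSketch {
    /** Hypothetical stand-in for ProcessingException. */
    static class SketchProcessingException extends Exception {
        SketchProcessingException(String message) {
            super(message);
        }
    }

    /** Hypothetical minimal component exposing only the two calls relevant here. */
    interface SketchComponent {
        void process() throws SketchProcessingException;

        void afterProcessing();
    }

    /** Hypothetical controller method that runs a single processing request. */
    static void runOnce(SketchComponent component) throws SketchProcessingException {
        try {
            component.process();
        } finally {
            // Runs whether or not process() threw, so the component stays reusable.
            component.afterProcessing();
        }
        // An exception thrown by process() propagates to the caller after the finally block.
    }

    /** Runs a component that always fails and returns the observed call order. */
    static List<String> demo() {
        final List<String> log = new ArrayList<>();
        SketchComponent failing = new SketchComponent() {
            @Override
            public void process() throws SketchProcessingException {
                log.add("process");
                throw new SketchProcessingException("clustering failed");
            }

            @Override
            public void afterProcessing() {
                log.add("afterProcessing");
            }
        };
        try {
            runOnce(failing);
        } catch (SketchProcessingException e) {
            log.add("rethrown: " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        // Prints the call order recorded by demo().
        System.out.println(demo());
    }
}
```

Callers of a real controller would typically catch ProcessingException at the point where they request processing, as demo() does here for the stand-in type.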


Copyright (c) Dawid Weiss, Stanislaw Osinski