org.carrot2.clustering.kmeans
Class BisectingKMeansClusteringAlgorithm

java.lang.Object
  extended by org.carrot2.core.ProcessingComponentBase
      extended by org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm
All Implemented Interfaces:
IClusteringAlgorithm, IProcessingComponent

public class BisectingKMeansClusteringAlgorithm
extends ProcessingComponentBase
implements IClusteringAlgorithm

A very simple implementation of bisecting k-means clustering. Unlike other algorithms in Carrot2, this one creates hard clusterings: each document belongs to exactly one cluster. On the other hand, the clusters are labeled only with individual words, which may not fully describe every document in the cluster.
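
For readers unfamiliar with the technique, the following is a minimal, self-contained sketch of bisecting k-means on plain numeric vectors. It is not the code of this class: the class name BisectingKMeansSketch, its methods, the deterministic seeding, the "split the largest cluster" rule and the fixed two-way split are all assumptions made for illustration. It does show the two properties described above: every input ends up in exactly one cluster, and at most the requested number of clusters is produced.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; hypothetical names, not part of Carrot2.
final class BisectingKMeansSketch {
    /** Squared Euclidean distance between two vectors of equal length. */
    static double dist2(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return sum;
    }

    /** Mean of the points on the given side; keeps the old centroid if the side is empty. */
    static double[] mean(List<double[]> points, int[] assign, int side, double[] fallback) {
        double[] sum = new double[fallback.length];
        int n = 0;
        for (int i = 0; i < points.size(); i++) {
            if (assign[i] != side) continue;
            for (int d = 0; d < sum.length; d++) sum[d] += points.get(i)[d];
            n++;
        }
        if (n == 0) return fallback;
        for (int d = 0; d < sum.length; d++) sum[d] /= n;
        return sum;
    }

    /** Plain 2-means; returns 0 or 1 per point. Seeds: the first point and the point farthest from it. */
    static int[] twoMeans(List<double[]> points, int maxIterations) {
        double[] c0 = points.get(0).clone();
        double[] c1 = c0;
        double farthest = -1;
        for (double[] p : points) {
            double d = dist2(p, c0);
            if (d > farthest) { farthest = d; c1 = p.clone(); }
        }
        int[] assign = new int[points.size()];
        for (int it = 0; it < maxIterations; it++) {
            boolean changed = false;
            for (int i = 0; i < points.size(); i++) {
                int side = dist2(points.get(i), c0) <= dist2(points.get(i), c1) ? 0 : 1;
                if (side != assign[i]) { assign[i] = side; changed = true; }
            }
            if (!changed && it > 0) break; // assignments are stable
            c0 = mean(points, assign, 0, c0);
            c1 = mean(points, assign, 1, c1);
        }
        return assign;
    }

    /** Returns at most clusterCount clusters of row indices; every index appears exactly once. */
    static List<List<Integer>> cluster(double[][] data, int clusterCount, int maxIterations) {
        List<List<Integer>> clusters = new ArrayList<>();
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < data.length; i++) all.add(i);
        clusters.add(all);
        while (clusters.size() < clusterCount) {
            // Assumption: the largest cluster is the one chosen for bisection.
            List<Integer> largest = clusters.get(0);
            for (List<Integer> c : clusters) if (c.size() > largest.size()) largest = c;
            if (largest.size() < 2) break;
            List<double[]> points = new ArrayList<>();
            for (int idx : largest) points.add(data[idx]);
            int[] assign = twoMeans(points, maxIterations);
            List<Integer> left = new ArrayList<>();
            List<Integer> right = new ArrayList<>();
            for (int i = 0; i < assign.length; i++) (assign[i] == 0 ? left : right).add(largest.get(i));
            if (left.isEmpty() || right.isEmpty()) break; // cannot be split further
            clusters.remove(largest);
            clusters.add(left);
            clusters.add(right);
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 1}, {10, 10}, {10, 11}, {50, 50}, {50, 51}};
        System.out.println(cluster(data, 3, 10));
    }
}
```

In this sketch the loop stops early when no cluster can be split any further, which is one way a request for N clusters can yield fewer than N.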


Field Summary
 int clusterCount
          The number of clusters to create.
 List<Cluster> clusters
           
 List<Document> documents
           
 int labelCount
          Label count.
 LabelFormatter labelFormatter
          Cluster label formatter, contains bindable attributes.
 TermDocumentMatrixBuilder matrixBuilder
          Term-document matrix builder for the algorithm, contains bindable attributes.
 TermDocumentMatrixReducer matrixReducer
          Term-document matrix reducer for the algorithm, contains bindable attributes.
 int maxIterations
          The maximum number of k-means iterations to perform.
 int partitionCount
          Partition count.
 BasicPreprocessingPipeline preprocessingPipeline
          Common preprocessing tasks handler, contains bindable attributes.
 boolean useDimensionalityReduction
          Use dimensionality reduction.
 
Constructor Summary
BisectingKMeansClusteringAlgorithm()
           
 
Method Summary
 void process()
          Performs the processing required to fulfill the request.
 
Methods inherited from class org.carrot2.core.ProcessingComponentBase
afterProcessing, beforeProcessing, dispose, getContext, getSharedExecutor, init
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.carrot2.core.IProcessingComponent
afterProcessing, beforeProcessing, dispose, init
 

Field Detail

documents

public List<Document> documents

clusters

public List<Cluster> clusters

clusterCount

public int clusterCount
The number of clusters to create. The algorithm will create at most the specified number of clusters.

Attribute label:
Cluster count
Attribute level:
Basic
Attribute group:
Clusters

maxIterations

public int maxIterations
The maximum number of k-means iterations to perform.

Attribute label:
Maximum iterations
Attribute level:
Basic
Attribute group:
K-means

useDimensionalityReduction

public boolean useDimensionalityReduction
Use dimensionality reduction. If true, k-means is applied to the dimensionality-reduced term-document matrix, with the number of dimensions equal to the number of requested clusters. If false, k-means is performed directly on the original term-document matrix.

Attribute label:
Use dimensionality reduction
Attribute level:
Basic
Attribute group:
K-means

partitionCount

public int partitionCount
Partition count. The number of partitions to create at each k-means clustering iteration.

Attribute label:
Partition count
Attribute level:
Basic
Attribute group:
K-means

labelCount

public int labelCount
Label count. The minimum number of labels to return for each cluster.

Attribute label:
Label count
Attribute level:
Basic
Attribute group:
Clusters

preprocessingPipeline

public final BasicPreprocessingPipeline preprocessingPipeline
Common preprocessing tasks handler, contains bindable attributes.


matrixBuilder

public final TermDocumentMatrixBuilder matrixBuilder
Term-document matrix builder for the algorithm, contains bindable attributes.


matrixReducer

public final TermDocumentMatrixReducer matrixReducer
Term-document matrix reducer for the algorithm, contains bindable attributes.


labelFormatter

public final LabelFormatter labelFormatter
Cluster label formatter, contains bindable attributes.

Constructor Detail

BisectingKMeansClusteringAlgorithm

public BisectingKMeansClusteringAlgorithm()
Method Detail

process

public void process()
             throws ProcessingException
Description copied from interface: IProcessingComponent
Performs the processing required to fulfill the request. This method is called once per request.

Specified by:
process in interface IProcessingComponent
Overrides:
process in class ProcessingComponentBase
Throws:
ProcessingException - when processing failed. If thrown, the IProcessingComponent.afterProcessing() method will be called and the component will be ready to accept further requests or to be disposed of. Finally, the exception will be rethrown from the controller method that caused the component to perform processing.
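
The failure contract described above can be pictured with a small, self-contained sketch. The Component interface and the runRequest method below are hypothetical stand-ins written for this illustration; they are not the Carrot2 controller, and a plain Exception is used in place of ProcessingException. The sketch only shows the order of events stated in the text: processing fails, the after-processing callback still runs, and the exception then reaches the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; hypothetical names, not part of Carrot2.
final class LifecycleSketch {
    /** Hypothetical stand-in for the per-request callbacks of a processing component. */
    interface Component {
        void beforeProcessing();
        void process() throws Exception;
        void afterProcessing();
    }

    /** Hypothetical controller method: afterProcessing always runs, then any failure propagates. */
    static void runRequest(Component component) throws Exception {
        component.beforeProcessing();
        try {
            component.process();
        } finally {
            component.afterProcessing();
        }
    }

    /** Runs one failing request and returns the observed order of events. */
    static List<String> demo() {
        final List<String> log = new ArrayList<>();
        Component failing = new Component() {
            public void beforeProcessing() { log.add("before"); }
            public void process() throws Exception {
                log.add("process");
                throw new Exception("processing failed");
            }
            public void afterProcessing() { log.add("after"); }
        };
        try {
            runRequest(failing);
        } catch (Exception e) {
            log.add("rethrown: " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```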


Copyright (c) Dawid Weiss, Stanislaw Osinski