org.carrot2.clustering.lingo
Class UniqueLabelAssigner

java.lang.Object
  extended by org.carrot2.clustering.lingo.UniqueLabelAssigner
All Implemented Interfaces:
ILabelAssigner

public class UniqueLabelAssigner
extends Object
implements ILabelAssigner

Assigns a unique label to each base vector using a greedy algorithm. For each base vector, the assigner chooses the label that has not been selected before and that maximizes the cosine similarity between the base vector and the label's term vector. Once a label is selected, it is not used to label any other vector. Because this algorithm never creates duplicate cluster labels, it usually produces more clusters than SimpleLabelAssigner. It is also slightly slower than SimpleLabelAssigner.
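
The greedy selection described above can be illustrated with a minimal, self-contained sketch. It is not the Carrot2 implementation: the class GreedyUniqueLabelSketch, the nested Assignment type and the plain double[][] layout (rows are base vectors, columns are candidate labels) are assumptions made for illustration only. The two result arrays loosely mirror the idea of storing a label index and a score per base vector.

```java
import java.util.Arrays;

// Illustrative sketch only; all names here are hypothetical, not Carrot2 API.
final class GreedyUniqueLabelSketch {

    /** Parallel result arrays: chosen label index and its cosine score per base vector. */
    public static final class Assignment {
        public final int[] labelIndex;
        public final double[] score;

        Assignment(int[] labelIndex, double[] score) {
            this.labelIndex = labelIndex;
            this.score = score;
        }
    }

    /**
     * Assumed layout: cos[v][l] is the cosine similarity between base vector v
     * and candidate label l. Base vectors are processed in index order.
     */
    public static Assignment assign(double[][] cos) {
        int baseVectors = cos.length;
        int labels = baseVectors == 0 ? 0 : cos[0].length;

        int[] labelIndex = new int[baseVectors];
        double[] score = new double[baseVectors];
        boolean[] used = new boolean[labels];
        Arrays.fill(labelIndex, -1); // -1 marks "no unused label was left"

        for (int v = 0; v < baseVectors; v++) {
            int best = -1;
            for (int l = 0; l < labels; l++) {
                // Skip labels already taken by an earlier base vector.
                if (!used[l] && (best < 0 || cos[v][l] > cos[v][best])) {
                    best = l;
                }
            }
            if (best >= 0) {
                used[best] = true; // a label is never reused
                labelIndex[v] = best;
                score[v] = cos[v][best];
            }
        }
        return new Assignment(labelIndex, score);
    }

    public static void main(String[] args) {
        double[][] cos = {
            {0.90, 0.80, 0.10},
            {0.95, 0.30, 0.20}, // prefers label 0, but it is already taken
            {0.10, 0.20, 0.70}
        };
        Assignment a = assign(cos);
        System.out.println(Arrays.toString(a.labelIndex)); // prints [0, 1, 2]
    }
}
```

In this example the second base vector falls back to its best remaining label, which shows the cost of uniqueness: a vector may receive a label with a lower score than its overall best match.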


Constructor Summary
UniqueLabelAssigner()
           
 
Method Summary
 void assignLabels(LingoProcessingContext context, org.apache.mahout.math.matrix.DoubleMatrix2D stemCos, com.carrotsearch.hppc.IntIntOpenHashMap filteredRowToStemIndex, org.apache.mahout.math.matrix.DoubleMatrix2D phraseCos)
          Assigns labels to base vectors found by the matrix factorization.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UniqueLabelAssigner

public UniqueLabelAssigner()

Method Detail

assignLabels

public void assignLabels(LingoProcessingContext context,
                         org.apache.mahout.math.matrix.DoubleMatrix2D stemCos,
                         com.carrotsearch.hppc.IntIntOpenHashMap filteredRowToStemIndex,
                         org.apache.mahout.math.matrix.DoubleMatrix2D phraseCos)
Description copied from interface: ILabelAssigner
Assigns labels to base vectors found by the matrix factorization. The results must be stored in the LingoProcessingContext.clusterLabelFeatureIndex and LingoProcessingContext.clusterLabelScore arrays.

Specified by:
assignLabels in interface ILabelAssigner
Parameters:
context - contains all information about the current clustering request
stemCos - matrix of cosine similarities between base vectors and single stems
filteredRowToStemIndex - mapping between row indices of stemCos and indices of stems in PreprocessingContext.allStems
phraseCos - matrix of cosine similarities between base vectors and phrases


Copyright (c) Dawid Weiss, Stanislaw Osinski