Carrot2 v3.3.0 API Documentation
java.lang.Object
  org.carrot2.core.Cluster
public final class Cluster
A cluster (group) of Documents. Each cluster has a human-readable label
consisting of one or more phrases, a list of documents it contains and a list of its
subclusters. Optionally, additional attributes can be associated with a cluster, e.g.
OTHER_TOPICS. This class is not thread-safe.
| Field Summary | |
|---|---|
static Comparator<Cluster> |
BY_LABEL_COMPARATOR
Compares clusters by the natural order of their labels as returned by getLabel(). |
static Comparator<Cluster> |
BY_REVERSED_SCORE_AND_LABEL_COMPARATOR
Compares clusters first by their score as returned by SCORE and then by their labels as
returned by getLabel(). |
static Comparator<Cluster> |
BY_REVERSED_SIZE_AND_LABEL_COMPARATOR
Compares clusters first by their size as returned by size() and then by their labels as
returned by getLabel(). |
static Comparator<Cluster> |
BY_SCORE_COMPARATOR
Compares clusters by score as returned by SCORE. |
static Comparator<Cluster> |
BY_SIZE_COMPARATOR
Compares clusters by size as returned by size(). |
static String |
OTHER_TOPICS
Indicates that the cluster is an Other Topics cluster. |
static Comparator<Cluster> |
OTHER_TOPICS_AT_THE_END
A comparator that puts OTHER_TOPICS clusters at the end of the list. |
static String |
SCORE
Score of this cluster that indicates the clustering algorithm's beliefs on the quality of this cluster. |
| Constructor Summary | |
|---|---|
Cluster()
Creates a Cluster with an empty label, no documents and no subclusters. |
|
Cluster(String phrase,
Document... documents)
Creates a Cluster with the provided phrase to be used as the
cluster's label and documents contained in the cluster. |
|
| Method Summary | ||
|---|---|---|
Cluster |
addDocuments(Document... documents)
Adds documents to this cluster. |
|
Cluster |
addDocuments(Iterable<Document> documents)
Adds documents to this cluster. |
|
Cluster |
addPhrases(Iterable<String> phrases)
Adds phrases to the description of this cluster. |
|
Cluster |
addPhrases(String... phrases)
Adds phrases to the description of this cluster. |
|
Cluster |
addSubclusters(Cluster... subclusters)
Adds subclusters to this cluster. |
|
Cluster |
addSubclusters(Iterable<Cluster> clusters)
Adds subclusters to this cluster. |
|
static void |
appendOtherTopics(List<Document> allDocuments,
List<Cluster> clusters)
If there are unclustered documents, appends the "Other Topics" group to the clusters. |
|
static void |
appendOtherTopics(List<Document> allDocuments,
List<Cluster> clusters,
String label)
If there are unclustered documents, appends the "Other Topics" group to the clusters. |
|
static void |
assignClusterIds(Collection<Cluster> clusters)
Assigns sequential identifiers to the provided clusters (and their
sub-clusters). |
|
static Cluster |
buildOtherTopics(List<Document> allDocuments,
List<Cluster> clusters)
Builds an "Other Topics" cluster that groups those documents from allDocuments that were not referenced in any cluster in
clusters. |
|
static Cluster |
buildOtherTopics(List<Document> allDocuments,
List<Cluster> clusters,
String label)
Builds an "Other Topics" cluster that groups those documents from allDocuments that were not referenced in any cluster in
clusters. |
|
static Comparator<Cluster> |
byReversedWeightedScoreAndSizeComparator(double scoreWeight)
Returns a comparator that compares clusters based on the aggregation of their size and score. |
|
static Cluster |
find(int id,
Collection<Cluster> clusters)
Locates the first cluster whose identifier equals id. |
|
List<Document> |
getAllDocuments()
Returns all documents contained in this cluster and (recursively) all documents from this cluster's subclusters. |
|
List<Document> |
getAllDocuments(Comparator<Document> comparator)
Returns all documents in this cluster ordered according to the provided comparator. |
|
<T> T |
getAttribute(String key)
Returns the attribute associated with this cluster under the provided key. |
|
Map<String,Object> |
getAttributes()
Returns all attributes of this cluster. |
|
List<Document> |
getDocuments()
Returns all documents contained in this cluster. |
|
Integer |
getId()
Internal identifier of this cluster within the ProcessingResult. |
|
String |
getLabel()
Formats this cluster's label. |
|
List<String> |
getPhrases()
Returns all phrases describing this cluster. |
|
Double |
getScore()
Returns this cluster's SCORE field. |
|
List<Cluster> |
getSubclusters()
Returns all subclusters of this cluster. |
|
boolean |
isOtherTopics()
Returns true if this cluster is the OTHER_TOPICS cluster. |
|
<T> Cluster |
setAttribute(String key,
T value)
Associates an attribute with this cluster. |
|
Cluster |
setOtherTopics(boolean isOtherTopics)
Sets the OTHER_TOPICS attribute of this cluster. |
|
Cluster |
setScore(Double score)
Sets this cluster's SCORE field. |
|
int |
size()
Returns the size of the cluster calculated as the number of unique documents it contains, including its subclusters. |
|
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
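public static final String OTHER_TOPICS
Indicates that the cluster is an Other Topics cluster. Type of this attribute is Boolean.
See Also: setAttribute(String, Object), getAttribute(String), Constant Field Values
public static final String SCORE
Score of this cluster that indicates the clustering algorithm's beliefs on the quality of this cluster. Type of this attribute is Double.
See Also: setAttribute(String, Object), getAttribute(String), Constant Field Values
public static final Comparator<Cluster> BY_SIZE_COMPARATOR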
Compares clusters by size as returned by size(). Clusters with more
documents are larger.
public static final Comparator<Cluster> BY_SCORE_COMPARATOR
Compares clusters by score as returned by SCORE. Clusters with a higher
score are larger.
public static final Comparator<Cluster> BY_LABEL_COMPARATOR
Compares clusters by the natural order of their labels as returned by getLabel().
public static final Comparator<Cluster> BY_REVERSED_SIZE_AND_LABEL_COMPARATOR
Compares clusters first by their size as returned by size() and then by their labels as
returned by getLabel(). In case of equal sizes, the natural order of the
labels decides.
Please note: this is a reversed comparator, so "larger" clusters end up nearer the beginning of the list being sorted (which is usually the order in which the applications want to display clusters).
public static final Comparator<Cluster> BY_REVERSED_SCORE_AND_LABEL_COMPARATOR
Compares clusters first by their score as returned by SCORE and then by their labels as
returned by getLabel(). In case of equal scores, the natural order of the
labels decides.
Please note: this is a reversed comparator, so "larger" clusters end up nearer the beginning of the list being sorted (which is usually the order in which the applications want to display clusters).
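The "reversed, then by label" ordering described for the two comparators above can be sketched with plain java.util.Comparator composition. This is only an illustration and not Carrot2's source: the class ReversedSizeThenLabel and its nested Item type are hypothetical stand-ins for Cluster, size() and getLabel().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ReversedSizeThenLabel {
    // Hypothetical stand-in for a cluster: only a label and a size.
    static final class Item {
        final String label;
        final int size;

        Item(String label, int size) {
            this.label = label;
            this.size = size;
        }
    }

    // Larger sizes sort first; equal sizes fall back to the natural order of labels.
    static final Comparator<Item> BY_REVERSED_SIZE_AND_LABEL =
        Comparator.comparingInt((Item c) -> c.size)
            .reversed()
            .thenComparing(c -> c.label);

    // Returns the labels of the given items in display order, leaving the input untouched.
    static List<String> sortedLabels(List<Item> items) {
        List<Item> copy = new ArrayList<>(items);
        copy.sort(BY_REVERSED_SIZE_AND_LABEL);
        List<String> labels = new ArrayList<>();
        for (Item c : copy) {
            labels.add(c.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
            new Item("Data Mining", 3),
            new Item("Clustering", 7),
            new Item("Algorithms", 3));
        // prints [Clustering, Algorithms, Data Mining]
        System.out.println(sortedLabels(items));
    }
}
```

Only the size comparison is reversed here; the label tie-break stays in ascending natural order, which matches the description above.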
public static final Comparator<Cluster> OTHER_TOPICS_AT_THE_END
A comparator that puts OTHER_TOPICS clusters at the end of the list. In
other words, to this comparator an OTHER_TOPICS cluster is "bigger"
than a non-OTHER_TOPICS cluster.
Note: This comparator is designed for use in combination with
other comparators, such as BY_REVERSED_SIZE_AND_LABEL_COMPARATOR. If you
only need to partition a list of clusters into regular and other topic ones, this
is better done in linear time without resorting to Collections.sort(List).
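The linear-time partitioning mentioned in the note can be sketched as a single pass over the list. The class OtherTopicsPartition and its nested Item type are hypothetical stand-ins; with the real class, the flag would presumably come from isOtherTopics().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OtherTopicsPartition {
    // Hypothetical stand-in for a cluster: a label plus an "other topics" flag.
    static final class Item {
        final String label;
        final boolean otherTopics;

        Item(String label, boolean otherTopics) {
            this.label = label;
            this.otherTopics = otherTopics;
        }
    }

    // One pass, no sorting: regular clusters keep their relative order,
    // other-topics clusters are moved behind them.
    static List<Item> otherTopicsLast(List<Item> clusters) {
        List<Item> regular = new ArrayList<>();
        List<Item> other = new ArrayList<>();
        for (Item c : clusters) {
            (c.otherTopics ? other : regular).add(c);
        }
        regular.addAll(other);
        return regular;
    }

    public static void main(String[] args) {
        List<Item> clusters = Arrays.asList(
            new Item("Other Topics", true),
            new Item("Clustering", false),
            new Item("Search", false));
        for (Item c : otherTopicsLast(clusters)) {
            System.out.println(c.label);
        }
    }
}
```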
| Constructor Detail |
|---|
public Cluster()
Creates a Cluster with an empty label, no documents and no subclusters.
public Cluster(String phrase,
Document... documents)
Creates a Cluster with the provided phrase to be used as the
cluster's label and documents contained in the cluster.
phrase - the phrase to form the cluster's label
documents - documents contained in the cluster
| Method Detail |
|---|
public String getLabel()
Formats this cluster's label. The underlying phrases are available through getPhrases().
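public List<String> getPhrases()
Returns all phrases describing this cluster.
public List<Cluster> getSubclusters()
Returns all subclusters of this cluster.
public List<Document> getDocuments()
Returns all documents contained in this cluster.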
public List<Document> getAllDocuments()
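Returns all documents contained in this cluster and (recursively) all documents from this cluster's subclusters. Documents returned by getDocuments() come first, and then
documents from subclusters.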
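A minimal sketch of such a recursive collection follows, assuming that a document referenced from several subclusters should be listed once (consistent with size() counting unique documents). The class AllDocumentsSketch and its nested Node type are hypothetical stand-ins, documents are plain strings, and the exact traversal order used by Carrot2 is not confirmed by this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class AllDocumentsSketch {
    // Hypothetical stand-in for a cluster; documents are represented by strings.
    static final class Node {
        final List<String> documents = new ArrayList<>();
        final List<Node> subclusters = new ArrayList<>();

        Node(String... docs) {
            documents.addAll(Arrays.asList(docs));
        }

        Node add(Node... children) {
            subclusters.addAll(Arrays.asList(children));
            return this;
        }
    }

    // The cluster's own documents first, then documents of its subclusters.
    // LinkedHashSet keeps first-seen order and drops repeated documents.
    static List<String> allDocuments(Node root) {
        Set<String> seen = new LinkedHashSet<>();
        collect(root, seen);
        return new ArrayList<>(seen);
    }

    private static void collect(Node node, Set<String> seen) {
        seen.addAll(node.documents);
        for (Node child : node.subclusters) {
            collect(child, seen);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("d1", "d2").add(
            new Node("d2", "d3"),
            new Node("d4").add(new Node("d1", "d5")));
        // prints [d1, d2, d3, d4, d5]
        System.out.println(allDocuments(root));
    }
}
```

Under the same assumption, the size of the cluster would be the length of the returned list.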
public List<Document> getAllDocuments(Comparator<Document> comparator)
Returns all documents in this cluster ordered according to the provided comparator. See Document for common comparators, e.g. Document.BY_ID_COMPARATOR.
public Cluster addPhrases(String... phrases)
phrases - to be added to the description of this cluster
public Cluster addPhrases(Iterable<String> phrases)
phrases - to be added to the description of this cluster
public Cluster addDocuments(Document... documents)
documents - to be added to this cluster
public Cluster addDocuments(Iterable<Document> documents)
documents - to be added to this cluster
public Cluster addSubclusters(Cluster... subclusters)
subclusters - to be added to this cluster
public Cluster addSubclusters(Iterable<Cluster> clusters)
clusters - to be added to this cluster
public Double getScore()
Returns this cluster's SCORE field.
public Cluster setScore(Double score)
Sets this cluster's SCORE field.
score - score to set
public <T> T getAttribute(String key)
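Returns the attribute associated with this cluster under the provided
key. If there is no attribute under the provided key,
null will be returned.
key - of the attribute
Returns: the attribute value or null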
public <T> Cluster setAttribute(String key,
T value)
Associates an attribute with this cluster.
key - for the attribute
value - for the attribute
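public Map<String,Object> getAttributes()
Returns all attributes of this cluster.
public int size()
Returns the size of the cluster calculated as the number of unique documents it contains, including its subclusters.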
public Integer getId()
Internal identifier of this cluster within the ProcessingResult. This
identifier is assigned dynamically after clusters are passed to
ProcessingResult.
See Also: ProcessingResult
public boolean isOtherTopics()
Returns true if this cluster is the OTHER_TOPICS cluster.
public Cluster setOtherTopics(boolean isOtherTopics)
Sets the OTHER_TOPICS attribute of this cluster.
isOtherTopics - if true, this cluster will be marked as an
Other Topics cluster.
public static Comparator<Cluster> byReversedWeightedScoreAndSizeComparator(double scoreWeight)
Returns a comparator that compares clusters based on the aggregation of their size and score. If
scoreWeight is 0.0, the order depends only on cluster
sizes. If scoreWeight is 1.0, the order depends only on cluster
scores. For scoreWeight values between 0.0 and 1.0, the higher the
scoreWeight, the more cluster scores contribute to the order. In
case of a tie on the aggregated cluster size and score, clusters are compared by
the natural order of their labels.
Please note: this is a reversed comparator, so "larger" clusters end up nearer the beginning of the list being sorted (which is usually the order in which the applications want to display clusters).
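The aggregation formula itself is not given on this page. The sketch below assumes a simple linear blend, (1 - scoreWeight) * size + scoreWeight * score, which merely satisfies the documented end points (0.0 means size only, 1.0 means score only) and the label tie-break; Carrot2's actual formula may differ. The class WeightedOrderSketch and its nested Item type are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class WeightedOrderSketch {
    // Hypothetical stand-in for a cluster with a label, a size and a score.
    static final class Item {
        final String label;
        final int size;
        final double score;

        Item(String label, int size, double score) {
            this.label = label;
            this.size = size;
            this.score = score;
        }
    }

    // Assumed aggregation: a linear blend of size and score.
    // Larger aggregates sort first; ties fall back to the natural order of labels.
    static Comparator<Item> byReversedWeightedScoreAndSize(final double scoreWeight) {
        Comparator<Item> byAggregate = Comparator.comparingDouble(
            (Item c) -> (1.0 - scoreWeight) * c.size + scoreWeight * c.score);
        return byAggregate.reversed().thenComparing(c -> c.label);
    }

    // Returns labels in the order produced by the comparator for the given weight.
    static List<String> order(List<Item> items, double scoreWeight) {
        List<Item> copy = new ArrayList<>(items);
        copy.sort(byReversedWeightedScoreAndSize(scoreWeight));
        List<String> labels = new ArrayList<>();
        for (Item c : copy) {
            labels.add(c.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
            new Item("Big", 10, 1.0),
            new Item("Hot", 2, 50.0),
            new Item("Also big", 10, 1.0));
        System.out.println(order(items, 0.0)); // size only
        System.out.println(order(items, 1.0)); // score only
    }
}
```

A blend like this is sensitive to the relative scales of size and score, so a real implementation would likely normalize both before combining them.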
public static void assignClusterIds(Collection<Cluster> clusters)
Assigns sequential identifiers to the provided clusters (and their
sub-clusters). If a cluster already has an identifier, the identifier will not be
changed.
clusters - Clusters to assign identifiers to.
IllegalArgumentException - if the provided clusters contain non-unique
identifiers
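The behaviour described above (keep existing identifiers, number the rest, reject duplicates) could look roughly like the sketch below. The starting number, the pre-order traversal and the way free numbers are chosen are assumptions made for illustration, as are the class ClusterIdsSketch and its nested Node type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ClusterIdsSketch {
    // Hypothetical stand-in for a cluster: an optional id and subclusters.
    static final class Node {
        Integer id;
        final List<Node> subclusters = new ArrayList<>();

        Node(Integer id) {
            this.id = id;
        }

        Node add(Node... children) {
            subclusters.addAll(Arrays.asList(children));
            return this;
        }
    }

    static void assignIds(Collection<Node> clusters) {
        List<Node> flat = new ArrayList<>();
        flatten(clusters, flat);

        // Existing identifiers are kept, but they must be unique.
        Set<Integer> used = new HashSet<>();
        for (Node n : flat) {
            if (n.id != null && !used.add(n.id)) {
                throw new IllegalArgumentException("Non-unique cluster id: " + n.id);
            }
        }

        // Clusters without an identifier get the next free number (assumed to start at 0).
        int next = 0;
        for (Node n : flat) {
            if (n.id == null) {
                while (used.contains(next)) {
                    next++;
                }
                n.id = next;
                used.add(next);
            }
        }
    }

    // Pre-order walk: each cluster is visited before its subclusters.
    private static void flatten(Collection<Node> clusters, List<Node> out) {
        for (Node n : clusters) {
            out.add(n);
            flatten(n.subclusters, out);
        }
    }

    public static void main(String[] args) {
        Node b = new Node(1);
        Node a = new Node(null).add(b);
        Node c = new Node(null);
        assignIds(Arrays.asList(a, c));
        // prints 0 1 2
        System.out.println(a.id + " " + b.id + " " + c.id);
    }
}
```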
public static Cluster find(int id,
Collection<Cluster> clusters)
Locates the first cluster whose identifier equals id. The search includes
all the clusters in the input and their sub-clusters. The first cluster with
a matching identifier is returned, or null if no such cluster could be
found.
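A search of this kind is naturally recursive. The sketch below uses a hypothetical class FindByIdSketch with a nested Node type in place of Cluster and visits each cluster before its sub-clusters; the traversal order that decides which match counts as "first" is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class FindByIdSketch {
    // Hypothetical stand-in for a cluster: an id (possibly unassigned) and subclusters.
    static final class Node {
        final Integer id;
        final List<Node> subclusters = new ArrayList<>();

        Node(Integer id) {
            this.id = id;
        }

        Node add(Node... children) {
            subclusters.addAll(Arrays.asList(children));
            return this;
        }
    }

    // Returns the first cluster with the given id, or null if there is none.
    static Node find(int id, Collection<Node> clusters) {
        for (Node c : clusters) {
            if (c.id != null && c.id == id) {
                return c;
            }
            Node nested = find(id, c.subclusters);
            if (nested != null) {
                return nested;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node deep = new Node(7);
        List<Node> top = Arrays.asList(new Node(1).add(new Node(2).add(deep)), new Node(3));
        System.out.println(find(7, top) == deep); // prints true
        System.out.println(find(42, top));        // prints null
    }
}
```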
public static Cluster buildOtherTopics(List<Document> allDocuments,
List<Cluster> clusters)
Builds an "Other Topics" cluster that groups those documents from allDocuments that were not referenced in any cluster in
clusters.
allDocuments - all documents to check against
clusters - list of clusters with assigned documents
public static Cluster buildOtherTopics(List<Document> allDocuments,
List<Cluster> clusters,
String label)
Builds an "Other Topics" cluster that groups those documents from allDocuments that were not referenced in any cluster in
clusters.
allDocuments - all documents to check against
clusters - list of clusters with assigned documents
label - label for the "Other Topics" group
public static void appendOtherTopics(List<Document> allDocuments,
List<Cluster> clusters)
If there are unclustered documents, appends the "Other Topics" group to the
clusters.
See Also: buildOtherTopics(List, List)
public static void appendOtherTopics(List<Document> allDocuments,
List<Cluster> clusters,
String label)
If there are unclustered documents, appends the "Other Topics" group to the
clusters.
See Also: buildOtherTopics(List, List, String)
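The relationship between buildOtherTopics and appendOtherTopics amounts to a set difference followed by a conditional append. The sketch below assumes flat clusters (no subclusters) and string documents to stay short; the class OtherTopicsSketch and its nested Group type are hypothetical and do not reproduce Carrot2's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OtherTopicsSketch {
    // Hypothetical stand-in for a cluster: label, documents and an "other topics" flag.
    static final class Group {
        final String label;
        final List<String> documents;
        final boolean otherTopics;

        Group(String label, List<String> documents, boolean otherTopics) {
            this.label = label;
            this.documents = documents;
            this.otherTopics = otherTopics;
        }
    }

    // Groups every document that no cluster references, keeping the input order.
    static Group buildOtherTopics(List<String> allDocuments, List<Group> clusters, String label) {
        Set<String> clustered = new HashSet<>();
        for (Group g : clusters) {
            clustered.addAll(g.documents);
        }
        List<String> unclustered = new ArrayList<>();
        for (String d : allDocuments) {
            if (!clustered.contains(d)) {
                unclustered.add(d);
            }
        }
        return new Group(label, unclustered, true);
    }

    // Appends the group only when at least one document is unclustered.
    static void appendOtherTopics(List<String> allDocuments, List<Group> clusters, String label) {
        Group other = buildOtherTopics(allDocuments, clusters, label);
        if (!other.documents.isEmpty()) {
            clusters.add(other);
        }
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("d1", "d2", "d3", "d4");
        List<Group> clusters = new ArrayList<>();
        clusters.add(new Group("Clustering", Arrays.asList("d1", "d2"), false));
        appendOtherTopics(all, clusters, "Other Topics");
        Group last = clusters.get(clusters.size() - 1);
        // prints Other Topics [d3, d4]
        System.out.println(last.label + " " + last.documents);
    }
}
```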
Please refer to project documentation at
http://project.carrot2.org