org.carrot2.clustering.synthetic
Class ByUrlClusteringAlgorithm

java.lang.Object
  extended by org.carrot2.core.ProcessingComponentBase
      extended by org.carrot2.clustering.synthetic.ByUrlClusteringAlgorithm
All Implemented Interfaces:
IClusteringAlgorithm, IProcessingComponent

public class ByUrlClusteringAlgorithm
extends ProcessingComponentBase
implements IClusteringAlgorithm

Hierarchically clusters documents according to their content URLs. The Document.CONTENT_URL property will be used to obtain a document's URL.

Groups at the top level of the hierarchy will correspond to the last segments of the URLs, usually domain suffixes, such as ".com" or ".co.uk". Subgroups will be created based on further segments of the URLs, very often domains and subdomains, e.g. "yahoo.com", "bbc.co.uk" and then e.g. "mail.yahoo.com", "news.yahoo.com". The "www" segment of the URLs will be ignored.
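
The way group labels follow from URL segments can be illustrated with a minimal, self-contained sketch. It is not the Carrot2 implementation: the class UrlSegmentsSketch and the method hierarchyLabels are hypothetical names, only a leading "www" segment is dropped, and multi-part suffixes such as ".co.uk" are not treated specially (the sketch would emit "uk" before "co.uk").

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of Carrot2.
public class UrlSegmentsSketch {

    /**
     * Returns candidate group labels for a URL, from the most general
     * (last host segment) to the most specific (full host without "www").
     */
    public static List<String> hierarchyLabels(String url) {
        List<String> labels = new ArrayList<>();
        String host = URI.create(url).getHost();
        if (host == null) {
            return labels; // no host part, nothing to group by
        }

        String[] parts = host.toLowerCase(Locale.ROOT).split("\\.");
        // Assumption: only a leading "www" segment is ignored.
        int start = (parts.length > 0 && parts[0].equals("www")) ? 1 : 0;

        // Walk the host from its last segment towards the first,
        // accumulating progressively longer suffixes.
        String suffix = "";
        for (int i = parts.length - 1; i >= start; i--) {
            suffix = suffix.isEmpty() ? parts[i] : parts[i] + "." + suffix;
            labels.add(suffix);
        }
        return labels;
    }

    public static void main(String[] args) {
        // prints [com, yahoo.com, news.yahoo.com]
        System.out.println(hierarchyLabels("http://www.news.yahoo.com/world"));
    }
}
```

Each label in the returned list would correspond to one level of the hierarchy, so documents sharing a prefix of the list end up in the same group at that level.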

Clusters will be ordered by size (number of documents) in descending order; clusters of equal size will be ordered alphabetically by URL, see Cluster.BY_REVERSED_SIZE_AND_LABEL_COMPARATOR.
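
The ordering rule can be sketched with a plain java.util.Comparator. This is an illustration under assumptions, not the code behind Cluster.BY_REVERSED_SIZE_AND_LABEL_COMPARATOR: the Group class, its fields and the sortedLabels helper are hypothetical stand-ins for a cluster with a label and a document count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of Carrot2.
public class ClusterOrderSketch {

    /** Stand-in for a cluster: a URL-based label and a document count. */
    public static final class Group {
        final String label;
        final int size;

        public Group(String label, int size) {
            this.label = label;
            this.size = size;
        }
    }

    /** Larger groups first; ties broken alphabetically by label. */
    public static final Comparator<Group> BY_SIZE_DESC_THEN_LABEL =
        Comparator.comparingInt((Group g) -> g.size)
                  .reversed()
                  .thenComparing(g -> g.label);

    /** Returns the labels of the given groups in the sketched order. */
    public static List<String> sortedLabels(List<Group> groups) {
        List<Group> copy = new ArrayList<>(groups); // leave the input untouched
        copy.sort(BY_SIZE_DESC_THEN_LABEL);
        List<String> labels = new ArrayList<>();
        for (Group g : copy) {
            labels.add(g.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Group> groups = Arrays.asList(
            new Group("yahoo.com", 2),
            new Group("bbc.co.uk", 3),
            new Group("acm.org", 2));
        // prints [bbc.co.uk, acm.org, yahoo.com]
        System.out.println(sortedLabels(groups));
    }
}
```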

Attribute label:
By URL Clustering

Field Summary
 List<Cluster> clusters
          Clusters created by the algorithm.
 List<Document> documents
          Documents to cluster.
 
Constructor Summary
ByUrlClusteringAlgorithm()
           
 
Method Summary
 void process()
          Performs by URL clustering.
 
Methods inherited from class org.carrot2.core.ProcessingComponentBase
afterProcessing, beforeProcessing, dispose, getContext, getSharedExecutor, init
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.carrot2.core.IProcessingComponent
afterProcessing, beforeProcessing, dispose, init
 

Field Detail

documents

public List<Document> documents
Documents to cluster.


clusters

public List<Cluster> clusters
Clusters created by the algorithm.

Constructor Detail

ByUrlClusteringAlgorithm

public ByUrlClusteringAlgorithm()
Method Detail

process

public void process()
             throws ProcessingException
Performs by URL clustering.

Specified by:
process in interface IProcessingComponent
Overrides:
process in class ProcessingComponentBase
Throws:
ProcessingException - when processing failed. If thrown, the IProcessingComponent.afterProcessing() method will be called and the component will be ready to accept further requests or to be disposed of. Finally, the exception will be rethrown from the controller method that caused the component to perform processing.


Copyright (c) Dawid Weiss, Stanislaw Osinski