org.carrot2.text.linguistic
Class DefaultLexicalDataFactory

java.lang.Object
  extended by org.carrot2.text.linguistic.DefaultLexicalDataFactory
All Implemented Interfaces:
ILexicalDataFactory

public class DefaultLexicalDataFactory
extends Object
implements ILexicalDataFactory

The default implementation of lexical resource management. Resources are read from disk, cached, and shared between all threads using this class. Additional attributes control resource lookup, reloading, and merging: resourceLookup, reloadResources, mergeResources.


Field Summary
 boolean mergeResources
          Merges stop words and stop labels from all known languages.
 boolean reloadResources
           
 ResourceLookup resourceLookup
           
 
Constructor Summary
DefaultLexicalDataFactory()
           
 
Method Summary
 ILexicalData getLexicalData(LanguageCode languageCode)
          The main logic for acquiring a shared ILexicalData instance.
static HashSet<String> load(IResource resource)
          Loads words from a given IResource (UTF-8, one word per line, #-starting lines are considered comments).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

reloadResources

public boolean reloadResources

mergeResources

public boolean mergeResources
Merges stop words and stop labels from all known languages. If set to false, only the stop words and stop labels of the active language are used. If set to true, stop words from all LanguageCodes are used together and stop labels from all languages are used together, regardless of the active language. Merging lexical resources is useful when clustering data in a mix of different languages and should increase clustering quality in such settings.

Attribute label:
Merge lexical resources
Attribute level:
Medium
Attribute group:
Preprocessing
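
The effect of mergeResources described above can be illustrated with a minimal sketch in plain Java. This is not the Carrot2 implementation: the class MergeResourcesSketch, the method effectiveStopWords, and the use of String keys in place of LanguageCode are all hypothetical names chosen for illustration. Stop labels would be combined the same way under the same assumption.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Carrot2.
final class MergeResourcesSketch {
    /**
     * Returns the stop words that would apply for the active language.
     * Language keys are plain strings standing in for LanguageCode values.
     */
    static Set<String> effectiveStopWords(Map<String, Set<String>> stopWordsByLanguage,
                                          String activeLanguage,
                                          boolean mergeResources) {
        Set<String> result = new HashSet<>();
        if (mergeResources) {
            // Merged mode: the union of every language's stop words,
            // regardless of the active language.
            for (Set<String> words : stopWordsByLanguage.values()) {
                result.addAll(words);
            }
        } else {
            // Unmerged mode: only the active language contributes.
            Set<String> words = stopWordsByLanguage.get(activeLanguage);
            if (words != null) {
                result.addAll(words);
            }
        }
        return result;
    }
}
```

With merging enabled, a German stop word such as "und" would also be filtered while the active language is English, which is the behavior that helps with mixed-language input.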

resourceLookup

public ResourceLookup resourceLookup

Constructor Detail

DefaultLexicalDataFactory

public DefaultLexicalDataFactory()

Method Detail

getLexicalData

public ILexicalData getLexicalData(LanguageCode languageCode)
The main logic for acquiring a shared ILexicalData instance.

Specified by:
getLexicalData in interface ILexicalDataFactory
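
The class description says that resources are cached and shared between all threads. One common way to get that behavior in Java is a concurrent map keyed by language, sketched below. This is an assumption about the general pattern, not a description of how getLexicalData is actually implemented. SharedCacheSketch, its loader function, and the reload flag are hypothetical; in particular, the reload flag only guesses at what reloadResources might mean, since this page does not document that field.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration only; not part of Carrot2.
final class SharedCacheSketch<K, V> {
    // One shared, thread-safe map: every caller sees the same cached instances.
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SharedCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    /**
     * Returns the cached value for the key, loading it on first use.
     * When reload is true (an assumed semantics), the value is loaded
     * again and replaces the cached instance.
     */
    V get(K key, boolean reload) {
        if (reload) {
            V fresh = loader.apply(key);
            cache.put(key, fresh);
            return fresh;
        }
        // computeIfAbsent runs the loader at most once per absent key.
        return cache.computeIfAbsent(key, loader);
    }
}
```

In this sketch, repeated calls with the same key and reload set to false return the same instance, so an expensive disk read happens once per language.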

load

public static HashSet<String> load(IResource resource)
                            throws IOException
Loads words from a given IResource (UTF-8, one word per line, #-starting lines are considered comments).

Throws:
IOException
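
The file format accepted by load (UTF-8, one word per line, lines starting with # treated as comments) can be parsed with the JDK alone, as in the sketch below. It reads from a java.io.InputStream rather than an IResource, and the names WordListSketch and loadWords are hypothetical. Trimming whitespace and skipping blank lines are assumptions; the documentation above only specifies the encoding, the one-word-per-line layout, and the comment marker.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;

// Hypothetical illustration only; not part of Carrot2.
final class WordListSketch {
    /** Reads one word per line from a UTF-8 stream, skipping '#' comment lines. */
    static HashSet<String> loadWords(InputStream in) throws IOException {
        HashSet<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: surrounding whitespace is not significant.
                line = line.trim();
                // Assumption: blank lines are ignored along with comments.
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                words.add(line);
            }
        }
        return words;
    }
}
```

Because the result is a set, a word that appears on several lines is stored once.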


Copyright (c) Dawid Weiss, Stanislaw Osinski