org.carrot2.text.linguistic
Class DefaultLanguageModelFactory

java.lang.Object
  extended by org.carrot2.text.linguistic.DefaultLanguageModelFactory
All Implemented Interfaces:
ILanguageModelFactory

public final class DefaultLanguageModelFactory
extends Object
implements ILanguageModelFactory

Provides access to all ILanguageModel objects. The default implementation supports every language listed in LanguageCode, although some languages may require additional resources (such as external stemming libraries). Refer to each language's constant in LanguageCode for details.

See Also:
LanguageCode

Field Summary
 boolean mergeResources
          Merges stop words and stop labels from all known languages.
 boolean reloadResources
          Reloads cached stop words and stop labels on every processing request.
 String resourcePath
          Lexical resources path.
 
Constructor Summary
DefaultLanguageModelFactory()
           
 
Method Summary
 ILanguageModel getLanguageModel(LanguageCode language)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

resourcePath

public String resourcePath
Lexical resources path. A path within the classpath from which lexical resources are loaded. For example, if the resource path is /my/custom/resources, stop words for English will be loaded from /my/custom/resources/stopwords.en. Other lexical resources and other languages are loaded in the same way.

Attribute label:
Lexical resources path
Attribute level:
Advanced
Attribute group:
Preprocessing
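
The naming rule described for resourcePath can be sketched as follows. This is a minimal illustration only: ResourcePathSketch and its methods are hypothetical names, not part of Carrot2, and the handling of a trailing slash is an assumption rather than documented behaviour.

```java
import java.io.InputStream;

// Hypothetical helper, not part of Carrot2. It only illustrates how a
// classpath resource name could be derived from a resource path and an
// ISO language code such as "en".
class ResourcePathSketch {
    /** Joins the resource path with "stopwords.<isoCode>", avoiding a doubled slash. */
    static String stopwordsResource(String resourcePath, String isoCode) {
        String base = resourcePath.endsWith("/")
            ? resourcePath.substring(0, resourcePath.length() - 1)
            : resourcePath;
        return base + "/stopwords." + isoCode;
    }

    /** Opens the resource from the classpath; returns null if it does not exist. */
    static InputStream open(String resourcePath, String isoCode) {
        return ResourcePathSketch.class.getResourceAsStream(
            stopwordsResource(resourcePath, isoCode));
    }
}
```

With the path from the example above, stopwordsResource("/my/custom/resources", "en") yields /my/custom/resources/stopwords.en.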

reloadResources

public boolean reloadResources
Reloads cached stop words and stop labels on every processing request. For best performance, lexical resource reloading should be disabled in production.

Attribute label:
Reload lexical resources
Attribute level:
Medium
Attribute group:
Preprocessing
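
To see why reloading is costly, consider the sketch below. It is not Carrot2 code: StopWordCacheSketch, its loader function and the loads counter are all hypothetical, and the real factory may cache resources differently. The point is only that a reload flag forces the expensive load step on every request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical cache, not part of Carrot2. The loader stands in for reading
// stop words from the classpath, which is assumed to be the expensive step.
class StopWordCacheSketch {
    private final Map<String, Set<String>> cache = new HashMap<>();
    private final Function<String, Set<String>> loader;
    private final boolean reloadResources;
    int loads = 0; // number of times the loader was invoked

    StopWordCacheSketch(Function<String, Set<String>> loader, boolean reloadResources) {
        this.loader = loader;
        this.reloadResources = reloadResources;
    }

    /** Returns stop words for a language, reloading them if the flag is set. */
    Set<String> stopWords(String isoCode) {
        if (reloadResources || !cache.containsKey(isoCode)) {
            loads++;
            cache.put(isoCode, loader.apply(isoCode));
        }
        return cache.get(isoCode);
    }
}
```

In this sketch, two consecutive requests for the same language invoke the loader once when the flag is false and twice when it is true.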

mergeResources

public boolean mergeResources
Merges stop words and stop labels from all known languages. If set to false, only stop words and stop labels of the active language will be used. If set to true, stop words from all LanguageCodes will be used together and stop labels from all languages will be used together, no matter the active language. Lexical resource merging is useful when clustering data in a mix of different languages and should increase clustering quality in such settings.

Attribute label:
Merge lexical resources
Attribute level:
Medium
Attribute group:
Preprocessing
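
The effect of the merge flag on stop words can be sketched as a set union. MergeSketch and effectiveStopWords are hypothetical names used for illustration, and keying languages by ISO code is an assumption; Carrot2's internal representation may differ.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not part of Carrot2.
class MergeSketch {
    /**
     * Returns the stop words in effect for the active language.
     * With merging enabled, the union over all languages is returned;
     * otherwise only the active language's own stop words are used.
     */
    static Set<String> effectiveStopWords(
            Map<String, Set<String>> byLanguage, String active, boolean mergeResources) {
        Set<String> result = new HashSet<>();
        if (mergeResources) {
            for (Set<String> words : byLanguage.values()) {
                result.addAll(words);
            }
        } else {
            Set<String> own = byLanguage.get(active);
            if (own != null) {
                result.addAll(own);
            }
        }
        return result;
    }
}
```

With English and German word lists loaded and English active, merging makes German stop words such as "der" count as stop words too, which is what helps when the input mixes languages.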

Constructor Detail

DefaultLanguageModelFactory

public DefaultLanguageModelFactory()

Method Detail

getLanguageModel

public ILanguageModel getLanguageModel(LanguageCode language)
Specified by:
getLanguageModel in interface ILanguageModelFactory
Returns:
A language model for the requested language, one of those listed in LanguageCode.


Copyright (c) Dawid Weiss, Stanislaw Osinski