weka.core.stemmers
Class SnowballStemmer

java.lang.Object
  extended by weka.core.stemmers.SnowballStemmer
All Implemented Interfaces:
java.io.Serializable, OptionHandler, RevisionHandler, Stemmer

public class SnowballStemmer
extends java.lang.Object
implements Stemmer, OptionHandler

A wrapper class for the Snowball stemmers. It is only available if the Snowball classes are on the classpath.
If class discovery is not dynamic, i.e., the property 'UseDynamic' in the props file 'weka/gui/GenericPropertiesCreator.props' is set to 'false', then the property 'org.tartarus.snowball.SnowballProgram' in the file 'weka/gui/GenericObjectEditor.props' must be uncommented as well. If necessary, you have to discover the Snowball stemmers and fill them in manually. You can use 'weka.core.ClassDiscovery' for this:
java weka.core.ClassDiscovery org.tartarus.snowball.SnowballProgram org.tartarus.snowball.ext

For more information visit these web sites:
http://weka.wikispaces.com/Stemmers
http://snowball.tartarus.org/

Valid options are:

 -S <name>
  The name of the snowball stemmer (default 'porter').
  available stemmers:
     danish, dutch, english, finnish, french, german, italian, 
     norwegian, porter, portuguese, russian, spanish, swedish
 

Version:
$Revision: 5953 $
Author:
FracPete (fracpete at waikato dot ac dot nz)
See Also:
Serialized Form

Field Summary
static java.lang.String PACKAGE
          the package name for snowball.
static java.lang.String PACKAGE_EXT
          the package name where the stemmers are located.
 
Constructor Summary
SnowballStemmer()
          Initializes the stemmer with the default algorithm ("porter").
SnowballStemmer(java.lang.String name)
          Initializes the wrapper with the Snowball stemmer of the given name.
 
Method Summary
 java.lang.String[] getOptions()
          Gets the current settings of the stemmer.
 java.lang.String getRevision()
          Returns the revision string.
 java.lang.String getStemmer()
          Returns the name of the current stemmer, or null if none is set.
 java.lang.String globalInfo()
          Returns a string describing the stemmer.
static boolean isPresent()
          Returns whether Snowball is present, i.e., whether its classes are on the classpath.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static java.util.Enumeration listStemmers()
          returns an enumeration over all currently stored stemmer names.
static void main(java.lang.String[] args)
          Runs the stemmer with the given options.
 void setOptions(java.lang.String[] options)
          Parses the options.
 void setStemmer(java.lang.String name)
          sets the stemmer with the given name, e.g., "porter".
 java.lang.String stem(java.lang.String word)
          Returns the word in its stemmed form.
 java.lang.String stemmerTipText()
          Returns the tip text for this property.
 java.lang.String toString()
          returns a string representation of the stemmer.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PACKAGE

public static final java.lang.String PACKAGE
the package name for snowball.

See Also:
Constant Field Values

PACKAGE_EXT

public static final java.lang.String PACKAGE_EXT
the package name where the stemmers are located.

See Also:
Constant Field Values

Constructor Detail

SnowballStemmer

public SnowballStemmer()
Initializes the stemmer with the default algorithm ("porter").


SnowballStemmer

public SnowballStemmer(java.lang.String name)
Initializes the wrapper with the Snowball stemmer of the given name.

Parameters:
name - the name of the stemmer

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing the stemmer.

Returns:
a description suitable for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses the options.

Valid options are:

 -S <name>
  The name of the snowball stemmer (default 'porter').
  available stemmers:
     danish, dutch, english, finnish, french, german, italian, 
     norwegian, porter, portuguese, russian, spanish, swedish
 

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the options to parse
Throws:
java.lang.Exception - if parsing fails

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the stemmer.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions
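
The contract between setOptions and getOptions (the array returned by one can be fed back into the other) can be illustrated with a minimal sketch. StemmerOptionsSketch is a hypothetical stand-in written for this page, not the Weka class, and its parsing loop is an assumption about how a single "-S <name>" flag with default 'porter' might be handled. The documented setOptions declares java.lang.Exception; the sketch throws an unchecked exception instead to keep the example short.

```java
import java.util.Arrays;

// Hypothetical stand-in that mimics the documented -S handling; not the Weka class.
final class StemmerOptionsSketch {
    private String stemmer = "porter"; // documented default

    /** Reads "-S <name>"; falls back to "porter" when the flag is absent. */
    void setOptions(String[] options) {
        String name = "porter";
        for (int i = 0; i < options.length; i++) {
            if ("-S".equals(options[i])) {
                if (i + 1 >= options.length) {
                    throw new IllegalArgumentException("No value given for -S option.");
                }
                name = options[i + 1];
                break;
            }
        }
        stemmer = name;
    }

    /** Returns the settings in a form that setOptions accepts again. */
    String[] getOptions() {
        return new String[] {"-S", stemmer};
    }

    public static void main(String[] args) {
        StemmerOptionsSketch s = new StemmerOptionsSketch();
        s.setOptions(new String[] {"-S", "german"});
        System.out.println(Arrays.toString(s.getOptions())); // prints [-S, german]
    }
}
```

Passing an empty array resets the stand-in to the default, so getOptions then yields "-S porter".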

isPresent

public static boolean isPresent()
Returns whether Snowball is present, i.e., whether its classes are on the classpath.

Returns:
whether Snowball is available
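
A presence check of this kind is commonly implemented by trying to load a known class by name. The following is a sketch under that assumption, not the Weka source: ClasspathProbe and isOnClasspath are hypothetical names, and only the probed class name 'org.tartarus.snowball.SnowballProgram' is taken from the class description.

```java
// Sketch only: a hypothetical helper, not the Weka implementation.
final class ClasspathProbe {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the class description of this page.
        String snowball = "org.tartarus.snowball.SnowballProgram";
        System.out.println("Snowball present: " + isOnClasspath(snowball));
    }
}
```

The printed value depends on whether the Snowball classes are actually on the classpath of the JVM that runs the sketch.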

listStemmers

public static java.util.Enumeration listStemmers()
returns an enumeration over all currently stored stemmer names.

Returns:
all available stemmers
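
Because the method returns a java.util.Enumeration, callers typically walk it with hasMoreElements/nextElement, for example to validate a name before passing it to setStemmer. The sketch below uses a hypothetical stand-in, StemmerNamesSketch, whose names are copied from the option listing on this page rather than discovered at runtime. It also uses a generic Enumeration<String>, whereas the documented method returns the raw type.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical stand-in: the names are hard-coded from the documented option listing.
final class StemmerNamesSketch {
    static final List<String> DOCUMENTED = Arrays.asList(
        "danish", "dutch", "english", "finnish", "french", "german", "italian",
        "norwegian", "porter", "portuguese", "russian", "spanish", "swedish");

    /** Mimics the shape of listStemmers(): an Enumeration over stemmer names. */
    static Enumeration<String> listStemmers() {
        return Collections.enumeration(DOCUMENTED);
    }

    /** Returns true if the enumeration contains the given name. */
    static boolean isKnown(String name) {
        Enumeration<String> names = listStemmers();
        while (names.hasMoreElements()) {
            if (names.nextElement().equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isKnown("porter"));  // prints true
        System.out.println(isKnown("klingon")); // prints false
    }
}
```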

getStemmer

public java.lang.String getStemmer()
Returns the name of the current stemmer, or null if none is set.

Returns:
the name of the stemmer

setStemmer

public void setStemmer(java.lang.String name)
sets the stemmer with the given name, e.g., "porter".

Parameters:
name - the name of the stemmer, e.g., "porter"

stemmerTipText

public java.lang.String stemmerTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

stem

public java.lang.String stem(java.lang.String word)
Returns the word in its stemmed form.

Specified by:
stem in interface Stemmer
Parameters:
word - the unstemmed word
Returns:
the stemmed word
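
Since the wrapper only works when the Snowball classes are on the classpath, one plausible way to delegate the actual stemming is reflection. The sketch below illustrates that idea under stated assumptions and is not the Weka implementation. ToyStemmer is a hypothetical stand-in for a Snowball stemmer class, with method names (setCurrent, stem, getCurrent) modeled on Snowball's Java API. ReflectiveStemSketch is likewise hypothetical, and returning the word unchanged when the class cannot be used is a choice made for the sketch.

```java
// Hypothetical stand-in for a Snowball stemmer class; real ones ship in a separate library.
class ToyStemmer {
    private String current = "";

    public void setCurrent(String word) { current = word; }

    public String getCurrent() { return current; }

    /** Toy rule: strip one trailing "s". Real Snowball algorithms are far more involved. */
    public boolean stem() {
        if (current.length() > 1 && current.endsWith("s")) {
            current = current.substring(0, current.length() - 1);
        }
        return true;
    }
}

// Sketch only: looks the stemmer up by class name and drives it through reflection.
final class ReflectiveStemSketch {

    static String stem(String className, String word) {
        try {
            Class<?> cls = Class.forName(className);
            Object stemmer = cls.getDeclaredConstructor().newInstance();
            cls.getMethod("setCurrent", String.class).invoke(stemmer, word);
            cls.getMethod("stem").invoke(stemmer);
            return (String) cls.getMethod("getCurrent").invoke(stemmer);
        } catch (ReflectiveOperationException e) {
            // Class missing or not shaped as expected: hand the word back unchanged.
            return word;
        }
    }

    public static void main(String[] args) {
        System.out.println(stem(ToyStemmer.class.getName(), "cats")); // prints cat
        System.out.println(stem("no.such.Stemmer", "cats"));          // prints cats
    }
}
```

Looking the class up by name keeps the calling code free of compile-time references to the stemmer library, at the cost of deferring any mismatch in method names to runtime.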

toString

public java.lang.String toString()
returns a string representation of the stemmer.

Overrides:
toString in class java.lang.Object
Returns:
a string representation of the stemmer

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Returns:
the revision

main

public static void main(java.lang.String[] args)
Runs the stemmer with the given options.

Parameters:
args - the options