Package com.intel.analytics.zoo.feature.text

package text

Type Members

  1. class DistributedTextSet extends TextSet

    DistributedTextSet consists of an RDD of TextFeature.

  2. class LocalTextSet extends TextSet

    LocalTextSet consists of an array of TextFeature.

  3. class Normalizer extends TextTransformer

    Removes all dirty (non-English-alphabet) characters from tokens and converts words to lower case. The text must be tokenized first. The original tokens are replaced by the normalized tokens.

    Input key: TextFeature.tokens
    Output key: TextFeature.tokens

  4. class SequenceShaper extends TextTransformer

    Shapes the sequence of indices to a fixed length. If the original sequence is longer than the target length, it is truncated from the beginning or the end. If the original sequence is shorter than the target length, it is padded at the end. word2idx must be applied first. The original sequence of indices is replaced by the shaped sequence.

    Input key: TextFeature.indexedTokens
    Output key: TextFeature.indexedTokens

  5. class TextFeature extends Serializable

    Each TextFeature keeps the information of a single text record. It can hold the various states (if any) of a text, e.g. the original text content, uri, category label, tokens, index representation of tokens, BigDL Sample representation, prediction result and so on. All of this data is stored in a HashMap; each key is a string that identifies the corresponding value.

  6. class TextFeatureToSample extends TextTransformer

    Transforms the indexedTokens and label (if any) of a TextFeature to a BigDL Sample. word2idx must be applied first.

    Input key: TextFeature.indexedTokens and TextFeature.label (if any)
    Output key: TextFeature.sample

  7. class TextPredictor[T] extends Serializable

  8. abstract class TextSet extends AnyRef

    TextSet wraps a set of TextFeature.

  9. abstract class TextTransformer extends Preprocessing[TextFeature, TextFeature]

    Base class of Transformers that transform TextFeature.

  10. class Tokenizer extends TextTransformer

    Transforms text to an array of string tokens.

    Input key: TextFeature.text
    Output key: TextFeature.tokens

  11. class WordIndexer extends TextTransformer

    Given a wordIndex map, transforms tokens to their corresponding indices. Words that are not in the map are discarded. The text must be tokenized first.

    Input key: TextFeature.tokens
    Output key: TextFeature.indexedTokens
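
The input and output keys listed for Tokenizer, Normalizer and WordIndexer imply an order: tokenize, then normalize, then index. The following is a minimal plain-Scala sketch of that chain, not the library's implementation. The object name, the function names, the literal key strings and the whitespace-based tokenization are all assumptions made for illustration. A string-keyed map stands in for TextFeature.

```scala
// Plain-Scala sketch of the transformer chain; it does not use the library,
// and every name below is hypothetical.
object TextPipelineSketch {
  // A text record modelled as a string-keyed map, as described for TextFeature.
  type Feature = Map[String, Any]

  // Reads "text", writes "tokens". Assumption: tokens are whitespace-separated.
  def tokenize(f: Feature): Feature = {
    val text = f("text").asInstanceOf[String]
    f + ("tokens" -> text.split("\\s+").filter(_.nonEmpty).toVector)
  }

  // Reads "tokens" and overwrites it: strips non-English-alphabet characters
  // and lower-cases each token.
  def normalize(f: Feature): Feature = {
    val tokens = f("tokens").asInstanceOf[Vector[String]]
    f + ("tokens" -> tokens.map(_.replaceAll("[^a-zA-Z]", "").toLowerCase))
  }

  // Reads "tokens", writes "indexedTokens"; words missing from the map are left out.
  def index(f: Feature, wordIndex: Map[String, Int]): Feature = {
    val tokens = f("tokens").asInstanceOf[Vector[String]]
    f + ("indexedTokens" -> tokens.filter(wordIndex.contains).map(wordIndex))
  }

  // Applies the three steps in the order the documentation requires.
  def run(text: String, wordIndex: Map[String, Int]): Feature =
    index(normalize(tokenize(Map("text" -> text))), wordIndex)

  def main(args: Array[String]): Unit = {
    val out = run("Hello, brave World!", Map("hello" -> 1, "world" -> 2))
    println(out("tokens"))
    println(out("indexedTokens"))
  }
}
```

In this sketch "brave" survives normalization but has no entry in the word index, so it is absent from the indexed sequence, while the original "text" value stays in the map alongside the derived values.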

Value Members

  1. object Normalizer extends Serializable
  2. object SequenceShaper extends Serializable
  3. object TextFeature extends Serializable
  4. object TextFeatureToSample extends Serializable
  5. object TextPredictor extends Serializable
  6. object TextSet
  7. object Tokenizer extends Serializable
  8. object TruncMode extends Enumeration
  9. object WordIndexer extends Serializable
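
The truncation and padding rules described for SequenceShaper, combined with an enumeration in the style of TruncMode, can be sketched in plain Scala as follows. This is an illustration under stated assumptions rather than the library's code. The enumeration values Pre and Post, the shape function, its parameter names and the padding value 0 are all hypothetical.

```scala
// Plain-Scala sketch of fixed-length sequence shaping; it does not use the library.
object SequenceShapingSketch {
  // Hypothetical stand-in for a truncation-mode enumeration.
  // Pre drops elements from the beginning, Post drops them from the end.
  object Trunc extends Enumeration {
    val Pre, Post = Value
  }

  // Returns a sequence of exactly `len` elements.
  // Assumption: padding uses `padElement`, which defaults to 0.
  def shape(indices: Seq[Int],
            len: Int,
            mode: Trunc.Value = Trunc.Pre,
            padElement: Int = 0): Seq[Int] = {
    require(len > 0, "len must be positive")
    if (indices.length > len) {
      // Too long: keep the last `len` elements (Pre) or the first `len` (Post).
      if (mode == Trunc.Pre) indices.takeRight(len) else indices.take(len)
    } else {
      // Too short or exact: pad at the end (no-op when lengths already match).
      indices ++ Seq.fill(len - indices.length)(padElement)
    }
  }

  def main(args: Array[String]): Unit = {
    println(shape(Seq(1, 2, 3, 4, 5), 3, Trunc.Pre))
    println(shape(Seq(1, 2, 3, 4, 5), 3, Trunc.Post))
    println(shape(Seq(1, 2), 4))
  }
}
```

Padding always goes at the end, regardless of the truncation mode, which matches the description of SequenceShaper above.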
