text

Type Members

class DistributedTextSet extends TextSet

DistributedTextSet consists of an RDD of TextFeature.
class LocalTextSet extends TextSet

LocalTextSet consists of an array of TextFeature.
class Normalizer extends TextTransformer

Removes all dirty (non-English-alphabet) characters from tokens and converts words to lower case. The text must be tokenized first. The original tokens are replaced by the normalized tokens.
Input key: TextFeature.tokens
Output key: TextFeature.tokens
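
The normalization described above can be illustrated with a minimal plain-Scala sketch. It is not the library's implementation: `NormalizeSketch` and `normalize` are hypothetical names, and dropping tokens that end up empty is an assumption.

```scala
import java.util.Locale

object NormalizeSketch {
  // Hypothetical helper, not part of the library.
  // Lower-cases each token, then strips every character outside a-z.
  def normalize(tokens: Array[String]): Array[String] =
    tokens
      .map(_.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", ""))
      .filter(_.nonEmpty) // assumption: tokens left empty are dropped
}
```

For example, `NormalizeSketch.normalize(Array("Hello,", "World!", "123"))` keeps only the two cleaned words, since "123" contains no alphabet characters.
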
class SequenceShaper extends TextTransformer

Shapes the sequence of indices to a fixed length. If the original sequence is longer than the target length, it is truncated from the beginning or the end. If it is shorter than the target length, it is padded at the end. word2idx must be applied first. The original sequence of indices is replaced by the shaped sequence.
Input key: TextFeature.indexedTokens
Output key: TextFeature.indexedTokens
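
The truncate-or-pad rule can be sketched in plain Scala as follows. This is an illustration under stated assumptions rather than the library's code: `ShapeSketch`, `Pre`, `Post` and the default pad element of 0 are all hypothetical.

```scala
object ShapeSketch {
  // Hypothetical stand-ins for the two truncation modes.
  sealed trait Trunc
  case object Pre extends Trunc  // drop elements from the beginning
  case object Post extends Trunc // drop elements from the end

  // Returns a sequence of exactly `len` indices.
  def shape(indices: Array[Int], len: Int, trunc: Trunc = Pre,
            padElement: Int = 0): Array[Int] = {
    require(len > 0, "target length must be positive")
    if (indices.length > len) {
      trunc match {
        case Pre  => indices.takeRight(len) // keep the last `len` indices
        case Post => indices.take(len)      // keep the first `len` indices
      }
    } else {
      // Shorter (or equal) sequences are padded at the end.
      indices ++ Array.fill(len - indices.length)(padElement)
    }
  }
}
```

With a target length of 3, the sequence 1, 2, 3, 4, 5 becomes 3, 4, 5 under `Pre` and 1, 2, 3 under `Post`; a two-element sequence shaped to length 4 gains two pad elements at the end.
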
class TextFeature extends Serializable

Each TextFeature keeps the information of a single text record. It can include various states of a text (where available), e.g. original text content, uri, category label, tokens, index representation of tokens, BigDL Sample representation, prediction result and so on. All of this data is stored in a HashMap, where each key is a string that identifies the corresponding value.
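
The string-keyed storage idea can be sketched with a small stand-alone class. `FeatureSketch` and its `put`/`get` methods are hypothetical and only mirror the description above; they are not the actual TextFeature API.

```scala
import scala.collection.mutable

// Hypothetical stand-in: one record whose states live in a string-keyed map.
class FeatureSketch extends Serializable {
  private val state = mutable.HashMap[String, Any]()

  // Store or overwrite the value held under `key`.
  def put(key: String, value: Any): Unit = state.update(key, value)

  // Fetch the value under `key`, cast to the type the caller expects.
  def get[T](key: String): T = state(key).asInstanceOf[T]

  def contains(key: String): Boolean = state.contains(key)

  def keys: Set[String] = state.keySet.toSet
}
```

A later stage can overwrite an earlier value by writing to the same key, which is how a transformer can replace tokens with their normalized form in place.
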
class TextFeatureToSample extends TextTransformer

Transforms the indexedTokens and label (if any) of a TextFeature to a BigDL Sample. word2idx must be applied first.
Input key: TextFeature.indexedTokens and TextFeature.label (if any)
Output key: TextFeature.sample
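
As a rough illustration of this step, the sketch below pairs an index sequence with an optional label. `SampleSketch` is a hypothetical stand-in for a BigDL Sample, and the conversion to `Float` is an assumption made only for the example.

```scala
object ToSampleSketch {
  // Hypothetical stand-in for a BigDL Sample: features plus an optional label.
  final case class SampleSketch(feature: Array[Float], label: Option[Float])

  // Converts indices (and the label, when present) to floating-point values.
  def toSample(indexedTokens: Array[Int], label: Option[Int]): SampleSketch =
    SampleSketch(indexedTokens.map(_.toFloat), label.map(_.toFloat))
}
```

A record without a label simply yields a sample whose label is `None`, which matches the "if any" wording above.
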
class TextPredictor[T] extends Serializable
abstract class TextSet extends AnyRef

TextSet wraps a set of TextFeature.
abstract class TextTransformer extends Preprocessing[TextFeature, TextFeature]

Base class of Transformers that transform TextFeature.
class Tokenizer extends TextTransformer

Transforms text to an array of string tokens.
Input key: TextFeature.text
Output key: TextFeature.tokens
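
A minimal sketch of tokenization in plain Scala, assuming tokens are separated by whitespace (the splitting rule is not stated above, so this is an assumption). `TokenizeSketch` is a hypothetical name.

```scala
object TokenizeSketch {
  // Hypothetical helper: splits on runs of whitespace and drops empty pieces.
  def tokenize(text: String): Array[String] =
    text.trim.split("\\s+").filter(_.nonEmpty)
}
```

Punctuation stays attached to its word here, e.g. "Hello," remains a single token, which is why a normalization pass usually follows.
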
class WordIndexer extends TextTransformer

Given a wordIndex map, transforms tokens to their corresponding indices. Words that are not in the map are discarded. The text must be tokenized first.
Input key: TextFeature.tokens
Output key: TextFeature.indexedTokens
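
The lookup can be sketched in plain Scala as below. `IndexSketch` and `toIndices` are hypothetical names used for illustration and do not reproduce the library's code.

```scala
object IndexSketch {
  // Hypothetical helper: maps each known token to its index, preserving order.
  // Tokens missing from `wordIndex` are skipped.
  def toIndices(tokens: Array[String], wordIndex: Map[String, Int]): Array[Int] =
    tokens.collect { case t if wordIndex.contains(t) => wordIndex(t) }
}
```

Because unknown words are skipped, the output can be shorter than the input, so a fixed-length shaping step is typically applied afterwards.
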

Value Members

object Normalizer extends Serializable
object SequenceShaper extends Serializable
object TextFeature extends Serializable
object TextFeatureToSample extends Serializable
object TextPredictor extends Serializable
object TextSet
object Tokenizer extends Serializable
object TruncMode extends Enumeration
object WordIndexer extends Serializable
