com.intel.analytics.zoo.feature.text

LocalTextSet

class LocalTextSet extends TextSet

Linear Supertypes
TextSet, AnyRef, Any

Instance Constructors

  1. new LocalTextSet(array: Array[TextFeature])

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. def ->(transformer: Preprocessing[TextFeature, TextFeature]): TextSet

    Definition Classes
    TextSet
  5. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  6. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  7. var array: Array[TextFeature]

  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. def generateSample(): TextSet

    Generate BigDL Sample. See TextFeatureToSample for more details.

    Definition Classes
    TextSet
  14. def generateWordIndexMap(removeTopN: Int = 0, maxWordsNum: Int = -1): Map[String, Int]

    Generate the wordIndex map based on word frequencies sorted in descending order. Returns the resulting map, which is also stored in 'wordIndex'. Call this only after tokenize; otherwise an exception is thrown. See word2idx for more details.

    Definition Classes
    LocalTextSet → TextSet
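
The frequency-based indexing described for generateWordIndexMap can be sketched in plain Scala, without the library. This is only an illustration under stated assumptions, not the Analytics Zoo implementation: `WordIndexSketch` and `buildWordIndex` are hypothetical names, ties are broken alphabetically for determinism, and maxWordsNum is applied after the removeTopN words are dropped.

```scala
object WordIndexSketch {
  // Hypothetical helper, not part of Analytics Zoo.
  def buildWordIndex(
      tokenized: Seq[Seq[String]],
      removeTopN: Int = 0,
      maxWordsNum: Int = -1): Map[String, Int] = {
    // Count how often each token occurs across all texts.
    val frequencies: Map[String, Int] =
      tokenized.flatten.groupBy(identity).map { case (word, occ) => (word, occ.size) }
    // Most frequent first; alphabetical tie-break is an assumption.
    val sorted: Seq[String] =
      frequencies.toSeq.sortBy { case (word, count) => (-count, word) }.map(_._1)
    // Drop the removeTopN most frequent words (e.g. stopwords).
    val withoutTop = sorted.drop(removeTopN)
    // A non-positive maxWordsNum keeps every remaining word (assumed reading of -1).
    val kept = if (maxWordsNum > 0) withoutTop.take(maxWordsNum) else withoutTop
    // Indices start from 1.
    kept.zipWithIndex.map { case (word, i) => (word, i + 1) }.toMap
  }
}
```

For example, `WordIndexSketch.buildWordIndex(Seq(Seq("a", "b", "a"), Seq("a", "b", "c")))` assigns the lowest index to the most frequent word.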
  15. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  16. def getWordIndex: Map[String, Int]

    Get the word index map of this TextSet. If the TextSet hasn't been transformed from word to index, null will be returned.

    Definition Classes
    TextSet
  17. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  18. def isDistributed: Boolean

    Whether it is a DistributedTextSet.

    Definition Classes
    LocalTextSet → TextSet
  19. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  20. def isLocal: Boolean

    Whether it is a LocalTextSet.

    Definition Classes
    LocalTextSet → TextSet
  21. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  22. def normalize(): TextSet

    Do normalization on tokens. See Normalizer for more details.

    Definition Classes
    TextSet
  23. final def notify(): Unit

    Definition Classes
    AnyRef
  24. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  25. def randomSplit(weights: Array[Double]): Array[TextSet]

    Randomly split into an array of TextSet according to the provided weights. Only available for DistributedTextSet for now.

    weights

    Array of Double indicating the split portions.

    Definition Classes
    LocalTextSet → TextSet
  26. def setWordIndex(map: Map[String, Int]): LocalTextSet.this.type

    Definition Classes
    TextSet
  27. def shapeSequence(len: Int, truncMode: TruncMode = TruncMode.pre): TextSet

    Shape the sequence of tokens to a fixed length. The padding element is "##". See SequenceShaper for more details.

    Definition Classes
    TextSet
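
A minimal plain-Scala sketch of what shaping a token sequence to a fixed length might look like. It is not the SequenceShaper implementation: `ShapeSketch`, `Pre` and `Post` are hypothetical names standing in for the truncation modes, and the sketch assumes that `Pre` drops tokens from the beginning, `Post` drops them from the end, and padding is appended at the end.

```scala
object ShapeSketch {
  // Hypothetical stand-ins for the truncation modes.
  sealed trait Trunc
  case object Pre extends Trunc
  case object Post extends Trunc

  // Returns a sequence of exactly `len` tokens.
  def shape(
      tokens: Seq[String],
      len: Int,
      mode: Trunc = Pre,
      pad: String = "##"): Seq[String] =
    if (tokens.length > len) {
      mode match {
        case Pre  => tokens.takeRight(len) // drop from the beginning (assumption)
        case Post => tokens.take(len)      // drop from the end (assumption)
      }
    } else {
      // Too short (or exact): append padding until the length is reached.
      tokens ++ Seq.fill(len - tokens.length)(pad)
    }
}
```

Under these assumptions, every shaped sequence has the same length, which is what a fixed-size model input requires.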
  28. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  29. def toDataSet: DataSet[Sample[Float]]

    Convert TextSet to DataSet of Sample.

    Definition Classes
    LocalTextSet → TextSet
  30. def toDistributed(sc: SparkContext, partitionNum: Int = 4): DistributedTextSet

    Convert to a DistributedTextSet.

    A SparkContext is required to convert a LocalTextSet to a DistributedTextSet. You may also specify partitionNum, which defaults to 4.

    Definition Classes
    LocalTextSet → TextSet
  31. def toLocal(): LocalTextSet

    Convert to a LocalTextSet.

    Definition Classes
    LocalTextSet → TextSet
  32. def toString(): String

    Definition Classes
    AnyRef → Any
  33. def tokenize(): TextSet

    Do tokenization on original text. See Tokenizer for more details.

    Definition Classes
    TextSet
  34. def transform(transformer: Preprocessing[TextFeature, TextFeature]): TextSet

    Transform from one TextSet to another.

    Definition Classes
    LocalTextSet → TextSet
  35. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  38. def word2idx(removeTopN: Int = 0, maxWordsNum: Int = -1): TextSet

    Map word tokens to indices. Indices start from 1 and follow the occurrence frequency of each word, sorted in descending order. See WordIndexer for more details. After word2idx, you can get the wordIndex map by calling 'getWordIndex'.

    removeTopN

    Integer. Remove the topN most frequent words, for cases where they are treated as stopwords. Default is 0, namely remove nothing.

    maxWordsNum

    Integer. The maximum number of words to be taken into consideration. Default is -1, namely all words will be considered.

    Definition Classes
    TextSet
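
Once a word index map exists, the token-to-index step of word2idx amounts to a lookup per token. The sketch below shows that lookup in plain Scala; `Word2IdxSketch` and `toIndices` are hypothetical names, and dropping tokens that are missing from the map is an assumption made here, not documented WordIndexer behavior.

```scala
object Word2IdxSketch {
  // Hypothetical helper: look up each token in the word index map.
  // Tokens absent from the map are dropped (an assumption of this sketch).
  def toIndices(tokens: Seq[String], wordIndex: Map[String, Int]): Seq[Int] =
    tokens.flatMap(token => wordIndex.get(token))
}
```

A word removed through removeTopN or cut off by maxWordsNum would be absent from the map, so in this sketch it simply disappears from the index sequence.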
