com.intel.analytics.zoo.feature.text

TextSet

Related Docs: class TextSet | package text

object TextSet

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def array(data: Array[TextFeature]): LocalTextSet

    Create a LocalTextSet from array of TextFeature.

  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. def fromRelationLists(relations: Array[Relation], corpus1: TextSet, corpus2: TextSet): LocalTextSet

    Generate a TextSet for ranking using Relation array.

    relations

    Array of Relation.

    corpus1

    LocalTextSet that contains all Relation.id1. For each TextFeature in corpus1, text must have been transformed to indexedTokens of the same length.

    corpus2

    LocalTextSet that contains all Relation.id2. For each TextFeature in corpus2, text must have been transformed to indexedTokens of the same length.

    returns

    LocalTextSet.

  11. def fromRelationLists(relations: RDD[Relation], corpus1: TextSet, corpus2: TextSet): DistributedTextSet

    Used to generate a TextSet for ranking.

    This method does the following:
      1. For each Relation.id1, find the list of Relation.id2 with corresponding Relation.label that comes together with Relation.id1. In other words, group relations by Relation.id1.
      2. Join with corpus to transform each id to indexedTokens. Note: Make sure that the corpus has been transformed by SequenceShaper and WordIndexer.
      3. For each list, generate a TextFeature having Sample with:
         - feature of shape (listLength, text1Length + text2Length).
         - label of shape (listLength, 1).

    relations

    RDD of Relation.

    corpus1

    DistributedTextSet that contains all Relation.id1. For each TextFeature in corpus1, text must have been transformed to indexedTokens of the same length.

    corpus2

    DistributedTextSet that contains all Relation.id2. For each TextFeature in corpus2, text must have been transformed to indexedTokens of the same length.

    returns

    DistributedTextSet.
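
    The three steps above can be sketched with plain Scala collections. This is only an illustration under stated assumptions, not the library's implementation: the Relation case class below is a hypothetical stand-in, each corpus is modelled as a Map from id to indexedTokens, and RelationListsSketch.build is a made-up name.

```scala
// Hypothetical stand-in for the library's Relation type.
case class Relation(id1: String, id2: String, label: Int)

object RelationListsSketch {
  // corpus1 / corpus2 are assumed to map an id to its indexedTokens.
  // Returns, per id1, a (feature, label) pair with
  // feature of shape (listLength, text1Length + text2Length) and
  // label of shape (listLength, 1).
  def build(
      relations: Seq[Relation],
      corpus1: Map[String, Array[Int]],
      corpus2: Map[String, Array[Int]]
  ): Map[String, (Array[Array[Int]], Array[Array[Int]])] =
    // Step 1: group relations by id1.
    relations.groupBy(_.id1).map { case (id1, rels) =>
      // Step 2: replace each id with its indexedTokens and concatenate.
      val feature = rels.map(r => corpus1(id1) ++ corpus2(r.id2)).toArray
      // Step 3: one single-element label row per relation in the list.
      val label = rels.map(r => Array(r.label)).toArray
      (id1, (feature, label))
    }
}
```

    The sketch relies on every entry in a corpus having the same token length, which is why the corpus must already be shaped and indexed.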

  12. def fromRelationPairs(relations: Array[Relation], corpus1: TextSet, corpus2: TextSet): LocalTextSet

    Generate a TextSet for pairwise training using Relation array.

    relations

    Array of Relation.

    corpus1

    LocalTextSet that contains all Relation.id1. For each TextFeature in corpus1, text must have been transformed to indexedTokens of the same length.

    corpus2

    LocalTextSet that contains all Relation.id2. For each TextFeature in corpus2, text must have been transformed to indexedTokens of the same length.

    returns

    LocalTextSet.

  13. def fromRelationPairs(relations: RDD[Relation], corpus1: TextSet, corpus2: TextSet, memoryType: MemoryType = DRAM): DistributedTextSet

    Used to generate a TextSet for pairwise training.

    This method does the following:
      1. Generate all RelationPairs: (id1, id2Positive, id2Negative) from Relations.
      2. Join RelationPairs with corpus to transform id to indexedTokens. Note: Make sure that the corpus has been transformed by SequenceShaper and WordIndexer.
      3. For each pair, generate a TextFeature having Sample with:
         - feature of shape (2, text1Length + text2Length).
         - label of value [1 0] as the positive relation is placed before the negative one.

    relations

    RDD of Relation.

    corpus1

    DistributedTextSet that contains all Relation.id1. For each TextFeature in corpus1, text must have been transformed to indexedTokens of the same length.

    corpus2

    DistributedTextSet that contains all Relation.id2. For each TextFeature in corpus2, text must have been transformed to indexedTokens of the same length.

    returns

    DistributedTextSet.
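
    A minimal sketch of the pair generation described above, using plain Scala collections rather than the library's own code. The case classes are hypothetical stand-ins, the corpora are modelled as Maps from id to indexedTokens, and the rule that label > 0 marks a positive relation is an assumption made for illustration.

```scala
// Hypothetical stand-ins for the library's Relation and RelationPair types.
case class Relation(id1: String, id2: String, label: Int)
case class RelationPair(id1: String, id2Positive: String, id2Negative: String)

object RelationPairsSketch {
  // Step 1: for each id1, pair every positive id2 with every negative id2.
  // Assumption: label > 0 is positive, anything else is negative.
  def pairs(relations: Seq[Relation]): Seq[RelationPair] =
    relations.groupBy(_.id1).toSeq.flatMap { case (id1, rels) =>
      val (pos, neg) = rels.partition(_.label > 0)
      for (p <- pos; n <- neg) yield RelationPair(id1, p.id2, n.id2)
    }

  // Steps 2 and 3: look up indexedTokens and stack the positive row
  // above the negative row, so the label is always [1 0].
  def sample(
      pair: RelationPair,
      corpus1: Map[String, Array[Int]],
      corpus2: Map[String, Array[Int]]
  ): (Array[Array[Int]], Array[Int]) = {
    val text1 = corpus1(pair.id1)
    val feature = Array(
      text1 ++ corpus2(pair.id2Positive),
      text1 ++ corpus2(pair.id2Negative)
    )
    (feature, Array(1, 0))
  }
}
```

    Under this scheme an id1 with only positive or only negative relations yields no pairs.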

  14. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  15. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  16. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  17. val logger: Logger

  18. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  19. final def notify(): Unit

    Definition Classes
    AnyRef
  20. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  21. def rdd(data: RDD[TextFeature], memoryType: MemoryType = DRAM): DistributedTextSet

    Create a DistributedTextSet from RDD of TextFeature.

  22. def read(path: String, sc: SparkContext = null, minPartitions: Int = 1): TextSet

    Read text files with labels from a directory.

    The directory structure is expected to be the following:

    path
      ├── dir1 - text1, text2, ...
      ├── dir2 - text1, text2, ...
      └── dir3 - text1, text2, ...

    Under the target path, there ought to be N subdirectories (dir1 to dirN). Each subdirectory represents a category and contains all texts that belong to that category. Each category is given a label according to its position when all subdirectories are sorted in ascending order. Each text is given the label of the subdirectory in which it is located. Labels start from 0.

    path

    The folder path to texts. Both local and distributed file systems (such as HDFS) are supported. If you want to read from a distributed file system, sc needs to be specified.

    sc

    An instance of SparkContext. If specified, texts will be read as a DistributedTextSet. Default is null and in this case texts will be read as a LocalTextSet.

    minPartitions

    Integer. A suggested minimum number of partitions for the input texts. Only needs to be specified when sc is not null. Default is 1.

    returns

    TextSet.
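
    The labelling rule can be illustrated in a few lines of plain Scala. This is a sketch of the rule as described above, not the library's code; LabelSketch.labelsFor and the directory names in the usage note are hypothetical.

```scala
object LabelSketch {
  // Sort subdirectory names in ascending order and label them 0, 1, 2, ...
  def labelsFor(subdirs: Seq[String]): Map[String, Int] =
    subdirs.sorted.zipWithIndex.toMap
}
```

    For example, subdirectories named sports, business and tech would be labelled business = 0, sports = 1, tech = 2, regardless of the order in which they are listed on disk.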

  23. def readCSV(path: String, sc: SparkContext = null, minPartitions: Int = 1): TextSet

    Read texts with id from csv file. Each record is supposed to contain the following two fields in order: id(String) and text(String).

    path

    The path to the csv file. Both local and distributed file systems (such as HDFS) are supported. If you want to read from a distributed file system, sc needs to be specified.

    sc

    An instance of SparkContext. If specified, texts will be read as a DistributedTextSet. Default is null and in this case texts will be read as a LocalTextSet.

    minPartitions

    Integer. A suggested minimum number of partitions for the input texts. Only needs to be specified when sc is not null. Default is 1.

    returns

    TextSet.

  24. def readParquet(path: String, sqlContext: SQLContext): DistributedTextSet

    Read texts with id from parquet file. Schema should be the following: "id"(String) and "text"(String).

    path

    The path to the parquet file.

    sqlContext

    An instance of SQLContext.

    returns

    DistributedTextSet.

  25. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  26. def toString(): String

    Definition Classes
    AnyRef → Any
  27. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  30. def wordsToMap(words: Array[String], existingMap: Map[String, Int] = null): Map[String, Int]

    Assign each word an index to form a map.

    words

    Array of words.

    existingMap

    Existing map of word index if any. Default is null and in this case a new map with index starting from 1 will be generated. If not null, then the generated map will preserve the word index in existingMap and assign subsequent indices to new words.

    returns

    wordIndex map.
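
    The indexing behaviour described for existingMap might look roughly like the following. This is a sketch, not the library's implementation: WordIndexSketch.indexWords is a hypothetical name, and it assumes that "subsequent indices" means continuing from the largest index already present in existingMap and that words are indexed in order of first appearance.

```scala
object WordIndexSketch {
  def indexWords(
      words: Array[String],
      existingMap: Map[String, Int] = null
  ): Map[String, Int] = {
    // Treat a null existingMap as empty, so new indices start from 1.
    val base = Option(existingMap).getOrElse(Map.empty[String, Int])
    // Assumption: new indices continue after the largest existing index.
    val start = if (base.isEmpty) 0 else base.values.max
    // Keep only words that are not indexed yet, in first-seen order.
    val fresh = words.distinct.filterNot(base.contains)
    base ++ fresh.zipWithIndex.map { case (w, i) => (w, start + i + 1) }.toMap
  }
}
```

    Existing entries are never renumbered in this sketch, which matches the stated guarantee that the word index in existingMap is preserved.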
