org.zouzias.spark.lucenerdd

LuceneRDD

object LuceneRDD extends Versionable with AnalyzerConfigurable with SimilarityConfigurable

Linear Supertypes
SimilarityConfigurable, AnalyzerConfigurable, Configurable, Serializable, Serializable, Versionable, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. lazy val Config: Config

    Definition Classes
    Configurable
  5. val IndexAnalyzerConfigName: Option[String]

    Attributes
    protected
    Definition Classes
    AnalyzerConfigurable
  6. val LuceneSimilarityConfigValue: Option[String]

    Attributes
    protected
    Definition Classes
    SimilarityConfigurable
  7. val QueryAnalyzerConfigName: Option[String]

    Attributes
    protected
    Definition Classes
    AnalyzerConfigurable
  8. def apply(dataFrame: DataFrame): LuceneRDD[Row]

    Constructs a LuceneRDD from a DataFrame using the default index analyzer, query analyzer, and Lucene similarity.

    dataFrame: Input DataFrame

  9. def apply(dataFrame: DataFrame, indexAnalyzer: String, queryAnalyzer: String, similarity: String): LuceneRDD[Row]

    Instantiates a LuceneRDD from a Spark DataFrame.

    dataFrame: Spark DataFrame
    indexAnalyzer: Index analyzer name
    queryAnalyzer: Query analyzer name
    similarity: Lucene scoring similarity, i.e., BM25 or TF-IDF
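
    The two DataFrame constructors above can be sketched as follows. This is a minimal example under stated assumptions, not the library's reference usage: the column names are made up, the analyzer name "en" and similarity name "bm25" are guesses inferred from the getOrElseEn and getOrElseClassic members, and count() assumes that LuceneRDD behaves like a regular Spark RDD.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.zouzias.spark.lucenerdd.LuceneRDD

    object DataFrameIndexSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("lucenerdd-dataframe-sketch")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Toy input; the column names are hypothetical.
        val people: DataFrame = Seq(
          ("Ada Lovelace", "London"),
          ("Alan Turing", "Manchester")
        ).toDF("name", "city")

        // Single-argument overload: default analyzers and similarity.
        val withDefaults = LuceneRDD(people)

        // Four-argument overload: explicit names.
        // "en" and "bm25" are assumed values, not confirmed by this page.
        val explicit = LuceneRDD(people, "en", "en", "bm25")

        // Assumes LuceneRDD exposes the usual RDD actions.
        println(withDefaults.count())
        println(explicit.count())

        spark.stop()
      }
    }
    ```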

  10. def apply[T](elems: Iterable[T])(implicit arg0: ClassTag[T], sc: SparkContext, conv: (T) ⇒ Document): LuceneRDD[T]

  11. def apply[T](elems: Iterable[T], indexAnalyzer: String, queryAnalyzer: String, similarity: String)(implicit arg0: ClassTag[T], sc: SparkContext, conv: (T) ⇒ Document): LuceneRDD[T]

    Instantiates a LuceneRDD from an iterable.

    T: Input type
    elems: Elements to index
    indexAnalyzer: Index analyzer name
    queryAnalyzer: Query analyzer name
    similarity: Lucene scoring similarity, i.e., BM25 or TF-IDF
    sc: Spark context

  12. def apply[T](elems: RDD[T])(implicit arg0: ClassTag[T], conv: (T) ⇒ Document): LuceneRDD[T]

  13. def apply[T](elems: RDD[T], indexAnalyzer: String, queryAnalyzer: String, similarity: String)(implicit arg0: ClassTag[T], conv: (T) ⇒ Document): LuceneRDD[T]

    Instantiates a LuceneRDD from an RDD[T].

    T: Generic type
    elems: RDD of type T
    indexAnalyzer: Index analyzer name
    queryAnalyzer: Query analyzer name
    similarity: Lucene scoring similarity, i.e., BM25 or TF-IDF
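
    The generic constructors take an implicit conv: (T) ⇒ Document. A rough sketch of supplying that conversion by hand is shown below. It assumes that Document is Lucene's org.apache.lucene.document.Document; the City case class and its field names are hypothetical and exist only for illustration.

    ```scala
    import org.apache.lucene.document.{Document, Field, TextField}
    import org.apache.spark.{SparkConf, SparkContext}
    import org.zouzias.spark.lucenerdd.LuceneRDD

    // Hypothetical record type used only for this example.
    case class City(name: String, country: String)

    object RddIndexSketch {
      // Satisfies the implicit `conv: (T) ⇒ Document` parameter.
      // Assumes Document is Lucene's document class.
      implicit val cityToDocument: City => Document = { city =>
        val doc = new Document()
        doc.add(new TextField("name", city.name, Field.Store.YES))
        doc.add(new TextField("country", city.country, Field.Store.YES))
        doc
      }

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("lucenerdd-rdd-sketch")
          .setMaster("local[*]")
        // Implicit so that the Iterable overload can pick it up.
        implicit val sc: SparkContext = new SparkContext(conf)

        val cities = Seq(City("Zurich", "Switzerland"), City("Athens", "Greece"))

        // RDD[T] overload with default analyzers and similarity.
        val fromRdd = LuceneRDD(sc.parallelize(cities))

        // Iterable[T] overload; uses the implicit SparkContext above.
        val fromIterable = LuceneRDD(cities)

        sc.stop()
      }
    }
    ```

    Defining the conversion explicitly keeps the example independent of any implicit conversions the library itself may ship.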

  14. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  15. def blockDedup(entities: DataFrame, rowToQueryString: (Row) ⇒ String, blockingColumns: Array[String], topK: Int = 3, indexAnalyzer: String = ..., queryAnalyzer: String = ..., similarity: String = getOrElseClassic()): RDD[(Row, Array[SparkScoreDoc])]

    Deduplicates entities via blocking.

    entities: DataFrame of entities to deduplicate
    rowToQueryString: Function that maps a Row to a Lucene query string
    blockingColumns: Columns on which an exact match is required
    topK: Number of top-K query results
    indexAnalyzer: Lucene analyzer at index time
    queryAnalyzer: Lucene analyzer at query time
    similarity: Lucene similarity metric (BM25 or TF-IDF)
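
    A call to blockDedup might look like the sketch below. The sample data and the toQuery helper are hypothetical, and the example assumes that the field names usable in the query string match the DataFrame column names, which this page does not state.

    ```scala
    import org.apache.spark.sql.{Row, SparkSession}
    import org.zouzias.spark.lucenerdd.LuceneRDD

    object BlockDedupSketch {
      // Builds a phrase query on the first column of a row.
      // Characters other than letters, digits and spaces are replaced by
      // spaces so that they cannot be read as query syntax.
      // Assumption: the field name "name" matches the DataFrame column name.
      def toQuery(row: Row): String = {
        val name = row.getString(0).replaceAll("[^A-Za-z0-9 ]", " ").trim
        "name:\"" + name + "\""
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("lucenerdd-dedup-sketch")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        val people = Seq(
          ("Ada Lovelace", "London"),
          ("Ada  Lovelace", "London"),
          ("Alan Turing", "Manchester")
        ).toDF("name", "city")

        // Rows are compared only when they agree exactly on "city";
        // at most topK hits are returned per row.
        val matches = LuceneRDD.blockDedup(people, toQuery, Array("city"), topK = 3)

        matches.collect().foreach { case (row, hits) =>
          println(s"${row.getString(0)} -> ${hits.length} candidate(s)")
        }

        spark.stop()
      }
    }
    ```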

  16. def blockEntityLinkage(queries: DataFrame, entities: DataFrame, rowToQueryString: (Row) ⇒ String, queryPartColumns: Array[String], entityPartColumns: Array[String], topK: Int = 3, indexAnalyzer: String = ..., queryAnalyzer: String = ..., similarity: String = getOrElseClassic()): RDD[(Row, Array[SparkScoreDoc])]

    Entity linkage between two DataFrames by blocking / filtering on one or more columns.

    queries: Queries to be linked with the entities DataFrame
    entities: DataFrame of entities to be linked with the queries parameter
    rowToQueryString: Function that converts each Row to a query string in Lucene query syntax
    queryPartColumns: List of query columns for HashPartitioner
    entityPartColumns: List of entity columns for HashPartitioner
    topK: Number of linked results
    indexAnalyzer: Lucene analyzer at index time
    queryAnalyzer: Lucene analyzer at query time
    similarity: Lucene similarity metric (BM25 or TF-IDF)
    returns: Top-k linked results as an RDD of Tuple2, where _1 is the query Row and _2 is its top-k linked results as an Array of SparkScoreDoc.
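
    The linkage call can be sketched in the same spirit. The data, the column names, and the toQuery helper are assumptions made for this example; in particular it assumes that query-string field names refer to columns of the entities DataFrame.

    ```scala
    import org.apache.spark.sql.{Row, SparkSession}
    import org.zouzias.spark.lucenerdd.LuceneRDD

    object BlockLinkageSketch {
      // ORs together one term query per whitespace-separated token of the
      // first column. Assumption: "name" is a field of the entity index.
      def toQuery(row: Row): String =
        row.getString(0).split("\\s+").map(token => "name:" + token).mkString(" OR ")

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("lucenerdd-linkage-sketch")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        val queries = Seq(("Ada Lovelace", "UK")).toDF("name", "country")
        val entities = Seq(
          ("Ada King Lovelace", "UK"),
          ("Alan Turing", "UK"),
          ("Grace Hopper", "US")
        ).toDF("name", "country")

        // Both sides are partitioned on "country", so a query is only
        // compared against entities that share its country value.
        val linked = LuceneRDD.blockEntityLinkage(
          queries,
          entities,
          toQuery,
          queryPartColumns = Array("country"),
          entityPartColumns = Array("country"),
          topK = 2
        )

        linked.collect().foreach { case (query, hits) =>
          println(s"${query.getString(0)} -> ${hits.length} linked result(s)")
        }

        spark.stop()
      }
    }
    ```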

  17. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  18. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  19. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  20. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  21. def getAnalyzer(analyzerName: Option[String]): Analyzer

    Attributes
    protected
    Definition Classes
    AnalyzerConfigurable
  22. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  23. def getOrElseClassic(): String

    Attributes
    protected
    Definition Classes
    SimilarityConfigurable
  24. def getOrElseEn(analyzerName: Option[String]): String

    Gets the configured analyzer, or falls back to English if none is configured.

    Attributes
    protected
    Definition Classes
    AnalyzerConfigurable
  25. def getSimilarity(similarityName: Option[String]): Similarity

    Attributes
    protected
    Definition Classes
    SimilarityConfigurable
  26. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  27. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  28. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  29. final def notify(): Unit

    Definition Classes
    AnyRef
  30. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  31. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  32. def toString(): String

    Definition Classes
    AnyRef → Any
  33. def version(): Map[String, Any]

    Returns project information, e.g., version number and build time.

    Definition Classes
    Versionable
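
    Because version() returns a Map[String, Any], its contents can be listed without knowing the keys in advance. The describe helper below is hypothetical and only formats whatever entries the map happens to contain.

    ```scala
    import org.zouzias.spark.lucenerdd.LuceneRDD

    object VersionSketch {
      // Formats each entry as "key = value", sorted for stable output.
      def describe(info: Map[String, Any]): Seq[String] =
        info.toSeq.map { case (key, value) => s"$key = $value" }.sorted

      def main(args: Array[String]): Unit =
        describe(LuceneRDD.version()).foreach(println)
    }
    ```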
  34. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  35. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
