Object

com.nexthink.utils.parsing.distance

NgramDistance

object NgramDistance extends EditDistance[String]

N-gram edit distance is an edit distance metric that considers multiple characters at a time. It takes the idea of Levenshtein distance and treats each n-gram as a character. As a result, insertions and deletions that don't involve double letters are penalized more heavily with n-grams than with unigrams. In essence, it introduces a notion of context and favors strings with continuous stretches of equal characters (since it multiplies the number of comparisons). It is generally used with bigrams, which offer the best efficiency/performance ratio. We also refine this approach with some level of partial credit for n-grams that share common characters. In addition, string affixing allows the first character to participate in the same number of n-grams as an intermediate character. As a consequence, words that don't begin with the same n-1 characters receive a penalty for not matching the prefix.

See "N-Gram Similarity and Distance" (Grzegorz Kondrak, 2005): http://webdocs.cs.ualberta.ca/~kondrak/papers/spire05.pdf
This approach is also described in "Taming Text", chapter 4, "Fuzzy string matching": https://www.manning.com/books/taming-text
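
The description above can be illustrated with a minimal, self-contained sketch. It is not the implementation behind NgramDistance: the object name NgramSketch, the padding character, the fractional partial-credit cost and the normalization by the longer n-gram sequence are all assumptions made for illustration (the real object works with integer weights, as the signatures below show).

```scala
object NgramSketch {
  // Affixing: prepend n-1 padding characters so that the first character
  // takes part in as many n-grams as an interior character.
  // The padding character is an assumption of this sketch.
  def ngrams(s: String, n: Int, pad: Char = '\u0000'): Vector[String] =
    if (s.isEmpty) Vector.empty
    else ((pad.toString * (n - 1)) + s).sliding(n).toVector

  // Partial credit: the substitution cost is the fraction of positions
  // at which two n-grams of equal length differ (0.0 = identical, 1.0 = disjoint).
  def substitutionCost(x: String, y: String): Double =
    x.zip(y).count { case (c, d) => c != d }.toDouble / x.length

  // Levenshtein-style dynamic programming in which each n-gram plays the
  // role of a character. Insertions and deletions cost 1.
  def distance(a: String, b: String, n: Int = 2): Double = {
    val xs = ngrams(a, n)
    val ys = ngrams(b, n)
    if (xs.isEmpty && ys.isEmpty) 0.0
    else if (xs.isEmpty || ys.isEmpty) 1.0
    else {
      var prev = Array.tabulate(ys.length + 1)(_.toDouble)
      for (x <- xs) {
        val cur = new Array[Double](ys.length + 1)
        cur(0) = prev(0) + 1
        for (j <- 1 to ys.length) {
          val sub = prev(j - 1) + substitutionCost(x, ys(j - 1))
          cur(j) = math.min(sub, math.min(prev(j), cur(j - 1)) + 1)
        }
        prev = cur
      }
      // Normalize by the longer sequence so the result lies in [0, 1].
      prev.last / math.max(xs.length, ys.length)
    }
  }

  def similarity(a: String, b: String, n: Int = 2): Double =
    1.0 - distance(a, b, n)
}
```

In this sketch, `NgramSketch.distance("ab", "ac")` charges nothing for the shared padded first bigram and half a substitution for "ab" versus "ac", which share one of two characters.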

Linear Supertypes
EditDistance[String], AnyRef, Any

Type Members

  1. case class EditWeights(insertion: WeightComputation, deletion: WeightComputation, substitution: WeightComputation, maxPossibleWeight: Int) extends Product with Serializable

    Definition Classes
    EditDistance
  2. type WeightComputation = (String, String) ⇒ Int

    Definition Classes
    EditDistance
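
To show how these two type members might fit together, here is a hedged sketch of a weighted edit distance over token sequences. Only the shapes of WeightComputation and EditWeights come from the signatures above. Everything else is an assumption: the object name EditWeightsSketch, the convention of passing an empty string for the missing side of an insertion or deletion, the zero cost for equal tokens, and the normalization by the longer length times maxPossibleWeight (assumed to be positive).

```scala
object EditWeightsSketch {
  // Local stand-ins that mirror the documented signatures.
  type WeightComputation = (String, String) => Int

  final case class EditWeights(
      insertion: WeightComputation,
      deletion: WeightComputation,
      substitution: WeightComputation,
      maxPossibleWeight: Int)

  // Hypothetical unit weights: every edit costs 1.
  val unit: WeightComputation = (_, _) => 1
  val unitWeights: EditWeights = EditWeights(unit, unit, unit, maxPossibleWeight = 1)

  // Assumed semantics, not taken from the library:
  //   insertion("", y), deletion(x, ""), substitution(x, y) when x != y.
  def normalizedEditDistance(a: Seq[String], b: Seq[String], w: EditWeights): Double =
    if (a.isEmpty && b.isEmpty) 0.0
    else {
      val xs = a.toVector
      val ys = b.toVector
      // Row 0: cost of building each prefix of ys from nothing.
      val first = ys.scanLeft(0)((acc, y) => acc + w.insertion("", y))
      val last = xs.foldLeft(first) { (prev, x) =>
        val cur = new Array[Int](ys.length + 1)
        cur(0) = prev(0) + w.deletion(x, "")
        for (j <- 1 to ys.length) {
          val y = ys(j - 1)
          val sub = prev(j - 1) + (if (x == y) 0 else w.substitution(x, y))
          val del = prev(j) + w.deletion(x, "")
          val ins = cur(j - 1) + w.insertion("", y)
          cur(j) = math.min(sub, math.min(del, ins))
        }
        cur.toVector
      }
      // Divide by the worst case so the result lies in [0, 1].
      last.last.toDouble / (math.max(xs.length, ys.length) * w.maxPossibleWeight)
    }
}
```

With the unit weights, comparing `Seq("ab", "cd")` with `Seq("ab", "ce")` costs one substitution out of a worst case of two edits.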

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. def constantWeightOfArity(arity: Int)(a: String, b: String): Int

  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  11. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  12. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  13. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. def ngramWeight(a: String, b: String): Int

  15. def normalizedBigramDistance(a: String, b: String): Double

  16. def normalizedBigramSimilarity(a: String, b: String): Double

  17. def normalizedEditDistance(a: Seq[String], b: Seq[String], weights: EditWeights): Double

    Attributes
    protected
    Definition Classes
    EditDistance
  18. def normalizedNgramDistance(a: String, b: String, arity: Int): Double

  19. def normalizedNgramSimilarity(a: String, b: String, arity: Int): Double

  20. final def notify(): Unit

    Definition Classes
    AnyRef
  21. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  22. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  23. def toString(): String

    Definition Classes
    AnyRef → Any
  24. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
