Class JaroWinklerDistance
java.lang.Object
    org.apache.commons.text.similarity.JaroWinklerDistance
All Implemented Interfaces:
SimilarityScore<Double>
public class JaroWinklerDistance extends Object implements SimilarityScore<Double>
A similarity algorithm indicating the percentage of matched characters between two character sequences. The Jaro measure is the weighted sum of the percentage of matched characters from each sequence and of transposed characters. Winkler's modification increases this measure when the sequences share matching initial characters (a common prefix).
This implementation is based on the Jaro-Winkler similarity algorithm described at http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance.
This code has been adapted from Apache Commons Lang 3.3.
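For reference, the standard formulation from the Wikipedia article cited above is written out below, followed by a worked example. The symbols (m, t, ℓ, p, w) and the constants p = 0.1 and ℓ ≤ 4 are the article's conventions; this page does not confirm that the class uses exactly these constants.

```latex
% Match window: equal characters count as matching only if their positions differ by at most w
w = \left\lfloor \frac{\max(|s_1|, |s_2|)}{2} \right\rfloor - 1

% Jaro similarity (m = matching characters, t = half the number of matched characters out of order)
\operatorname{sim}_j =
\begin{cases}
0 & \text{if } m = 0 \\
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise}
\end{cases}

% Winkler boost (\ell = common prefix length, capped at 4; p = 0.1)
\operatorname{sim}_w = \operatorname{sim}_j + \ell \, p \, (1 - \operatorname{sim}_j)

% Worked example: s_1 = MARTHA, s_2 = MARHTA gives m = 6, t = 1, \ell = 3
\operatorname{sim}_j = \tfrac{1}{3}\left(\tfrac{6}{6} + \tfrac{6}{6} + \tfrac{5}{6}\right) = \tfrac{17}{18} \approx 0.944,
\qquad
\operatorname{sim}_w = \tfrac{17}{18} + 3 \cdot 0.1 \cdot \tfrac{1}{18} \approx 0.961
```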
Since:
1.0
Field Summary
Modifier and Type   Field             Description
static int          INDEX_NOT_FOUND   Represents a failed index search.
Constructor Summary
Constructor             Description
JaroWinklerDistance()
Method Summary
Modifier and Type        Method                                             Description
Double                   apply(CharSequence left, CharSequence right)       Find the Jaro-Winkler distance, which indicates the similarity score between two CharSequences.
protected static int[]   matches(CharSequence first, CharSequence second)   Returns the Jaro-Winkler string matches, half transpositions, and prefix array.
Field Detail
INDEX_NOT_FOUND
public static final int INDEX_NOT_FOUND
Represents a failed index search.
See Also:
Constant Field Values
Method Detail
apply
public Double apply(CharSequence left, CharSequence right)
Find the Jaro-Winkler distance, which indicates the similarity score between two CharSequences. Higher values indicate more similar sequences.
distance.apply(null, null) = IllegalArgumentException
distance.apply("","") = 0.0
distance.apply("","a") = 0.0
distance.apply("aaapppp", "") = 0.0
distance.apply("frog", "fog") = 0.93
distance.apply("fly", "ant") = 0.0
distance.apply("elephant", "hippo") = 0.44
distance.apply("hippo", "elephant") = 0.44
distance.apply("hippo", "zzzzzzzz") = 0.0
distance.apply("hello", "hallo") = 0.88
distance.apply("ABC Corporation", "ABC Corp") = 0.93
distance.apply("D N H Enterprises Inc", "D & H Enterprises, Inc.") = 0.95
distance.apply("My Gym Children's Fitness Center", "My Gym. Childrens Fitness") = 0.92
distance.apply("PENNSYLVANIA", "PENNCISYLVNIA") = 0.88
Specified by:
apply in interface SimilarityScore<Double>
Parameters:
left - the first CharSequence, must not be null
right - the second CharSequence, must not be null
Returns:
result distance
Throws:
IllegalArgumentException - if either CharSequence input is null
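The contract of apply (null inputs rejected, 0.0 when no characters match, higher values for more similar sequences) can be illustrated with a minimal standalone sketch. It is not the Commons Text source: the class name JaroWinklerSketch is hypothetical, and the scaling factor 0.1, the four-character prefix cap, and the integer halving of the transposition count are assumptions taken from the standard formulation in the Wikipedia article cited above. Some results (for example on long shared prefixes such as "ABC Corporation") may therefore differ from the values listed, and the sketch returns the unrounded score, whereas the examples are shown to two decimal places.

```java
// Hypothetical standalone sketch of a Jaro-Winkler score; not the Commons Text source.
final class JaroWinklerSketch {

    private static final double SCALING_FACTOR = 0.1; // Winkler's usual p (assumed)
    private static final int MAX_PREFIX = 4;          // prefix cap from the Wikipedia formulation (assumed)

    static Double apply(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("CharSequences must not be null");
        }
        // Two equal characters only count as a match if they lie within this window.
        int window = Math.max(Math.max(left.length(), right.length()) / 2 - 1, 0);
        boolean[] leftMatched = new boolean[left.length()];
        boolean[] rightMatched = new boolean[right.length()];
        int matches = 0;
        for (int i = 0; i < left.length(); i++) {
            int end = Math.min(i + window + 1, right.length());
            for (int j = Math.max(i - window, 0); j < end; j++) {
                if (!rightMatched[j] && left.charAt(i) == right.charAt(j)) {
                    leftMatched[i] = true;
                    rightMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0; // empty inputs or no characters in common
        }
        // Walk both sets of matched characters in order; count positions that disagree.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < left.length(); i++) {
            if (!leftMatched[i]) {
                continue;
            }
            while (!rightMatched[k]) {
                k++;
            }
            if (left.charAt(i) != right.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }
        double m = matches;
        int t = outOfOrder / 2; // half transpositions (integer halving is an assumption)
        double jaro = (m / left.length() + m / right.length() + (m - t) / m) / 3.0;

        // Winkler boost: reward a shared prefix of up to MAX_PREFIX characters.
        int prefix = 0;
        int limit = Math.min(MAX_PREFIX, Math.min(left.length(), right.length()));
        while (prefix < limit && left.charAt(prefix) == right.charAt(prefix)) {
            prefix++;
        }
        return jaro + prefix * SCALING_FACTOR * (1.0 - jaro);
    }

    public static void main(String[] args) {
        System.out.println(apply("hello", "hallo"));   // approximately 0.88
        System.out.println(apply("MARTHA", "MARHTA")); // approximately 0.961
    }
}
```

With these assumptions, apply("frog", "fog") yields about 0.925, close to the 0.93 listed above, and apply("fly", "ant") returns 0.0 because the two words share no characters.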
matches
protected static int[] matches(CharSequence first, CharSequence second)
Returns the Jaro-Winkler string matches, half transpositions, and prefix array.
Parameters:
first - the first string to be matched
second - the second string to be matched
Returns:
mtp array containing: matches, half transpositions, and prefix
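A minimal sketch of what a helper like matches might compute, assuming the standard matching rules from the Wikipedia article cited above: the shorter sequence is scanned against the longer one within the match window, the matched characters are compared in order to count transpositions, and the common prefix of the two arguments is measured (uncapped here). The class name JaroWinklerMatchesSketch is hypothetical and this is not the Commons Text source; the -1 sentinel for unmatched positions and the integer halving of the mismatch count are choices of this sketch.

```java
import java.util.Arrays;

// Hypothetical sketch of a matches helper; not the Commons Text source.
final class JaroWinklerMatchesSketch {

    /** Returns {matches, halfTranspositions, prefix} for the two sequences. */
    static int[] matches(CharSequence first, CharSequence second) {
        // Scan the shorter sequence against the longer one.
        boolean firstIsShorter = first.length() <= second.length();
        CharSequence min = firstIsShorter ? first : second;
        CharSequence max = firstIsShorter ? second : first;

        int window = Math.max(max.length() / 2 - 1, 0);
        int[] matchIndexes = new int[min.length()];
        Arrays.fill(matchIndexes, -1); // -1: no match found for this position
        boolean[] maxMatched = new boolean[max.length()];
        int matches = 0;
        for (int i = 0; i < min.length(); i++) {
            int end = Math.min(i + window + 1, max.length());
            for (int j = Math.max(i - window, 0); j < end; j++) {
                if (!maxMatched[j] && min.charAt(i) == max.charAt(j)) {
                    matchIndexes[i] = j;
                    maxMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }

        // Compare matched characters in order; halving the mismatch count gives half transpositions.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < min.length(); i++) {
            if (matchIndexes[i] == -1) {
                continue;
            }
            while (!maxMatched[k]) {
                k++;
            }
            if (min.charAt(i) != max.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }

        // Common prefix of the original arguments (uncapped in this sketch).
        int prefix = 0;
        while (prefix < min.length() && first.charAt(prefix) == second.charAt(prefix)) {
            prefix++;
        }
        return new int[] {matches, outOfOrder / 2, prefix};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(matches("MARTHA", "MARHTA"))); // [6, 1, 3]
    }
}
```

For MARTHA and MARHTA the sketch returns {6, 1, 3}: six matched characters, one half transposition (T and H appear in swapped order), and a three-character common prefix.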