io.github.jspinak.brobot.util.string.StringSimilarity

@Component public class StringSimilarity extends Object

Calculates string similarity using the Levenshtein distance algorithm.

This component provides methods to compute similarity scores between strings based on their edit distance. The similarity score ranges from 0.0 (completely different) to 1.0 (identical), making it useful for fuzzy string matching.

Algorithm details:

Uses Levenshtein distance (minimum edit operations needed)
Normalizes by the longer string's length for consistent scoring
Case-insensitive comparison in edit distance calculation
Space-optimized implementation (a single array instead of a full matrix)
Important: Character transpositions (e.g., "ab" → "ba") count as TWO edits (deletion + insertion), not one. This differs from Damerau-Levenshtein distance.
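
The difference between the two distances can be illustrated with a small sketch. The class and method names below are hypothetical and are not part of `StringSimilarity`; the transposition branch implements the optimal string alignment (restricted Damerau-Levenshtein) variant, and a full matrix is used for readability rather than the single-array form this class is documented to use.

```java
// Hypothetical illustration, not part of the documented class.
final class TranspositionSketch {

    /**
     * Edit distance over a full DP matrix. With allowTransposition == false this is
     * plain Levenshtein; with true, swapping two adjacent characters costs one edit
     * (optimal string alignment variant).
     */
    public static int distance(String a, String b, boolean allowTransposition) {
        int m = a.length();
        int n = b.length();
        int[][] d = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) d[i][0] = i; // delete all i characters
        for (int j = 0; j <= n; j++) d[0][j] = j; // insert all j characters

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), // deletion, insertion
                        d[i - 1][j - 1] + cost);                    // match or substitution
                if (allowTransposition && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);     // adjacent swap
                }
                d[i][j] = best;
            }
        }
        return d[m][n];
    }
}
```

With this sketch, `distance("ab", "ba", false)` yields 2, matching the behavior described above, while `distance("ab", "ba", true)` yields 1.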

Similarity formula:

 similarity = (longerLength - editDistance) / longerLength

Use cases:

OCR result validation and selection
Fuzzy text matching in UI automation
Detecting typos or variations in user input
Finding best matches in string collections
Duplicate detection with tolerance
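
As an illustration of the "best match in a collection" use case, the following is a minimal sketch. The class `BestMatchSketch`, the method `bestMatch`, and the `threshold` parameter are assumptions made for this example and do not exist in the library. The scoring function is passed in, so a method reference such as `StringSimilarity::similarity` (whose signature is listed in the method summary) could be supplied.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper, not part of the documented class.
final class BestMatchSketch {

    /**
     * Returns the candidate with the highest score against the query, or an empty
     * Optional if no candidate reaches the threshold. On ties the first candidate wins.
     */
    public static Optional<String> bestMatch(String query,
                                             List<String> candidates,
                                             ToDoubleBiFunction<String, String> scorer,
                                             double threshold) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String candidate : candidates) {
            double score = scorer.applyAsDouble(query, candidate);
            if (score >= threshold && score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

A call might look like `BestMatchSketch.bestMatch(ocrText, knownLabels, StringSimilarity::similarity, 0.8)`, where `ocrText`, `knownLabels`, and the 0.8 cutoff are placeholders; a suitable threshold depends on the data and should be tuned.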

Performance characteristics:

Time complexity: O(m × n) where m, n are string lengths
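Space complexity: O(n), where n is the length of the second string (one row of the DP matrix); this equals O(min(m, n)) when the shorter string is passed second
Suitable for short to moderate string lengths; the quadratic running time becomes costly for very long inputs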

Based on: https://stackoverflow.com/questions/955110/similarity-string-comparison-in-java/16018452

Thread safety: All methods are stateless and thread-safe.


Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| StringSimilarity() | |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| static int | editDistance(String s1, String s2) | Calculates the Levenshtein edit distance between two strings. |
| static void | printSimilarity(String s, String t) | Prints a formatted similarity report for two strings. |
| static double | similarity(String s1, String s2) | Calculates the similarity score between two strings. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- StringSimilarity
  
  public StringSimilarity()
Method Details
- similarity
  
  public static double similarity(String s1, String s2)
  Calculates the similarity score between two strings.
  Returns a normalized score between 0.0 and 1.0, where:
  
  1.0 = Identical strings
  0.5 = Edit distance equals half the longer string's length
  0.0 = Completely different (edit distance equals longer length)
  
  Algorithm steps:
  
  Identify longer and shorter strings
  Calculate edit distance between them
  Normalize by longer string's length
  
  Examples:
  
  similarity("hello", "hello") = 1.0
  similarity("hello", "hallo") = 0.8
  similarity("hello", "help") = 0.6
  similarity("abc", "xyz") = 0.0
  
  Special cases:
  
  Both empty strings: Returns 1.0 (considered identical)
  One empty string: Returns 0.0
  Order independent: similarity(a,b) = similarity(b,a)
  Parameters:
  
  s1 - the first string to compare
  
  s2 - the second string to compare
  
  Returns:
  
  similarity score between 0.0 and 1.0 inclusive
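
The algorithm steps and special cases above can be sketched as follows. This is an illustration under stated assumptions, not the library's source: the class name `SimilaritySketch` is hypothetical, and the distance function is passed in as a parameter so that the sketch does not depend on any particular edit-distance implementation.

```java
import java.util.function.ToIntBiFunction;

// Hypothetical illustration, not part of the documented class.
final class SimilaritySketch {

    /** Normalizes an edit distance by the longer string's length. */
    public static double similarity(String s1, String s2,
                                    ToIntBiFunction<String, String> distance) {
        // Step 1: identify the longer and the shorter string.
        String longer = s1;
        String shorter = s2;
        if (s1.length() < s2.length()) {
            longer = s2;
            shorter = s1;
        }

        // Special case: both strings are empty, so they are considered identical.
        int longerLength = longer.length();
        if (longerLength == 0) {
            return 1.0;
        }

        // Step 2: compute the edit distance; step 3: normalize by the longer length.
        int edits = distance.applyAsInt(longer, shorter);
        return (longerLength - edits) / (double) longerLength;
    }
}
```

Passing `StringSimilarity::editDistance` as the third argument should reproduce the documented examples, for instance (5 - 2) / 5 = 0.6 for "hello" and "help".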
- editDistance
  
  public static int editDistance(String s1, String s2)
  Calculates the Levenshtein edit distance between two strings.
  The edit distance is the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another.
  Note on transpositions: This implementation uses standard Levenshtein distance, which counts character transpositions (swapping adjacent characters) as TWO edits. For example, "ab" → "ba" has an edit distance of 2 (delete 'b', insert 'b'), not 1. Use Damerau-Levenshtein distance if you need transpositions to count as a single edit.
  Implementation details:
  
  Space-optimized dynamic programming approach
  Uses single array instead of full matrix
  Case-insensitive comparison (converts to lowercase)
  Processes strings character by character
  
  Algorithm visualization:
  s1 = "cat", s2 = "cut"
  Edit operations: substitute 'a' with 'u'
  Edit distance = 1
  
  Examples:
  
  editDistance("kitten", "sitting") = 3
  editDistance("saturday", "sunday") = 3
  editDistance("abc", "abc") = 0
  editDistance("abc", "") = 3
  
  Performance notes:
  
  Time: O(m × n) where m = s1.length(), n = s2.length()
  Space: O(n) - only stores one row of the DP matrix
  Lowercase conversion adds an extra O(m + n) pass over the inputs but makes the comparison case-insensitive
  
  Based on: http://rosettacode.org/wiki/Levenshtein_distance#Java
  Parameters:
  
  s1 - the source string
  
  s2 - the target string
  
  Returns:
  
  the minimum number of edits needed to transform s1 into s2
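
The single-array approach described under "Implementation details" can be sketched as below. This is a minimal reimplementation for illustration, not the library's source. The class name `EditDistanceSketch` is hypothetical, and the use of `Locale.ROOT` for lowercasing is an assumption, since the documentation does not say which locale is used.

```java
import java.util.Locale;

// Hypothetical illustration, not part of the documented class.
final class EditDistanceSketch {

    /** Case-insensitive Levenshtein distance using one row of the DP matrix. */
    public static int editDistance(String s1, String s2) {
        // Lowercase both inputs so that 'A' and 'a' compare as equal (locale is an assumption).
        String a = s1.toLowerCase(Locale.ROOT);
        String b = s2.toLowerCase(Locale.ROOT);

        // row[j] = distance between the processed prefix of a and the first j chars of b.
        int[] row = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            row[j] = j; // empty prefix of a: j insertions
        }

        for (int i = 1; i <= a.length(); i++) {
            int diagonal = row[0]; // value from the previous row, column j - 1
            row[0] = i;            // i deletions to reach the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int above = row[j]; // value from the previous row, column j
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                row[j] = Math.min(
                        Math.min(above + 1, row[j - 1] + 1), // deletion, insertion
                        diagonal + cost);                    // match or substitution
                diagonal = above;
            }
        }
        return row[b.length()];
    }
}
```

Only `row` and two scalars are kept between iterations, which is where the O(n) space bound comes from.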
- printSimilarity
  
  public static void printSimilarity(String s, String t)
  Prints a formatted similarity report for two strings.
  Outputs the similarity score with 3 decimal places along with the compared strings in quotes for clarity. Useful for debugging and analysis of string matching results.
  Output format:
  0.800 is the similarity between "hello" and "hallo"
  
  Use cases:
  
  Debugging OCR results
  Analyzing text matching thresholds
  Logging similarity calculations
  Testing string comparison algorithms
  Parameters:
  
  s - the first string to compare
  
  t - the second string to compare
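
A report line in the documented format could be produced as in the sketch below. The class `ReportFormatSketch` and the method `formatReport` are hypothetical; the actual method prints directly rather than returning a string, and `Locale.ROOT` is an assumption made here so that the decimal separator is always a period.

```java
import java.util.Locale;

// Hypothetical illustration, not part of the documented class.
final class ReportFormatSketch {

    /** Formats a score with three decimal places followed by the two quoted strings. */
    public static String formatReport(double score, String s, String t) {
        return String.format(Locale.ROOT,
                "%.3f is the similarity between \"%s\" and \"%s\"", score, s, t);
    }

    public static void main(String[] args) {
        // 0.8 is the documented similarity of "hello" and "hallo".
        System.out.println(formatReport(0.8, "hello", "hallo"));
    }
}
```

Returning the string instead of printing it makes the format easy to check in a unit test or to route to a logger.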
