Jaro distance
The Jaro distance is a measure of similarity between two strings. The higher the
Jaro distance for two strings is, the more similar the strings are. The score is
normalized such that 0
equates to no similarity and 1
is an exact match.
Definition
The Jaro distance \( d_j \) of two given strings \(s_1\) and \(s_2\) is
\begin{align}d_j = \begin{cases}0& & \text{if }m=0 \\\\{\frac {1}{3}}\left({\frac {m}{|s_{1}|}}+{\frac {m}{|s_{2}|}}+{\frac {m-t}{m}}\right)& & \text{otherwise}\end{cases}\end{align}
Where:
- \(m\) is the number of matching characters;
- \(t\) is half the number of transpositions.
Two characters from \(s_1\) and \(s_2\) respectively, are considered matching only if they are the same and not farther than \(\left\lfloor\frac{\max(|s_1|,|s_2|)}{2}\right\rfloor-1\).
Each character of \(s_1\) is compared with all its matching characters in \(s_2\) . The number of matching (but different sequence order) characters divided by 2 defines the number of transpositions.
Example
Given the strings \(s_1\) DWAYNE and \(s_2\) DUANE we find:
- \(m = 4\)
- \(|s_1| = 6\)
- \(|s_2| = 5\)
- \(t = 0\)
We find a Jaro score of: \(d_j = \frac{1}{3}\left(\frac{4}{6} + \frac{4}{5} + \frac{4-0}{4}\right) = 0.822\).
Test
{{test}}Console output