Wednesday, September 29, 2010

Levenshtein distance

Needed to determine the distance between 2 strings.

Here is a basic implementation of the “Levenshtein distance” algorithm, which measures the distance as the number of edits required to change one string to the other via a series of insertion, deletion and substitution operations.

The following implementation is based of the pseudo-code on wikipedia. (http://en.wikipedia.org/wiki/Levenshtein_distance)

public class StringEditDistance 
{
//http://en.wikipedia.org/wiki/Levenshtein_distance
public int LevenshteinDistance (string s, string t)
{
int m = s.Length; //length of s
int n = t.Length; //length of t

if(n == 0) return m;
if(m == 0) return n;

int[,] d = new int[m+1, n+1]; // d is a table with m+1 rows and n+1 columns
for(int i = 0; i <= m; i++)
d[i, 0] = i; //deletion
for(int j = 0; j <= n; j++)
d[0, j] = j; //insertion

for(int j = 1; j <= n; j++)
{
for(int i = 1; i <= m; i++)
{
if (s.Substring(i-1, 1) == t.Substring(j-1, 1))
{
d[i,j] = d[i-1,j-1];
}
else
{
int deletion = d[i - 1, j] + 1;
int insertion = d[i, j - 1] + 1;
int substitution = d[i - 1, j - 1] + 1;
d[i, j] = Math.Min(Math.Min(deletion, insertion),substitution);
}
}
}
return d[m, n];
}
}

No comments: