Edit distance in Java

Date: 2018-04-18 13:09:56

Tags: java sorting methods

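I wrote this algorithm to compute the total number of deletions and insertions (that is, the edits) needed to make the first string equal to the second one. But it doesn't work.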

public static int distance (String s1, String s2) {
    return distance(s1, s2, 0, 0);
}

private static int distance(String s1, String s2, int i, int j) {
    if (i == s1.length()) return j;
    if (j == s2.length()) return i;
    if (s1.charAt(i) == s2.charAt(j))
        return distance(s1, s2, i + 1, j + 1);
    int rep = distance(s1, s2, i + 1, j + 1) + 1;
    int del = distance(s1, s2, i, j + 1) + 1;
    int ins = distance(s1, s2, i + 1, j) + 1;
    return Math.min(del, Math.min(ins, rep));
}

Edit: example: string 1: "casa", string 2: "cara", edit_distance = 2 (1 deletion + 1 insertion).

EDIT2: these pairs work:
- string 1: "casa", string 2: "cassa", edit_distance = 1
- string 1: "pioppo", string 2: "pioppo", edit_distance = 0

These do not work:
- string 1: "casa", string 2: "cara", expected edit_distance = 2 (my code gives 0)
- string 1: "tassa", string 2: "passato", expected edit_distance = 4 (my code gives 2)
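For reference, the distance described here (insertions and deletions only, no replacement) can also be computed bottom-up with a table instead of recursion. The sketch below is an illustration, not the asker's code; the class name `IterativeDistance` is hypothetical. It fills `dp[i][j]` with the cost of turning the first `i` characters of `s1` into the first `j` characters of `s2`, and checks the four example pairs listed above.

```java
public class IterativeDistance {

    // dp[i][j] = insertions + deletions needed to turn
    // the first i chars of s1 into the first j chars of s2
    static int distance(String s1, String s2) {
        int m = s1.length(), n = s2.length();
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) dp[i][0] = i; // delete all i chars
        for (int j = 0; j <= n; j++) dp[0][j] = j; // insert all j chars
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];           // match: no cost
                } else {
                    dp[i][j] = 1 + Math.min(dp[i - 1][j],  // delete s1[i-1]
                                            dp[i][j - 1]); // insert s2[j-1]
                }
            }
        }
        return dp[m][n];
    }

    public static void main(String[] args) {
        System.out.println(distance("casa", "cassa"));    // 1
        System.out.println(distance("pioppo", "pioppo")); // 0
        System.out.println(distance("casa", "cara"));     // 2
        System.out.println(distance("tassa", "passato")); // 4
    }
}
```

This runs in O(m·n) time and space for strings of lengths m and n.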

5 answers:

Answer 0 (score: 1)

I think the implementation is almost correct; you got the stopping conditions wrong. They should be:

if (j == s2.length()) {
    return s1.length() - i;
}
if (i == s1.length()) {
    return s2.length() - j;
}

So the complete implementation should be:

private static int distance(String s1, String s2, int i, int j) {
    if (j == s2.length()) {
        return s1.length() - i;
    }
    if (i == s1.length()) {
        return s2.length() - j;
    }
    if (s1.charAt(i) == s2.charAt(j))
        return distance(s1, s2, i + 1, j + 1);
    int rep = distance(s1, s2, i + 1, j + 1) + 2; // since Jim Belushi considers replacement to be worth 2.
    int del = distance(s1, s2, i, j + 1) + 1;
    int ins = distance(s1, s2, i + 1, j) + 1;
    return Math.min(del, Math.min(ins, rep));
}
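Because each mismatch branches into three recursive calls, the plain recursion can take exponential time on longer strings, even though there are only (m+1)·(n+1) distinct `(i, j)` states. A minimal memoized sketch of the same recursion (replacement still costing 2; the class name `MemoDistance` is hypothetical) caches each state so it is computed once:

```java
import java.util.Arrays;

public class MemoDistance {

    public static int distance(String s1, String s2) {
        // memo[i][j] caches distance(s1, s2, i, j); -1 means "not computed yet"
        int[][] memo = new int[s1.length() + 1][s2.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1);
        }
        return distance(s1, s2, 0, 0, memo);
    }

    private static int distance(String s1, String s2, int i, int j, int[][] memo) {
        // One string is exhausted: the rest of the other must be inserted/deleted
        if (j == s2.length()) {
            return s1.length() - i;
        }
        if (i == s1.length()) {
            return s2.length() - j;
        }
        if (memo[i][j] != -1) {
            return memo[i][j];
        }
        int result;
        if (s1.charAt(i) == s2.charAt(j)) {
            result = distance(s1, s2, i + 1, j + 1, memo);
        } else {
            // Same three choices as above; a replacement costs 2
            int rep = distance(s1, s2, i + 1, j + 1, memo) + 2;
            int del = distance(s1, s2, i, j + 1, memo) + 1;
            int ins = distance(s1, s2, i + 1, j, memo) + 1;
            result = Math.min(del, Math.min(ins, rep));
        }
        memo[i][j] = result;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distance("tassa", "passato")); // 4
    }
}
```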

Update

Here is the result for "tassa" and "passato":

Code:

private static int distance(String s1, String s2, int i, int j) {
    if (j == s2.length()) {
        return s1.length() - i;
    }
    if (i == s1.length()) {
        return s2.length() - j;
    }
    if (s1.charAt(i) == s2.charAt(j))
        return distance(s1, s2, i + 1, j + 1);
    int rep = distance(s1, s2, i + 1, j + 1) + 2;
    int del = distance(s1, s2, i, j + 1) + 1;
    int ins = distance(s1, s2, i + 1, j) + 1;
    return Math.min(del, Math.min(ins, rep));
}

public static void main(String[] args) {
    int dist = distance("tassa", "passato", 0, 0);
    System.out.println(dist);
}

If you run this, you get:

4
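As a cross-check of this value: when only insertions and deletions are allowed, every character outside a longest common subsequence (LCS) of the two strings has to be deleted or inserted, so the distance is |s1| + |s2| − 2·|LCS|. An LCS of "tassa" and "passato" is "assa":

$$
d(\texttt{tassa}, \texttt{passato}) = 5 + 7 - 2 \cdot |\texttt{assa}| = 12 - 8 = 4
$$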

Answer 1 (score: 0)

This should be what you want.

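Each mismatched character adds 2 to the distance (one deletion + one insertion). If the lengths differ, it also adds the number of extra characters, but those count +1 each, not +2.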

//get number of deletions / edits - inc 1 per each
public static void editDistance() {
    String s1 = "casa";
    String s2 = "cara";

    String longer;
    String shorter;
    if(s1.length() > s2.length()) {
        longer = s1;
        shorter = s2;
    }else {
        shorter = s1;
        longer = s2;
    }

    int edits = 0;
    for (int i = 0; i < shorter.length(); i++) {
        if(shorter.charAt(i) != longer.charAt(i)) {
            edits++;
        }
    }

    edits = edits * 2; // each mismatch = one deletion + one insertion, as you specified
    edits = edits + Math.abs(s1.length() - s2.length()); //if different length then add counts of added/removed chars 

    System.out.println("edit count: " + edits);

}
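The method above compares characters position by position, so it only matches the expected distances when the two strings line up index by index. Refactoring the same logic to take the strings as parameters (the names `PositionalEdits` and `positionalEdits` are hypothetical) makes it easy to check against the question's examples: it returns 2 for "casa"/"cara", but 3 for "casa"/"cassa", where the expected value is 1, because the inserted "s" shifts every later character.

```java
public class PositionalEdits {

    // Same logic as editDistance() above, with the strings as parameters
    static int positionalEdits(String s1, String s2) {
        String shorter = s1.length() > s2.length() ? s2 : s1;
        String longer = s1.length() > s2.length() ? s1 : s2;
        int edits = 0;
        for (int i = 0; i < shorter.length(); i++) {
            if (shorter.charAt(i) != longer.charAt(i)) {
                edits++;
            }
        }
        // each mismatch = one deletion + one insertion; each extra char = +1
        return edits * 2 + Math.abs(s1.length() - s2.length());
    }

    public static void main(String[] args) {
        System.out.println(positionalEdits("casa", "cara"));  // 2, as expected
        System.out.println(positionalEdits("casa", "cassa")); // 3, expected 1
    }
}
```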

Answer 2 (score: 0)

You need to specify what happens when you reach the end of one string but not the other. Try this:

public static void main(String[] args) {
    System.out.println(distance("casa","cassa"));
}

public static int distance (String s1, String s2) {
    return distance(s1, s2, 0, 0);
}

private static int distance(String s1, String s2, int i, int j) {
    if (i == s1.length() && j==s2.length())
        return 0;
    else if(i== s1.length())
        return s2.length() - j;
    else if(j == s2.length())
        return s1.length() - i;

    if (s1.charAt(i) == s2.charAt(j))
        return distance(s1, s2, i + 1, j + 1);

    int rep = distance(s1, s2, i + 1, j + 1) + 1;
    int del = distance(s1, s2, i, j + 1) + 1;
    int ins = distance(s1, s2, i + 1, j) + 1;
    return Math.min(del, Math.min(ins, rep));
}

Output:

1

Note: the first `if` is not strictly necessary; it only makes the code easier to understand. You can drop it in your implementation.

Answer 3 (score: 0)

Two simple changes and your code works.

First:

    if (i == s1.length()) return s2.length() - j;
    if (j == s2.length()) return s1.length() - i;

instead of

    if (i == s1.length()) return j;
    if (j == s2.length()) return i;

Next:

    int rep = distance(s1, s2, i + 1, j + 1) + 2;

The 2 at the end is what matters here. If rep means a replacement, it is a deletion AND an insertion: two operations, not one.
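To see why the cost of `rep` matters, the recursion can be sketched with the replacement cost as a parameter (`ReplaceCostDemo` and `replaceCost` are hypothetical names). With cost 1 this is the classic Levenshtein distance, and "casa"/"cara" comes out as 1 (one substitution). With cost 2 a replacement is never cheaper than a deletion plus an insertion, so the result is the expected 2.

```java
public class ReplaceCostDemo {

    // Fixed recursion from this answer, with the replacement cost made explicit
    static int distance(String s1, String s2, int i, int j, int replaceCost) {
        if (i == s1.length()) return s2.length() - j;
        if (j == s2.length()) return s1.length() - i;
        if (s1.charAt(i) == s2.charAt(j))
            return distance(s1, s2, i + 1, j + 1, replaceCost);
        int rep = distance(s1, s2, i + 1, j + 1, replaceCost) + replaceCost;
        int del = distance(s1, s2, i, j + 1, replaceCost) + 1;
        int ins = distance(s1, s2, i + 1, j, replaceCost) + 1;
        return Math.min(del, Math.min(ins, rep));
    }

    public static void main(String[] args) {
        System.out.println(distance("casa", "cara", 0, 0, 1)); // 1 (Levenshtein)
        System.out.println(distance("casa", "cara", 0, 0, 2)); // 2 (insert/delete only)
    }
}
```

Since a cost-2 replacement is never better than deleting and then inserting, the `rep` branch can also be dropped entirely without changing the result.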

Answer 4 (score: 0)

This works for me:

private static int distance(String s1, String s2, int i, int j) {
    if (i == s1.length() && j == s2.length()) {
        return 0;
    } else if (i == s1.length()) {
        return s2.length() - j;
    } else if (j == s2.length()) {
        return s1.length() - i;
    }

    if (s1.charAt(i) == s2.charAt(j)) {
        return distance(s1, s2, i + 1, j + 1);
    }

    // int rep = distance(s1, s2, i + 1, j + 1) + 1;
    int del = distance(s1, s2, i, j + 1) + 1;
    int ins = distance(s1, s2, i + 1, j) + 1;
    //  return Math.min(del, Math.min(ins, rep));
    return Math.min(del, ins);
}

It also passes this test:

/**
 * Test of distanceRec method, of class EditDistance.
 */
@Test
public void testDistanceRec() {
    System.out.println("distanceRec");
    String s1 = "passato";
    String s2 = "tassa";
    int expResult = 4;
    int result = EditDistance.distanceRec(s1, s2);
    assertEquals(expResult, result);
    // Review the generated test code and remove the default call to fail.
    //fail("The test case is a prototype.");
}
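The test calls `EditDistance.distanceRec(s1, s2)`, a public entry point the answer does not show. Assuming it simply starts the private recursion at index 0 of both strings, a hypothetical `EditDistance` class that the test could run against might look like this:

```java
public class EditDistance {

    // Hypothetical public wrapper assumed by the test: start at the beginning of both strings
    public static int distanceRec(String s1, String s2) {
        return distance(s1, s2, 0, 0);
    }

    // Insertions and deletions only, as in this answer
    private static int distance(String s1, String s2, int i, int j) {
        if (i == s1.length()) return s2.length() - j;
        if (j == s2.length()) return s1.length() - i;
        if (s1.charAt(i) == s2.charAt(j)) {
            return distance(s1, s2, i + 1, j + 1);
        }
        int del = distance(s1, s2, i, j + 1) + 1;
        int ins = distance(s1, s2, i + 1, j) + 1;
        return Math.min(del, ins);
    }

    public static void main(String[] args) {
        System.out.println(distanceRec("passato", "tassa")); // 4
    }
}
```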

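In this exercise you can use only two operations, insertion and deletion; there are no others such as replacement or match. The exercise text:

Assume the only available operations are deleting a character and inserting a character. Examples:
- the edit distance between "casa" and "cassa" is 1 (1 deletion);
- the edit distance between "casa" and "cara" is 2 (1 deletion + 1 insertion);
- the edit distance between "tassa" and "passato" is 4 (3 deletions + 1 insertion);
- the edit distance between "pioppo" and "pioppo" is 0.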
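The exercise reports each distance as a number of deletions plus a number of insertions. With only those two operations, one way to recover the split is through a longest common subsequence (LCS): characters of the source string outside it must be deleted, and characters of the target string outside it must be inserted. The split depends on direction; the exercise's figures match turning the second string of each pair into the first (e.g. "cassa" → "casa" is 1 deletion, "passato" → "tassa" is 3 deletions + 1 insertion), while the total is the same either way. A sketch, with the hypothetical names `InsertDeleteBreakdown`, `lcs` and `breakdown`:

```java
public class InsertDeleteBreakdown {

    // Length of the longest common subsequence (standard bottom-up DP)
    static int lcs(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                dp[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.length()][b.length()];
    }

    // Returns {deletions, insertions} needed to turn 'from' into 'to'
    static int[] breakdown(String from, String to) {
        int common = lcs(from, to);
        return new int[] { from.length() - common, to.length() - common };
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"cassa", "casa"}, {"cara", "casa"}, {"passato", "tassa"}, {"pioppo", "pioppo"}
        };
        for (String[] p : pairs) {
            int[] ops = breakdown(p[0], p[1]);
            System.out.printf("%s -> %s: %d deletion(s) + %d insertion(s)%n",
                    p[0], p[1], ops[0], ops[1]);
        }
    }
}
```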