Longest common substring with a list of excluded strings

Time: 2013-09-24 21:34:50

Tags: java string algorithm

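I use this algorithm to find the longest common substring of two strings. Please help me do the same thing, but with an array of common substrings of these strings that the function should ignore.

My Java code: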

public static String longestSubstring(String str1, String str2) {

        StringBuilder sb = new StringBuilder();
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }

        // java initializes them already with 0
        int[][] num = new int[str1.length()][str2.length()];
        int maxlen = 0;
        int lastSubsBegin = 0;

        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) == str2.charAt(j)) {
                    if ((i == 0) || (j == 0)) {
                        num[i][j] = 1;
                    } else {
                        num[i][j] = 1 + num[i - 1][j - 1];
                    }

                    if (num[i][j] > maxlen) {
                        maxlen = num[i][j];
                        // generate substring from str1 => i
                        int thisSubsBegin = i - num[i][j] + 1;
                        if (lastSubsBegin == thisSubsBegin) {
                            //if the current LCS is the same as the last time this block ran
                            sb.append(str1.charAt(i));
                        } else {
                            //this block resets the string builder if a different LCS is found
                            lastSubsBegin = thisSubsBegin;
                            sb = new StringBuilder();
                            sb.append(str1.substring(lastSubsBegin, i + 1));
                        }
                    }
                }
            }
        }

        return sb.toString();
    } 

So my function should be:

public static String longestSubstring(String str1, String str2, String[] ignore)

2 answers:

Answer 0: (score: 0)

As I understand it, you have to ignore any substring that contains at least one of the strings in `ignore`.

if (str1.charAt(i) == str2.charAt(j)) {
    if ((i == 0) || (j == 0)) {
        num[i][j] = 1;
    } else {
        num[i][j] = 1 + num[i - 1][j - 1];
    }

    // rebuild the current common substring on every step so that it can be
    // compared with `ignore` (appending to one shared `sb` would mix up
    // substrings that lie on different diagonals of `num`)
    int thisSubsBegin = i - num[i][j] + 1;
    String current = str1.substring(thisSubsBegin, i + 1);

    // check whether the current substring contains any string from `ignore`,
    // and if it does, find the right-most position where one of them starts
    int biggestIndex = -1;
    for (String s : ignore) {
        // an empty string matches everywhere, so skip it
        int startIndex = s.isEmpty() ? -1 : current.lastIndexOf(s);
        if (startIndex > biggestIndex) {
            biggestIndex = startIndex;
        }
    }

    // current.substring(biggestIndex + 1) contains no string to be ignored
    current = current.substring(biggestIndex + 1);
    num[i][j] -= (biggestIndex + 1);

    if (num[i][j] > maxlen) {
        maxlen = num[i][j];
        // keep the best candidate in `sb`, which the function returns at the end
        sb = new StringBuilder(current);
    }
}
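For reference, here is a minimal self-contained sketch of the whole function under the "contains" interpretation above. The class name `LcsIgnoring` and the variable `bestBegin` are made up for illustration, and it assumes that `null` or empty entries in `ignore` should simply be skipped. It relies on one observation: the common substring ending one step earlier on the diagonal has already been trimmed, so an ignored string can only show up as a suffix of the current candidate.

```java
final class LcsIgnoring {

    // Longest common substring of str1 and str2 that contains
    // none of the strings in `ignore` (illustrative sketch).
    static String longestSubstring(String str1, String str2, String[] ignore) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }

        // num[i][j] = length of the longest ignore-free common substring
        // that ends at str1[i] and str2[j]
        int[][] num = new int[str1.length()][str2.length()];
        int maxlen = 0;
        int bestBegin = 0;

        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue;
                }
                int len = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];
                String candidate = str1.substring(i - len + 1, i + 1);

                // Everything but the last character is already clean, so an
                // ignored string can only occur as a suffix of the candidate.
                int cut = -1; // right-most start index of an ignored suffix
                if (ignore != null) {
                    for (String s : ignore) {
                        if (s != null && !s.isEmpty() && candidate.endsWith(s)) {
                            cut = Math.max(cut, candidate.length() - s.length());
                        }
                    }
                }

                // Drop everything up to and including index `cut`.
                len -= cut + 1;
                num[i][j] = len;

                if (len > maxlen) {
                    maxlen = len;
                    bestBegin = i - len + 1;
                }
            }
        }
        return str1.substring(bestBegin, bestBegin + maxlen);
    }
}
```

With `str1 = "abcdef"`, `str2 = "xabcdey"` and `ignore = {"cd"}`, the unrestricted answer `"abcde"` is ruled out and the sketch returns `"abc"`.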

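If instead you only have to ignore substrings that are exactly equal to one of the strings in `ignore`, then whenever a candidate for the longest common substring is found, iterate over `ignore` and check whether the current substring is among them.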
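A rough sketch of that exact-match variant could look like the following. The class name `LcsExcludingExact` is hypothetical, and putting `ignore` into a `HashSet` instead of iterating over the array is my own shortcut. When the longest candidate ending at a position is excluded, the sketch falls back to shorter suffixes of it, but only while they would still beat the best result found so far.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class LcsExcludingExact {

    // Longest common substring of str1 and str2 that is not exactly
    // equal to any string in `ignore` (illustrative sketch).
    static String longestSubstring(String str1, String str2, String[] ignore) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }
        Set<String> ignored = (ignore == null)
                ? new HashSet<>()
                : new HashSet<>(Arrays.asList(ignore));

        // num[i][j] = length of the common substring ending at str1[i], str2[j]
        int[][] num = new int[str1.length()][str2.length()];
        String best = "";

        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue;
                }
                num[i][j] = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];

                // Try the longest suffix first; shorter ones are only worth
                // checking while they are still longer than the current best.
                for (int len = num[i][j]; len > best.length(); len--) {
                    String candidate = str1.substring(i - len + 1, i + 1);
                    if (!ignored.contains(candidate)) {
                        best = candidate;
                        break;
                    }
                }
            }
        }
        return best;
    }
}
```

Note that the table `num` is left untouched here: an excluded candidate can still be part of a longer, allowed substring, so its length must keep counting.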

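Answer 1: (score: 0)

Build a suffix tree of one string, then walk through the second string and see which of its substrings can be found in the suffix tree.

Information on suffix trees: http://en.wikipedia.org/wiki/Suffixtree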
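To illustrate the idea, here is a small sketch that uses an uncompressed suffix trie rather than a real suffix tree. That keeps the code short but can need a number of nodes quadratic in the length of `str1`, so it is only meant for small inputs; a proper suffix tree (for example one built with Ukkonen's algorithm) would avoid that. The names `SuffixTrieLcs` and `Node` are made up for the example, both arguments are assumed to be non-null, and the `ignore` list is not handled here.

```java
import java.util.HashMap;
import java.util.Map;

final class SuffixTrieLcs {

    // One trie node; each edge is labelled with a single character.
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
    }

    static String longestCommonSubstring(String str1, String str2) {
        // Insert every suffix of str1, so that every substring of str1
        // becomes a path starting at the root.
        Node root = new Node();
        for (int start = 0; start < str1.length(); start++) {
            Node node = root;
            for (int k = start; k < str1.length(); k++) {
                node = node.next.computeIfAbsent(str1.charAt(k), c -> new Node());
            }
        }

        // From every position of str2, follow the trie as far as possible.
        String best = "";
        for (int start = 0; start < str2.length(); start++) {
            Node node = root;
            int end = start;
            while (end < str2.length()) {
                Node child = node.next.get(str2.charAt(end));
                if (child == null) {
                    break;
                }
                node = child;
                end++;
            }
            if (end - start > best.length()) {
                best = str2.substring(start, end);
            }
        }
        return best;
    }
}
```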