Check whether a replacement string is valid

Time: 2016-08-22 22:27:49

Tags: java regex replace

Here is a utility method that checks whether a replacement string is valid:

public static boolean isValidReplacementString(String regex, String replacement) {
    try {
        "".replaceFirst(regex, replacement);
        return true;
    } catch (IllegalArgumentException | IndexOutOfBoundsException | NullPointerException e) {
        return false;
    }
}

I want to run this check before doing the actual replacement, because obtaining the source string is expensive (I/O).

I find this solution very hacky. Is there an existing method in the standard library that I have missed?

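Edit: As pointed out by sln, this does not even work when no match is found: replaceFirst only parses the replacement string after a successful match, and most regexes never match the empty input "".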
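The difference is easy to reproduce. The sketch below uses a variant of isValidReplacementString (the name looksValid is made up, and it also catches the IndexOutOfBoundsException that Matcher throws for an unknown group number). The same bad replacement "$5" passes or fails depending only on whether the regex can match the empty string.

```java
class EmptyInputDemo {

    // Hypothetical variant of the question's isValidReplacementString that
    // also catches IndexOutOfBoundsException ("No group N")
    static boolean looksValid(String regex, String replacement) {
        try {
            "".replaceFirst(regex, replacement);
            return true;
        } catch (IllegalArgumentException | IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "a" cannot match "", so "$5" is never parsed and is reported as valid
        System.out.println(looksValid("a", "$5"));  // true
        // "a*" matches the empty string, so "$5" is parsed and rejected
        System.out.println(looksValid("a*", "$5")); // false
    }
}
```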

Edit: Following shmosel's answer, I came up with this "solution":

import java.lang.reflect.Field;
import java.util.Map;
import java.util.regex.Pattern;

private static boolean isLower(char c) {
    return c >= 'a' && c <= 'z';
}

private static boolean isUpper(char c) {
    return c >= 'A' && c <= 'Z';
}

private static boolean isDigit(char c) {
    return isDigit(c - '0');
}

private static boolean isDigit(int c) {
    return c >= 0 && c <= 9;
}

@SuppressWarnings("unchecked")
public static void checkRegexAndReplacement(String regex, String replacement)  {
    Pattern parentPattern = Pattern.compile(regex);
    Map<String, Integer> namedGroups;
    int capturingGroupCount;

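    // Reads private Pattern fields: works on Java 8, but JDK 16 and later reject
    // setAccessible here by default with InaccessibleObjectException (not caught below)
    try {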
        Field namedGroupsField = Pattern.class.getDeclaredField("namedGroups");
        namedGroupsField.setAccessible(true);
        namedGroups = (Map<String, Integer>) namedGroupsField.get(parentPattern);
        Field capturingGroupCountField = Pattern.class.getDeclaredField("capturingGroupCount");
        capturingGroupCountField.setAccessible(true);
        capturingGroupCount = capturingGroupCountField.getInt(parentPattern);
    } catch (NoSuchFieldException | IllegalAccessException e) {
        throw new RuntimeException("That's what you get for using reflection!", e);
    }

    int groupCount = capturingGroupCount - 1;

    // Walk the replacement string and validate every escape and group reference
    int cursor = 0;

    while (cursor < replacement.length()) {
        char nextChar = replacement.charAt(cursor);
        if (nextChar == '\\') {
            // A backslash escapes the next character, which must exist
            cursor++;
            if (cursor == replacement.length())
                throw new IllegalArgumentException(
                        "character to be escaped is missing");
            cursor++;
        } else if (nextChar == '$') {
            // Skip past $
            cursor++;
            // Throw IAE if this "$" is the last character in replacement
            if (cursor == replacement.length())
                throw new IllegalArgumentException(
                        "Illegal group reference: group index is missing");
            nextChar = replacement.charAt(cursor);
            int refNum = -1;
            if (nextChar == '{') {
                cursor++;
                StringBuilder gsb = new StringBuilder();
                while (cursor < replacement.length()) {
                    nextChar = replacement.charAt(cursor);
                    if (isLower(nextChar) ||
                            isUpper(nextChar) ||
                            isDigit(nextChar)) {
                        gsb.append(nextChar);
                        cursor++;
                    } else {
                        break;
                    }
                }
                if (gsb.length() == 0)
                    throw new IllegalArgumentException(
                            "named capturing group has 0 length name");
                if (nextChar != '}')
                    throw new IllegalArgumentException(
                            "named capturing group is missing trailing '}'");
                String gname = gsb.toString();
                if (isDigit(gname.charAt(0)))
                    throw new IllegalArgumentException(
                            "capturing group name {" + gname +
                                    "} starts with digit character");
                if (namedGroups == null || !namedGroups.containsKey(gname))
                    throw new IllegalArgumentException(
                            "No group with name {" + gname + "}");
                refNum = namedGroups.get(gname);
                cursor++;
            } else {
                // The first number is always a group
                refNum = (int)nextChar - '0';
                if (!isDigit(refNum))
                    throw new IllegalArgumentException(
                            "Illegal group reference");
                cursor++;
                // Capture the largest legal group string
                boolean done = false;
                while (!done) {
                    if (cursor >= replacement.length()) {
                        break;
                    }
                    int nextDigit = replacement.charAt(cursor) - '0';
                    if (!isDigit(nextDigit)) { // not a number
                        break;
                    }
                    int newRefNum = (refNum * 10) + nextDigit;
                    if (groupCount < newRefNum) {
                        done = true;
                    } else {
                        refNum = newRefNum;
                        cursor++;
                    }
                }
            }
            if (refNum < 0 || refNum > groupCount) {
                throw new IndexOutOfBoundsException("No group " + refNum);
            }
        } else {
            cursor++;
        }
    }
}

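If this method throws, either the regex or the replacement string is invalid (an invalid regex surfaces as a PatternSyntaxException from Pattern.compile).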
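The group count does not actually need reflection: Matcher.groupCount() is public and, because it excludes group 0, returns the same value as capturingGroupCount - 1. Only the name-to-index map of named groups has no public accessor on Java 8. A small sketch; the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class GroupCountDemo {

    // Number of capturing groups (named groups included, group 0 excluded),
    // read through the public Matcher API instead of a private Pattern field
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // (a) and (?<word>b) capture; (?:c) does not
        System.out.println(groupCount("(a)(?<word>b)(?:c)")); // 2
    }
}
```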

This is stricter than replaceAll and replaceFirst: when no match is found, those methods never call appendReplacement, so they "miss" invalid group references.
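An alternative to reimplementing the parser is to guarantee a match, so that appendReplacement runs and performs its documented checks (IllegalArgumentException for a bad escape or an unknown group name, IndexOutOfBoundsException for an unknown group number). Wrapping the regex as (?:regex)| keeps its group numbers and names, while the empty alternative always matches the empty input. A minimal sketch; ReplacementCheck and isValid are hypothetical names, and the regex is assumed to be used without extra compile flags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementCheck {

    // Hypothetical helper: true if `regex` compiles and `replacement` is valid for it
    static boolean isValid(String regex, String replacement) {
        try {
            // Compile the regex on its own first: the wrapper below could hide errors
            Pattern.compile(regex);
            // Non-capturing wrapper keeps group numbers/names; "|" adds an empty
            // alternative so the pattern always matches the empty input
            Matcher m = Pattern.compile("(?:" + regex + ")|").matcher("");
            if (!m.find()) {
                return false; // not expected, kept as a safeguard
            }
            // Parses the whole replacement; groups that did not take part in the
            // match append nothing, but unknown groups and bad escapes still throw
            m.appendReplacement(new StringBuffer(), replacement);
            return true;
        } catch (IllegalArgumentException | IndexOutOfBoundsException e) {
            // PatternSyntaxException is a subclass of IllegalArgumentException
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("a", "$1"));           // false: no group 1
        System.out.println(isValid("(?<w>a)", "${w}-$1")); // true
    }
}
```

The separate Pattern.compile(regex) matters because "a)|(b" is invalid while "(?:a)|(b)|" compiles. Conversely, a valid regex that ends inside an unterminated \Q quote or an (?x) comment swallows the appended ")|", so the wrapper fails to compile and this check returns false.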

1 answer:

Answer 0 (score: 1)

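I'd say your best bet is to replicate the process implemented in Matcher.appendReplacement(), removing any logic that relates to the source string or the result string. This inevitably means you cannot perform some of the validations, such as checking group names and indices, but you should be able to apply most of them.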
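Following that suggestion, here is a trimmed-down copy of the parsing rules from the JDK 8 appendReplacement loop quoted in the question, with all source/result handling dropped and, as the answer notes, no check of group numbers or names. A minimal sketch; ReplacementSyntax and syntaxError are made-up names.

```java
class ReplacementSyntax {

    // Hypothetical helper: returns null if `replacement` passes the syntax rules of
    // Matcher.appendReplacement, otherwise a description of the first problem.
    // Group numbers and names are NOT checked against any pattern.
    static String syntaxError(String replacement) {
        int len = replacement.length();
        int cursor = 0;
        while (cursor < len) {
            char c = replacement.charAt(cursor);
            if (c == '\\') {
                // A backslash must be followed by the character it escapes
                if (cursor + 1 == len) return "character to be escaped is missing";
                cursor += 2;
            } else if (c == '$') {
                cursor++;
                if (cursor == len) return "Illegal group reference: group index is missing";
                c = replacement.charAt(cursor);
                if (c == '{') {
                    // ${name}: one or more ASCII letters/digits, not starting with a digit
                    int start = ++cursor;
                    while (cursor < len && isAsciiLetterOrDigit(replacement.charAt(cursor))) {
                        cursor++;
                    }
                    if (cursor == start) return "named capturing group has 0 length name";
                    if (cursor == len || replacement.charAt(cursor) != '}') {
                        return "named capturing group is missing trailing '}'";
                    }
                    if (isAsciiDigit(replacement.charAt(start))) {
                        return "capturing group name starts with digit character";
                    }
                    cursor++; // skip '}'
                } else if (isAsciiDigit(c)) {
                    // $n: how many digits form the number depends on the pattern's
                    // group count; extra digits are literal text, so no error is possible
                    cursor++;
                } else {
                    return "Illegal group reference";
                }
            } else {
                cursor++;
            }
        }
        return null;
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static boolean isAsciiLetterOrDigit(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || isAsciiDigit(c);
    }
}
```

Because it needs neither the pattern nor the input, this check can run before the expensive I/O; pairing it with Pattern.compile(regex) covers the regex side.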