Efficiently removing characters from a string

Date: 2015-03-07 00:51:47

Tags: java string

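This may sound like a very simple question, but how do I remove several different characters from a string without writing one line per character, which is the laborious way I am doing it now? I have written an example string below: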

            String word = "Hello, t-his is; an- (example) line.";

            word = word.replace(",", "");
            word = word.replace(".", "");
            word = word.replace(";", "");
            word = word.replace("-", "");
            word = word.replace("(", "");
            word = word.replace(")", "");
            System.out.println(word);

This produces "Hello this is an example line". What would be a more efficient way?

5 answers:

Answer 0 (score: 4)

Use

word = word.replaceAll("[,.;\\-()]", "");

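Note that the hyphen (-) has to be escaped with a double backslash here; otherwise it is treated as a range operator inside the character class.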
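To illustrate the point about the hyphen, here is a minimal sketch (the class and method names are made up for this example). It compares the escaped form from this answer with a variant that places the hyphen last in the character class, where Java's regex engine takes it literally because it cannot start a range there.

```java
public class HyphenInClassDemo {
    // Escaped hyphen, as in the answer above.
    static String stripEscaped(String s) {
        return s.replaceAll("[,.;\\-()]", "");
    }

    // Hyphen placed last in the class, so no escape is needed.
    static String stripTrailingHyphen(String s) {
        return s.replaceAll("[,.;()-]", "");
    }

    public static void main(String[] args) {
        String word = "Hello, t-his is; an- (example) line.";
        System.out.println(stripEscaped(word));        // Hello this is an example line
        System.out.println(stripTrailingHyphen(word)); // Hello this is an example line
    }
}
```

Either form should work; escaping is arguably the safer habit, because the pattern keeps working if someone later appends more characters to the class.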

Answer 1 (score: 2)

Although it is not as efficient as the original replace technique, you can use

word = word.replaceAll("\\p{Punct}+", "");

to remove a broader set of characters (all ASCII punctuation) with one simple replaceAll expression.
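A small sketch of what "broader" means in practice (the class name and the second input string are invented for illustration). In Java's default mode, \p{Punct} matches the ASCII punctuation characters, so it also strips characters the question never asked about, such as apostrophes and percent signs.

```java
public class PunctDemo {
    // Removes every run of ASCII punctuation, as in the answer above.
    static String stripPunct(String s) {
        return s.replaceAll("\\p{Punct}+", "");
    }

    public static void main(String[] args) {
        // Same result as the six replace calls in the question.
        System.out.println(stripPunct("Hello, t-his is; an- (example) line."));
        // prints: Hello this is an example line

        // Characters outside the original six are removed as well.
        System.out.println(stripPunct("it's 50% off!"));
        // prints: its 50 off
    }
}
```

If only the six characters from the question should go, an explicit character class is probably the better fit.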

Answer 2 (score: 1)

Without (ab)using regular expressions, I would do it like this:

String word = "Hello, t-his is; an- (example) line.";
String undesirable = ",.;-()";

int len1 = undesirable.length();
int len2 = word.length();

StringBuilder sb = new StringBuilder(len2);
outer: for (int j = 0; j < len2; j++) {
    char c = word.charAt(j);
    for (int i = 0; i < len1; i++) {
        if (c == undesirable.charAt(i)) continue outer;
    }
    sb.append(c);
}
System.out.println(sb.toString());

The advantage is performance: you avoid the overhead of creating and parsing a regular expression.

You can wrap it in a method:

public static String removeCharacters(String word, String undesirable) {
    int len1 = undesirable.length();
    int len2 = word.length();

    StringBuilder sb = new StringBuilder(len2);
    outer: for (int j = 0; j < len2; j++) {
        char c = word.charAt(j);
        for (int i = 0; i < len1; i++) {
            if (c == undesirable.charAt(i)) continue outer;
        }
        sb.append(c);
    }
    return sb.toString();
}

public static String removeSpecialCharacters(String word) {
    return removeCharacters(word, ",.;-()");
}

Then you would use it like this:

public static void testMethod() {
    String word = "Hello, t-his is; an- (example) line.";
    System.out.println(removeSpecialCharacters(word));
}

Here is a performance test:

import java.util.regex.Pattern;

public class WordTest {
    public static void main(String[] args) {
        int iterations = 10000000;
        long t1 = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            testAsArray();
        }
        long t2 = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            testRegex();
        }
        long t3 = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            testAsString();
        }
        long t4 = System.currentTimeMillis();
        System.out.println("Without regex, but using copied arrays: " + (t2 - t1));
        System.out.println("With precompiled regex: " + (t3 - t2));
        System.out.println("Without regex, but using string: " + (t4 - t3));
    }

    public static void testAsArray() {
        String word = "Hello, t-his is; an- (example) line.";

        char[] undesirable = ",.;-()".toCharArray();
        StringBuilder sb = new StringBuilder(word.length());
        outer: for (char c : word.toCharArray()) {
            for (char h : undesirable) {
                if (c == h) continue outer;
            }
            sb.append(c);
        }
        sb.toString();
    }

    public static void testAsString() {
        String word = "Hello, t-his is; an- (example) line.";

        String undesirable = ",.;-()";
        int len1 = undesirable.length();
        int len2 = word.length();
        StringBuilder sb = new StringBuilder(len2);
        outer: for (int j = 0; j < len2; j++) {
            char c = word.charAt(j);
            for (int i = 0; i < len1; i++) {
                if (c == undesirable.charAt(i)) continue outer;
            }
            sb.append(c);
        }
        sb.toString();
    }

    private static final Pattern regex = Pattern.compile("[,\\.;\\-\\(\\)]");

    public static void testRegex() {
        String word = "Hello, t-his is; an- (example) line.";
        String result = regex.matcher(word).replaceAll("");
    }
}

Output on my machine (elapsed milliseconds):

Without regex, but using copied arrays: 5880
With precompiled regex: 11011
Without regex, but using string: 3844

Answer 3 (score: 0)

You can try a regular expression with Java's String.replaceAll method:

word = word.replaceAll(",|\.|;|-|\(|\)", "");

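If you are not familiar with regular expressions, | means "or". So we are basically saying: , or . or ; or - or ( or ).

See more: Java documentation for String.replaceAll

Edit

As pointed out, my previous version does not compile. Just for the sake of correctness (although it has been noted that this is not the best solution), here is the corrected version of my regex: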

word = word.replaceAll(",|\\.|;|-|\\(|\\)", "");
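The alternation idea can also be sketched in a way that avoids hand-escaping each character. The following is an illustration only, assuming Java 8 or later; the class and method names are hypothetical. It quotes every character with Pattern.quote and joins the pieces with |, so metacharacters such as . and ( are matched literally.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AlternationDemo {
    // Builds an alternation such as \Q,\E|\Q.\E|... from the characters in toRemove.
    static Pattern alternationOf(String toRemove) {
        String regex = toRemove.chars()
                .mapToObj(c -> Pattern.quote(String.valueOf((char) c)))
                .collect(Collectors.joining("|"));
        return Pattern.compile(regex);
    }

    // Removes every occurrence of any character in toRemove from word.
    static String remove(String word, String toRemove) {
        return alternationOf(toRemove).matcher(word).replaceAll("");
    }

    public static void main(String[] args) {
        String word = "Hello, t-his is; an- (example) line.";
        System.out.println(remove(word, ",.;-()")); // Hello this is an example line
    }
}
```

Compiling the pattern once and reusing it, rather than rebuilding it per call as remove does here, would be the obvious next step if this ran in a loop.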

Answer 4 (score: 0)

Here is a solution that does the job with minimal effort; the toRemove string contains all the characters you do not want to see in the output:

// Requires: import java.nio.CharBuffer;
public static String removeChars(final String input, final String toRemove)
{
    final StringBuilder sb = new StringBuilder(input.length());
    final CharBuffer buf = CharBuffer.wrap(input);

    char c;

    while (buf.hasRemaining()) {
        c = buf.get();
        if (toRemove.indexOf(c) == -1)
            sb.append(c);
    }

    return sb.toString();
}

If you use Java 8, you can even use this (unfortunately there is no CharStream, so casts are needed):

public static String removeChars(final String input, final String toRemove)
{
    final StringBuilder sb = new StringBuilder(input.length());

    input.chars().filter(c -> toRemove.indexOf((char) c) == -1)
        .forEach(i -> sb.append((char) i));

    return sb.toString();
}
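As a possible variation on the Java 8 version, the result can be built with IntStream.collect instead of appending to an outer StringBuilder from forEach. This is a sketch under the same assumptions as the answer (the class name is made up); it also drops the casts by using String.indexOf(int) and StringBuilder.appendCodePoint(int).

```java
public class StreamRemoveDemo {
    public static String removeChars(final String input, final String toRemove) {
        return input.chars()                           // UTF-16 code units as ints
                .filter(c -> toRemove.indexOf(c) == -1) // keep units not listed in toRemove
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint, // appends each kept unit as a char
                         StringBuilder::append)          // combiner, only used by parallel streams
                .toString();
    }

    public static void main(String[] args) {
        String word = "Hello, t-his is; an- (example) line.";
        System.out.println(removeChars(word, ",.;-()")); // Hello this is an example line
    }
}
```

Keeping the builder inside collect means the lambda has no side effects on outer state, which matters mostly if the stream is ever made parallel.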