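How to remove special characters from input text

Date: 2013-11-26 12:34:51

Tags: java pattern-matching

I want to remove all special characters, along with some restricted words, from the input text.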

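Whatever I need to remove will be supplied dynamically.

(To clarify: the items to exclude are provided at runtime, and the user decides what they are. That is why I did not hard-code a regular expression. restricted_words_list (see my code) will be loaded from the database; I keep it static here only to check whether the code works.)

For demonstration purposes I store the items in a String array to confirm that my code works.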

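import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestKeyword {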

    private static final String[] restricted_words_list={"@","of","an","^","#","<",">","(",")"};

    private static final Pattern restrictedReplacer;

    private static Set<String> restrictedWords = null;

    static {

        StringBuilder strb= new StringBuilder();

        for(String str:restricted_words_list){
            strb.append("\\b").append(Pattern.quote(str)).append("\\b|");
        }

        strb.setLength(strb.length()-1);
        restrictedReplacer = Pattern.compile(strb.toString(),Pattern.CASE_INSENSITIVE);

        strb = new StringBuilder();    
    }

    public static void main(String[] args)
    {
        String inputText = "abcd abc@ cbda ssef of jjj t#he g^g an wh&at ggg<g ss%ss ### (()) D^h^D";
        System.out.println("inputText : " + inputText);
        String modifiedText = restrictedWordCheck(inputText);
        System.out.println("Modified Text : " + modifiedText);

    }

    public static String restrictedWordCheck(String input){
        Matcher m = restrictedReplacer.matcher(input);
        StringBuffer strb = new StringBuffer(input.length());//ensuring capacity

        while(m.find()){
            if(restrictedWords==null)restrictedWords = new HashSet<String>();
            restrictedWords.add(m.group());  //m.group() returns what was matched
            m.appendReplacement(strb,""); //this writes out what came in between matching words

            for(int i=m.start();i<m.end();i++)
                strb.append("");
        }
        m.appendTail(strb);
        return strb.toString();
    }
}

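The output is:

inputText : abcd abc@ cbda ssef of jjj t#he g^g an wh&at ggg<g ss%ss ### (()) D^h^D

Modified Text : abcd abc@ cbda ssef  jjj the gg  wh&at gggg ss%ss ### (()) DhD

The restricted words are removed here, but only some of the special characters are removed, not all of the ones I listed in restricted_words_list.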


I now have a better solution:

    String inputText = title; // the input text
    // Load all stop words from the database at runtime (getWordStopper() runs a query and returns the list of words)
    List<String> restricted_words_list = catalogueService.getWordStopper();
    String finalResult = "";
    List<String> stopperCleanText = new ArrayList<String>();

    String[] afterTextSplit = inputText.split("\\s"); // split and add to list

    for (int i = 0; i < afterTextSplit.length; i++) {
        stopperCleanText.add(afterTextSplit[i]); // adding to list
    }

    stopperCleanText.removeAll(restricted_words_list); // remove all word stopper 

    for (String addToString : stopperCleanText)
    {
        finalResult += addToString+";"; // add semicolon to cleaned text 
    }

    return finalResult;
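
The fragment above depends on `title` and `catalogueService`, which are not shown. A minimal self-contained sketch of the same split / removeAll / join idea follows. The class name `StopWordFilter`, the method `removeStopWords`, and the hard-coded stop-word list are hypothetical stand-ins for the database lookup. Unlike the fragment, `String.join` leaves no trailing semicolon. Only whole whitespace-separated tokens that equal a stop word exactly (case-sensitive) are removed, so a token such as `t#he` is left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StopWordFilter {

    // Removes every whitespace-separated token that exactly equals a stop word,
    // then joins the remaining tokens with ';'.
    static String removeStopWords(String input, List<String> stopWords) {
        List<String> tokens = new ArrayList<>(Arrays.asList(input.trim().split("\\s+")));
        tokens.removeAll(stopWords);
        return String.join(";", tokens);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the list loaded from the database.
        List<String> stopWords = Arrays.asList("of", "an", "###");
        System.out.println(removeStopWords("ssef of jjj an ### t#he", stopWords));
    }
}
```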

5 answers:

Answer 0: (score: 1)

public String replaceAll(String regex,
                         String replacement)

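Replaces each substring of this string that matches the given regular expression with the given replacement.

Parameters:

  • regex - the regular expression to which this string is to be matched
  • replacement - the string to be substituted for each match

So you just need to pass an empty string as the replacement argument.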
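
As an illustration of that suggestion, here is a minimal sketch. The class name `ReplaceAllDemo`, the method `stripSpecialChars`, and the character class are assumptions chosen for the example; the question's real list of characters comes from a database.

```java
class ReplaceAllDemo {

    // Deletes every character that is not an ASCII letter, digit, or space.
    static String stripSpecialChars(String input) {
        return input.replaceAll("[^A-Za-z0-9 ]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripSpecialChars("abc@ t#he g^g wh&at (()) D^h^D"));
    }
}
```

Note that the spaces around a deleted token remain, so the result can contain runs of spaces.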

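Answer 1: (score: 0)

Have you considered using a regex directly to replace those special characters with an empty string ""? Take a look at: Java; String replace (using regular expressions)?, and there are some tutorials here: http://www.vogella.com/articles/JavaRegularExpressions/article.html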

Answer 2: (score: 0)

You can also do it like this:

    String inputText = "abcd abc@ cbda ssef of jjj t#he g^g an wh&at ggg<g ss%ss ### (()) D^h^D";        
    String regx="([^a-z^ ^0-9]*\\^*)";        
    String textWithoutSpecialChar=inputText.replaceAll(regx,"");
    System.out.println("Without Special Char:"+textWithoutSpecialChar);

    String yourSetofString="of|an";   // your restricted words.      
    String op=textWithoutSpecialChar.replaceAll(yourSetofString,"");
    System.out.println("output : "+op);

Output:

Without Special Char:abcd abc cbda ssef of jjj the gg an what gggg ssss   h

output : abcd abc cbda ssef  jjj the gg  what gggg ssss   h
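
Two things stand out in that output: the character class has no `A-Z` range, so the uppercase `D` characters are deleted, and a plain `of|an` alternation also matches inside longer words. A possible variant for the second step, sketched under the assumption that restricted words should only be removed as whole words and regardless of case (the class name `WholeWordRemoval` and its method are hypothetical):

```java
import java.util.regex.Pattern;

class WholeWordRemoval {

    // Matches "of" or "an" only as whole words, ignoring case.
    private static final Pattern RESTRICTED =
            Pattern.compile("\\b(?:of|an)\\b", Pattern.CASE_INSENSITIVE);

    static String removeRestrictedWords(String input) {
        return RESTRICTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "an offer of candy";
        System.out.println(text.replaceAll("of|an", ""));  // plain alternation also cuts into "offer" and "candy"
        System.out.println(removeRestrictedWords(text));   // whole words only
    }
}
```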

Answer 3: (score: 0)

String s = "abcd abc@ cbda ssef of jjj t#he g^g an wh&at ggg (blah) and | then";

String[] words = new String[]{ " of ", "|", "(", " an ", "#", "@", "&", "^", ")" };
StringBuilder sb = new StringBuilder();
for( String w : words ) {
    if( w.length() == 1 ) {
        sb.append( "\\" );
    }
    sb.append( w ).append( "|" );
}
System.out.println( s.replaceAll( sb.toString(), "" ) );
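
The loop above escapes only single-character entries and leaves a trailing `|` in the pattern. A hedged alternative is to let `Pattern.quote` escape every entry, whatever its length, and to join the alternatives without a trailing separator. The class name `QuotedAlternation` and the method `buildPattern` are hypothetical; the word list and input are the ones from this answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class QuotedAlternation {

    // Builds one pattern that matches any of the entries literally.
    static Pattern buildPattern(List<String> entries) {
        String regex = entries.stream()
                .map(Pattern::quote)               // escape regex metacharacters in each entry
                .collect(Collectors.joining("|")); // no trailing '|'
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(" of ", "|", "(", " an ", "#", "@", "&", "^", ")");
        String s = "abcd abc@ cbda ssef of jjj t#he g^g an wh&at ggg (blah) and | then";
        System.out.println(buildPattern(entries).matcher(s).replaceAll(""));
    }
}
```

Because the entries " of " and " an " include their surrounding spaces, removing them also joins the neighbouring words (for example `ssef of jjj` becomes `ssefjjj`).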

Answer 4: (score: 0)

You should change your loop

for(String str:restricted_words_list){
        strb.append("\\b").append(Pattern.quote(str)).append("\\b|");
}

to this:

for(String str:restricted_words_list){
        strb.append("\\b*").append(Pattern.quote(str)).append("\\b*|");
}

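This is because your loop only matches a restricted_words_list element when there is something both before and after the match. In abc@ there is nothing after the @, so it is not replaced. If you add * to \\b (meaning 0 or more occurrences), it will also match things like abc@.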
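
To make the boundary behaviour concrete: `\b` matches only at a position that has a word character on one side and a non-word character (or the start or end of the input) on the other. Between `@` and a space there is no such transition, which is why `\b@\b` does not match in `abc@ `. The sketch below shows this, together with one possible way around it that is different from the `\\b*` change: wrap `\b` only around entries made entirely of word characters. The class name `BoundaryAwareFilter` and the method `buildPattern` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class BoundaryAwareFilter {

    // Word-like entries (letters, digits, underscore) get \b on both sides;
    // every other entry is matched literally wherever it occurs.
    static Pattern buildPattern(List<String> entries) {
        String regex = entries.stream()
                .map(e -> e.matches("\\w+")
                        ? "\\b" + Pattern.quote(e) + "\\b"
                        : Pattern.quote(e))
                .collect(Collectors.joining("|"));
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        // '@' followed by a space: no word boundary after '@', so no match.
        System.out.println(Pattern.compile("\\b@\\b").matcher("abc@ x").find());
        // '@' between two word characters: boundaries on both sides, so it matches.
        System.out.println(Pattern.compile("\\b@\\b").matcher("abc@x").find());

        List<String> restricted = Arrays.asList("@", "of", "an", "^", "#", "<", ">", "(", ")");
        String input = "abcd abc@ cbda ssef of jjj t#he g^g an wh&at ggg<g ss%ss ### (()) D^h^D";
        System.out.println(buildPattern(restricted).matcher(input).replaceAll(""));
    }
}
```

With the question's list, `&` and `%` are kept because they are not among the restricted entries.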