How do I remove duplicated words from a String in Java?

Time: 2017-03-13 18:27:20

Tags: java string arraylist

I have an ArrayList<String> that contains the following records:

this is a first sentence
hello my name is Chris 
what's up man what's up man
today is tuesday

I need to clean this list so that the output contains no duplicated content. For the example above, the output should be:

this is a first sentence
hello my name is Chris 
what's up man
today is tuesday

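As you can see, the third string has been modified and now contains the statement what's up man only once instead of twice. In my list a string is sometimes correct and sometimes doubled, as shown above.

I want to get rid of the duplication, so I thought of iterating over the list: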

for (String s: myList) {

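But I can't find a way to eliminate the duplication, especially since the length of each string is not fixed. By that I mean there may be a record like: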

this is a very long sentence this is a very long sentence

or sometimes a short one:

single word single word

Is there a native Java function for this?

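6 answers:

Answer 0: (score: 2)

Assuming each string is repeated exactly twice with a single space in the middle, as in the examples, the following code removes the repetition: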

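for (int i=0; i<myList.size(); i++) {
    String s = myList.get(i);
    // Strings shorter than 3 characters cannot have the form "X X": an empty
    // string would make substring throw, and a single character would become "".
    if (s.length() < 3) continue;
    // First half, up to (not including) the middle character.
    String fs = s.substring(0, s.length()/2);
    // Second half, skipping the space assumed to sit in the middle.
    String ls = s.substring(s.length()/2+1, s.length());
    if (fs.equals(ls)) {
        myList.set(i, fs);
    }
}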

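The code simply splits each entry of the list into two substrings at the midpoint. If the two halves are equal, it replaces the original element with one half, which removes the repetition.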
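As a rough, self-contained sketch of the same midpoint idea: the class name HalfSplitDedup and the helper collapseDoubled are made up for illustration. It assumes a doubled entry has exactly the form "X X" with one space in the middle, and it additionally checks that the middle character really is a space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HalfSplitDedup {
    // Returns X when s has exactly the form "X X" (two identical halves joined
    // by a single space); otherwise returns s unchanged.
    static String collapseDoubled(String s) {
        int mid = s.length() / 2;
        // "X X" always has odd length (2 * n + 1, n >= 1) with a space at mid.
        if (s.length() < 3 || s.length() % 2 == 0 || s.charAt(mid) != ' ') {
            return s;
        }
        String first = s.substring(0, mid);
        String second = s.substring(mid + 1);
        return first.equals(second) ? first : s;
    }

    public static void main(String[] args) {
        List<String> myList = new ArrayList<>(Arrays.asList(
                "this is a first sentence",
                "what's up man what's up man",
                "single word single word"));
        // Replace every element with its collapsed form, in place.
        myList.replaceAll(HalfSplitDedup::collapseDoubled);
        myList.forEach(System.out::println);
    }
}
```

Note that an entry with a trailing space, or with three or more repetitions, is left untouched under these assumptions.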

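I was testing my code and had not seen @Brendan Robert's answer. This code follows the same logic as that answer.

Answer 1: (score: 2)

I suggest using a regular expression. I was able to remove the duplicates with this pattern: \b([\w\s']+) \1\b. Group 1 captures a run of word characters, whitespace and apostrophes, and \1 requires the same text to appear again after a single space.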

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    static String [] phrases = {
            "this is a first sentence",
            "hello my name is Chris",
            "what's up man what's up man",
            "today is tuesday",
            "this is a very long sentence this is a very long sentence",
            "single word single word",
            "hey hey"
    };
    public static void main(String[] args) throws Exception {
        String duplicatePattern = "\\b([\\w\\s']+) \\1\\b";
        Pattern p = Pattern.compile(duplicatePattern);
        for (String phrase : phrases) {
            Matcher m = p.matcher(phrase);
            if (m.matches()) {
                System.out.println(m.group(1));
            } else {
                System.out.println(phrase);
            }
        }
    }
}

Result:

this is a first sentence
hello my name is Chris
what's up man
today is tuesday
this is a very long sentence
single word
hey
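Since the question is about an ArrayList<String>, here is a minimal sketch of how the same pattern might be applied to a list in place. The class RegexListDedup and its methods are hypothetical names; only the pattern itself comes from the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexListDedup {
    // Same pattern as above, compiled once and reused for every element.
    private static final Pattern DOUBLED =
            Pattern.compile("\\b([\\w\\s']+) \\1\\b");

    // Returns the captured phrase if the whole string is "X X", else the input.
    static String collapse(String phrase) {
        Matcher m = DOUBLED.matcher(phrase);
        return m.matches() ? m.group(1) : phrase;
    }

    // Rewrites each list element with its collapsed form.
    static void dedupInPlace(List<String> list) {
        list.replaceAll(RegexListDedup::collapse);
    }

    public static void main(String[] args) {
        List<String> myList = new ArrayList<>(Arrays.asList(
                "today is tuesday",
                "what's up man what's up man"));
        dedupInPlace(myList);
        myList.forEach(System.out::println);
    }
}
```

Because matches() must cover the entire string, a phrase that merely contains a repeated word somewhere in the middle is not changed.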

Answer 2: (score: 1)

Assumption:

  1. An uppercase word is treated as equal to its lowercase form.

// Requires java.util.HashSet and java.util.Set.
String fullString = "lol lol";
// Note: \\W+ splits on every non-word character, including apostrophes.
String[] words = fullString.split("\\W+");
StringBuilder stringBuilder = new StringBuilder();
Set<String> wordsHashSet = new HashSet<>();

for (String word : words) {
    // Skip any word already seen, ignoring case
    if (wordsHashSet.contains(word.toLowerCase())) continue;

    wordsHashSet.add(word.toLowerCase());
    stringBuilder.append(word).append(" ");
}
String nonDuplicateString = stringBuilder.toString().trim();

Answer 3: (score: 1)

Simple logic: split the string into words on the space token, i.e. " ", add each word to a LinkedHashSet, convert the set back to a string, and strip the "[", "]" and "," characters.

 String s = "I want to walk my dog I want to walk my dog";
 Set<String> temp = new LinkedHashSet<>();
 String[] arr = s.split(" ");

 for ( String ss : arr)
      temp.add(ss);

 String newl = temp.toString()
          .replace("[","")
          .replace("]","")
          .replace(",","");

 System.out.println(newl);

o/p: I want to walk my dog
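One caveat with the toString() and replace() route is that it also strips any brackets or commas that belong to the original text. A possible variation, sketched here with the made-up names UniqueWords and uniqueWords, joins the set with String.join instead. Like the code above, it drops every repeated word, not only a doubled phrase.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class UniqueWords {
    // Keeps the first occurrence of each space-separated word, in order.
    static String uniqueWords(String s) {
        Set<String> seen = new LinkedHashSet<>(Arrays.asList(s.split(" ")));
        // String.join leaves punctuation inside the words untouched.
        return String.join(" ", seen);
    }

    public static void main(String[] args) {
        System.out.println(uniqueWords("I want to walk my dog I want to walk my dog"));
    }
}
```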

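Answer 4: (score: 0)

It depends on your situation, but assuming the string is repeated at most twice, not three or more times, you can take the length of the whole string, find the midpoint, and compare each index after the midpoint with the matching index from the start. If the string can be repeated more than twice, you need a more elaborate algorithm: first determine how many times the string repeats, then find the start index of each repetition, and truncate everything from the start of the first repetition onward. If you can give more context on the scenarios you expect to handle, we can start putting some ideas together.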
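For the "repeated more than twice" case, one way the idea could look is sketched below. It works on words rather than character indexes, which is an assumption on my part: repetitions are taken to start on word boundaries and to be separated by whitespace. The names RepeatedPhrase and collapseRepeats are hypothetical.

```java
import java.util.Arrays;

class RepeatedPhrase {
    // If the words of s consist of one phrase repeated two or more times,
    // returns that phrase once; otherwise returns s unchanged.
    static String collapseRepeats(String s) {
        String[] words = s.trim().split("\\s+");
        int n = words.length;
        // Try every candidate phrase length that divides the word count,
        // shortest first, so the smallest repeating unit wins.
        for (int unit = 1; unit <= n / 2; unit++) {
            if (n % unit != 0) {
                continue;
            }
            boolean repeats = true;
            // Each word must equal the word one phrase length before it.
            for (int i = unit; i < n && repeats; i++) {
                repeats = words[i].equals(words[i - unit]);
            }
            if (repeats) {
                return String.join(" ", Arrays.copyOfRange(words, 0, unit));
            }
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeats("hey hey hey"));
        System.out.println(collapseRepeats("what's up man what's up man"));
    }
}
```

Because the shortest unit is preferred, an input such as "a a a a" collapses to "a" rather than "a a", and a collapsed result has its whitespace normalized to single spaces.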

Answer 5: (score: 0)

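// Done in Java 8; method body, requires java.util.Arrays and java.util.stream.Collectors

String str1 = "I am am am a good Good coder";
String[] arrStr = str1.split(" ");
// One-element array so the lambda can update the last word that was kept.
String[] element = new String[1];
return Arrays.stream(arrStr).filter(word -> {
    // Keep a word only if it differs, ignoring case, from the last kept word.
    if (!word.equalsIgnoreCase(element[0])) {
        element[0] = word;
        return true;
    }
    return false;
}).collect(Collectors.joining(" "));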