Word frequency in a list of strings

Time: 2014-08-22 13:24:49

Tags: java string list arraylist

I have a list of strings:

List<String> terms = ["Coding is great", "Search Engines are great", "Google is a nice search engine"]

How can I get the frequency of each word in the list? E.g. {Coding:1, Search:2, Engines:1, engine:1, ....}

Here is my code:

    Map<String, Integer> wordFreqMap = new HashMap<>();
    for (String contextTerm : term.getContexTerms()) {
        String[] wordsArr = contextTerm.split(" ");
        for (String word : wordsArr) {
            Integer freq = wordFreqMap.get(word); // this line is getting reset every time I go to a new contextTerm
            freq = (freq == null) ? 1 : ++freq;
            wordFreqMap.put(word, freq);
        }
    }

3 answers:

Answer 0 (score: 10)

An idiomatic solution using Java 8 streams:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SplitWordCount
{
    public static void main(String[] args)
    {
        List<String> terms = Arrays.asList(
            "Coding is great",
            "Search Engines are great",
            "Google is a nice search engine");

        Map<String, Integer> result = terms.parallelStream().
            flatMap(s -> Arrays.asList(s.split(" ")).stream()).
            collect(Collectors.toConcurrentMap(
                w -> w.toLowerCase(), w -> 1, Integer::sum));
        System.out.println(result);
    }
}

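Note that you may want to consider whether the upper/lower case of the strings should matter. This version converts the words to lower case and uses them as the keys of the final map. The result is: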

{coding=1, a=1, search=2, are=1, engine=1, engines=1, 
     is=2, google=1, great=2, nice=1}
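To make the effect of the case decision concrete, here is a minimal sequential sketch that counts the same list twice, once with the words as they are and once lower-cased. The class name `CaseSensitivityDemo` and the helper `count` are made up for this illustration, and it uses `Collectors.groupingBy` with `Collectors.counting()` rather than the `toConcurrentMap` collector from the answer, so the values are `Long` instead of `Integer`. A `TreeMap` is used only to get a stable print order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CaseSensitivityDemo {

    // Hypothetical helper: splits each term on single spaces and counts the words.
    // keyMapper decides how a word is turned into a map key.
    static Map<String, Long> count(List<String> terms, Function<String, String> keyMapper) {
        return terms.stream()
            .flatMap(s -> Arrays.stream(s.split(" ")))
            .collect(Collectors.groupingBy(keyMapper, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(
            "Coding is great",
            "Search Engines are great",
            "Google is a nice search engine");

        // Case-sensitive: "Search" and "search" stay separate keys.
        System.out.println(count(terms, Function.identity()));

        // Case-insensitive: both spellings are folded into the key "search".
        System.out.println(count(terms, String::toLowerCase));
    }
}
```

With the identity mapper the map keeps "Search" and "search" as two entries with a count of 1 each; with `String::toLowerCase` they collapse into a single "search" entry with a count of 2.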

Answer 1 (score: 1)

// Needs java.util.Arrays, java.util.Collections, java.util.HashMap, java.util.List, java.util.Map
public static void main(String[] args) {
    String msg = "Coding is great search Engines are great Google is a nice search engine";
    // Split once and reuse the resulting list of words.
    List<String> words = Arrays.asList(msg.split(" "));
    Map<String, Integer> map = new HashMap<>();
    for (String word : words) {
        // Collections.frequency scans the whole list for every word.
        map.put(word, Collections.frequency(words, word));
    }
    System.out.println("values are " + map);
}
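One thing to keep in mind with this approach: `Collections.frequency` walks the whole list on every call, so the loop does quadratic work on long inputs. A single pass is enough if the map is updated as each word is seen. Here is a minimal sketch using `Map.merge` (Java 8+); the class name `SinglePassCount` and the helper `count` are made up for illustration, and the input string is the one from the answer above.

```java
import java.util.HashMap;
import java.util.Map;

public class SinglePassCount {

    // Hypothetical helper: counts words separated by single spaces in one pass.
    static Map<String, Integer> count(String msg) {
        Map<String, Integer> freq = new HashMap<>();
        for (String word : msg.split(" ")) {
            // Inserts 1 for a new word, otherwise adds 1 to the stored count.
            freq.merge(word, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        String msg = "Coding is great search Engines are great Google is a nice search engine";
        System.out.println("values are " + count(msg));
    }
}
```

Like the answer's code, this is case-sensitive, so "Engines" and "engine" are counted as different words.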

Answer 2 (score: 0)

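The Java 8 answer is good, but it does not show you how to parallelize this in Java 7 (besides, the default implementation does the same as the stream), so here is an example: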

  public static void main(final String[] args) throws InterruptedException {

    final ExecutorService service = Executors.newFixedThreadPool(10);

    final List<String> terms = Arrays.asList("Coding is great", "Search Engines are great",
        "Google is a nice search engine");

    final List<Callable<String[]>> callables = new ArrayList<>(terms.size());
    for (final String term : terms) {
      callables.add(new Callable<String[]>() {

        @Override
        public String[] call() throws Exception {
          System.out.println("splitting word: " + term);
          return term.split(" ");
        }
      });
    }

    final ConcurrentMap<String, AtomicInteger> counter = new ConcurrentHashMap<>();
    final List<Callable<Void>> callables2 = new ArrayList<>(terms.size());
    for (final Future<String[]> future : service.invokeAll(callables)) {
      callables2.add(new Callable<Void>() {

        @Override
        public Void call() throws Exception {
          System.out.println("counting word");
          // invokeAll implies that the future has finished its work
          for (String word : future.get()) {
            String lc = word.toLowerCase();
            // here it gets tricky: two threads might add the same word.
            AtomicInteger actual = counter.get(lc);
            if (null == actual) {
              final AtomicInteger nv = new AtomicInteger();
              actual = counter.putIfAbsent(lc, nv);
              if (null == actual) {
                actual = nv; // nv got added.
              }
            }
            actual.incrementAndGet();
          }
          return null;
        }
      });
    }
    service.invokeAll(callables2);
    service.shutdown();

    System.out.println(counter);

  }

Yes, Java 8 simplifies the work!

No: I tested it, but I don't know whether it is any better than a simple loop, or whether it is completely thread-safe.

(Looking at how you define your list, isn't this written in Groovy? Groovy has parallel support.)
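As an illustration of how Java 8 shortens the counting step above, here is a rough sketch that keeps the executor but replaces the `get` / `putIfAbsent` / `AtomicInteger` dance with `ConcurrentHashMap.merge`, which performs the update atomically per key. The class name `MergeCounter`, the helper `count`, and the pool size of 4 are arbitrary choices for this example, not something taken from the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MergeCounter {

    // Hypothetical helper: one task per term, all tasks share one concurrent map.
    static Map<String, Integer> count(List<String> terms) {
        ConcurrentHashMap<String, Integer> counter = new ConcurrentHashMap<>();
        ExecutorService service = Executors.newFixedThreadPool(4); // pool size is arbitrary
        try {
            List<Callable<Void>> tasks = new ArrayList<>(terms.size());
            for (String term : terms) {
                tasks.add(() -> {
                    for (String word : term.split(" ")) {
                        // Atomic per key: inserts 1 or adds 1 to the existing count.
                        counter.merge(word.toLowerCase(), 1, Integer::sum);
                    }
                    return null;
                });
            }
            // invokeAll blocks until every task has completed.
            service.invokeAll(tasks);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } finally {
            service.shutdown();
        }
        return counter;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("Coding is great", "Search Engines are great",
            "Google is a nice search engine");
        System.out.println(count(terms));
    }
}
```

Whether this is actually faster than a plain loop for three short strings is a separate question; thread start-up cost will most likely dominate at this size.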