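A more efficient way to get word frequencies

Time: 2013-03-30 18:22:19

Tags: java optimization

I want to count how many words in an ArrayList start with each letter. For example, [cat, cog, mouse] means there are two words beginning with c and one word beginning with m. My code works, but there are 26 letters in the alphabet, so it would need many more if statements. Is there another way to do this?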

public static void  countAlphabeticalWords(ArrayList<String> arrayList) throws IOException
{
    int counta =0, countb=0, countc=0, countd=0,counte=0;
    String word = "";
    for(int i = 0; i<arrayList.size();i++)
    {

        word = arrayList.get(i);

          if (word.charAt(0) == 'a' || word.charAt(0) == 'A'){ counta++;}
          if (word.charAt(0) == 'b' || word.charAt(0) == 'B'){ countb++;}    

    }
    System.out.println("The number of words beginning with A is: " + counta);
    System.out.println("The number of words beginning with B is: " + countb);

}

3 Answers:

Answer 0 (score: 7)

Use a Map

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public static void countAlphabeticalWords(List<String> list) {
  // Maps each upper-cased first letter to the number of words starting with it
  Map<Character, Integer> counts = new HashMap<Character, Integer>();

  for (String word : list) {
    Character c = Character.toUpperCase(word.charAt(0));
    if (counts.containsKey(c)) {
      counts.put(c, counts.get(c) + 1);
    }
    else {
      counts.put(c, 1);
    }
  }

  for (Map.Entry<Character, Integer> entry : counts.entrySet()) {
    System.out.println("The number of words beginning with " + entry.getKey() + " is: " + entry.getValue());
  }
}

Or use a Map with AtomicInteger (as suggested by Jarrod Roberson)

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public static void countAlphabeticalWords(List<String> list) {
  // AtomicInteger is used here as a mutable counter, so no re-boxing on each increment
  Map<Character, AtomicInteger> counts = new HashMap<Character, AtomicInteger>();

  for (String word : list) {
    Character c = Character.toUpperCase(word.charAt(0));
    if (counts.containsKey(c)) {
      counts.get(c).incrementAndGet();
    }
    else {
      counts.put(c, new AtomicInteger(1));
    }
  }

  for (Map.Entry<Character, AtomicInteger> entry : counts.entrySet()) {
    System.out.println("The number of words beginning with " + entry.getKey() + " is: " + entry.getValue());
  }
}

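Best practices

Never loop with list.get(i); use for (element : list) instead. And never use ArrayList in a method signature; use the List interface so that you can change the implementation later.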
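
To tie the points above together, here is a minimal runnable sketch, assuming Java 8 or later (the thread itself predates Java 8). It takes a List, iterates with for-each, and replaces the containsKey/put branches with Map.merge. The class name FirstLetterCounts and the method name countByFirstLetter are made up for illustration, TreeMap is swapped in for HashMap only so the output comes out sorted, and the empty-string check is an addition that the code above does not have.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FirstLetterCounts {

    // Hypothetical helper: counts words by their upper-cased first character.
    public static Map<Character, Integer> countByFirstLetter(List<String> words) {
        // TreeMap keeps the keys sorted, which makes the printed output stable
        Map<Character, Integer> counts = new TreeMap<>();
        for (String word : words) {
            if (word == null || word.isEmpty()) {
                continue; // charAt(0) would throw on an empty string
            }
            char first = Character.toUpperCase(word.charAt(0));
            // Stores 1 if the key is absent, otherwise adds 1 to the existing count
            counts.merge(first, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cat", "cog", "mouse");
        for (Map.Entry<Character, Integer> entry : countByFirstLetter(words).entrySet()) {
            System.out.println("Words beginning with " + entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Returning the map instead of printing inside the method keeps the counting reusable; the caller decides how to display it.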

Answer 1 (score: 3)

How about this? It assumes that every word starts with a letter in [a-zA-Z]:

public static int[] getCount(List<String> arrayList) {
    int[] data = new int[26];
    final int a = (int) 'a';

    for(String s : arrayList) {
        data[((int) Character.toLowerCase(s.charAt(0))) - a]++;
    }

    return data;
}
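
The method above throws an ArrayIndexOutOfBoundsException if a word starts with anything outside [a-zA-Z], and a StringIndexOutOfBoundsException on an empty string. As a rough sketch of how the same array idea could tolerate such input, the variant below simply skips those words. The class name LetterArrayCounts and the method name countByFirstLetter are invented for this example, and skipping (rather than rejecting) bad input is an assumption about what the caller wants.

```java
import java.util.Arrays;
import java.util.List;

public class LetterArrayCounts {

    // Returns 26 counts: index 0 is 'a', index 25 is 'z'.
    public static int[] countByFirstLetter(List<String> words) {
        int[] counts = new int[26];
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // nothing to inspect
            }
            char first = Character.toLowerCase(word.charAt(0));
            // Only count first characters that fall inside a..z; skip digits, punctuation, etc.
            if (first >= 'a' && first <= 'z') {
                counts[first - 'a']++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countByFirstLetter(Arrays.asList("cat", "Cog", "mouse", "42nd"));
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                System.out.println("Words beginning with " + (char) ('a' + i) + ": " + counts[i]);
            }
        }
    }
}
```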

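Edit

Out of curiosity, I ran a very simple test comparing my method with Steph's Map-based method. With a list of 236 items and 10,000,000 iterations (without printing the results), my code took ~10000 ms and Steph's took ~65000 ms.

Test: http://pastebin.com/HNBgKFRk

Data: http://pastebin.com/UhCtapZZ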

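Answer 2 (score: 0)

Every character can be converted to an integer that represents its ASCII decimal code. For example, (int)'a' is 97, and the ASCII decimal code for 'z' is 122. http://www.asciitable.com/

You can create a lookup table for the characters: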

int[] characters = new int[128];

Then, in the loop of your algorithm, use the ASCII decimal code as the index and increment the value:

word = arrayList.get(i);
characters[word.charAt(0)]++;

Finally, you can print the number of occurrences for each character:

for (int i = 97; i<=122; i++){
  System.out.println(String.format("The number of words beginning with %s are: %d", (char)i, characters[i]));
}
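
Putting the three fragments above together, a self-contained version might look like the sketch below. Two things in it are additions, not part of the answer as written: the first character is lower-cased so that "Cog" and "cat" land in the same slot (the print loop only covers 97 to 122, so upper-case entries would otherwise never be shown), and characters outside the 128-slot table are skipped instead of causing an ArrayIndexOutOfBoundsException. The class name AsciiLookupCounts and the method name countByFirstChar are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class AsciiLookupCounts {

    // Returns a table indexed by ASCII code; slot 97 holds the count for 'a', and so on.
    public static int[] countByFirstChar(List<String> words) {
        int[] characters = new int[128]; // one slot per ASCII code
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // no first character to look at
            }
            // Assumption: fold case so 'C' and 'c' share a slot
            char first = Character.toLowerCase(word.charAt(0));
            if (first < characters.length) { // ignore non-ASCII first characters
                characters[first]++;
            }
        }
        return characters;
    }

    public static void main(String[] args) {
        int[] characters = countByFirstChar(Arrays.asList("cat", "Cog", "mouse"));
        // Iterate over the letters directly instead of the magic numbers 97..122
        for (char c = 'a'; c <= 'z'; c++) {
            System.out.println(String.format("The number of words beginning with %s is: %d", c, characters[c]));
        }
    }
}
```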