Counting the occurrences of letters in a URL

Asked: 2013-09-12 21:28:36

Tags: java

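I am trying to count how many times each letter occurs in a URL.

I found this code, which seems to solve the problem, but I would like a few things explained.

1) I am using the Norwegian alphabet, so I need to add three more letters. I changed the array size to 29, but it does not work.

2) Could you explain to me what %c%7d\n means?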

import java.io.FileReader;
import java.io.IOException;


public class FrequencyAnalysis {
    public static void main(String[] args) throws IOException {
        FileReader reader = new FileReader("PlainTextDocument.txt");

        System.out.println("Letter Frequency");

        int nextChar;
        char ch;

        // Declare 26 char counting
        int[] count = new int[26];

        //Loop through the file char
        while ((nextChar = reader.read()) != -1) {
            ch = Character.toLowerCase((char) nextChar);

            if (ch >= 'a' && ch <= 'z')
                count[ch - 'a']++;
        }

        // Print out
        for (int i = 0; i < 26; i++) {
            System.out.printf("%c%7d\n", i + 'A', count[i]);
        }

        reader.close();
    }
}

2 answers:

Answer 0 (score: 2)

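You haven't said how you check for the three additional letters. Simply increasing the size of the count array is not enough. You need to take the Unicode code points of the new characters into account. Those values may no longer be conveniently sequential. In that case, you can use a Map<Integer, Integer> to store the frequencies.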
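If you would rather keep the array, one option is to look each character up in an explicit alphabet string instead of subtracting 'a'. This is only a sketch of that idea, not code from the question: the class name NorwegianLetterCount, the ALPHABET constant and the count helper are made up for illustration, and the Norwegian letters are written as Unicode escapes so the source file's encoding does not matter.

```java
public class NorwegianLetterCount {
    // a-z followed by the three extra Norwegian letters: æ (\u00e6), ø (\u00f8), å (\u00e5)
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e6\u00f8\u00e5";

    // Returns one counter per letter of ALPHABET, in the same order.
    static int[] count(String text) {
        int[] counts = new int[ALPHABET.length()];
        for (int i = 0; i < text.length(); i++) {
            char ch = Character.toLowerCase(text.charAt(i));
            // indexOf returns -1 for anything that is not in the alphabet
            int index = ALPHABET.indexOf(ch);
            if (index >= 0) {
                counts[index]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "Blåbærsyltetøy", written with escapes
        int[] counts = count("Bl\u00e5b\u00e6rsyltet\u00f8y");
        for (int i = 0; i < counts.length; i++) {
            System.out.printf("%c%7d%n", Character.toUpperCase(ALPHABET.charAt(i)), counts[i]);
        }
    }
}
```

The position of a letter in ALPHABET becomes its array index, so the code points no longer need to be contiguous.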

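%c is the format specifier for a Unicode character. %7d is the specifier for an integer printed in a field at least 7 characters wide, padded with spaces on the left. \n is the newline character.

These are documented in the Formatter javadoc.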
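To see what the format string does, here is a minimal sketch that builds one output row the same way the printf call in the question does. The class name FormatDemo and the helper row are invented for this example.

```java
public class FormatDemo {
    // Builds one row: the letter, then the count right-aligned in a 7-character field.
    static String row(int letterIndex, int count) {
        // letterIndex + 'A' is an int; %c prints it as the character with that code
        return String.format("%c%7d\n", letterIndex + 'A', count);
    }

    public static void main(String[] args) {
        System.out.print(row(0, 42));      // prints "A     42" (five spaces, then 42)
        System.out.print(row(25, 1234567)); // a 7-digit count fills the whole field
    }
}
```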

Answer 1 (score: 1)

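The important point here is that when you increment the count in the array, you are implicitly using the character's numeric code (its ASCII code, for the letters a to z) as the index: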

// Here, ch is a char.
ch = Character.toLowerCase((char) nextChar);

// I hate *if statements* without curly brackets but this is off-topic :)
if (ch >= 'a' && ch <= 'z')

    /*
     * But here, ch is implicitly converted to an int.
     * The int value of a char is its character code (identical to ASCII for a-z).
     * For example, the value of 'a' is 97.
     * So if ch is 'a', (ch - 'a') = (97 - 97) = 0.
     * That's why you are incrementing count[0] in this case.
     *
     * Now, what happens if ch = 'ø'? What is the code of ø?
     * Probably something quite high, so that ch - 'a' is out of bounds,
     * since the size of your array is only 26 + 3.
     * (As written, the check above also skips 'ø' entirely.)
     *
     * EDIT: after a quick test, 'ø' = 248.
     *
     * This would only work if the Norwegian characters had codes 123 to 125,
     * directly after 'z' (122), and the check were widened to include them.
     */
    count[ch - 'a']++;
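
The arithmetic in the comment can be checked with a few lines. This is just a small illustration; the class name CharIndexDemo and the helper indexFor are hypothetical.

```java
public class CharIndexDemo {
    // The array index the original code would compute for a given character.
    static int indexFor(char ch) {
        return ch - 'a';
    }

    public static void main(String[] args) {
        System.out.println(indexFor('a'));      // first slot of the array
        System.out.println(indexFor('z'));      // last slot of a 26-element array
        // 'ø' written as a Unicode escape; its index is far beyond a 29-element array
        System.out.println(indexFor('\u00f8'));
    }
}
```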

You should rewrite the algorithm using a HashMap<Character, Integer>.

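// Requires: import java.util.HashMap;
// HashMap<Character, number of occurrences of this character>
HashMap<Character, Integer> map = new HashMap<Character, Integer>();

int nextChar;
while ((nextChar = reader.read()) != -1) {
  // read() returns an int; convert it to a char so that it matches the key type
  char ch = Character.toLowerCase((char) nextChar);
  if (!map.containsKey(ch)) {
    map.put(ch, 0);
  }
  map.put(ch, map.get(ch) + 1);
}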
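
For reference, here is how the map-based loop might look as a complete program. Treat it as a sketch under a few assumptions that are not in the question: it reads from a StringReader so it runs without a file, it keeps only characters for which Character.isLetter is true, and it uses a TreeMap so the output is sorted by character code. The class name MapFrequency and the count helper are invented for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

public class MapFrequency {
    // Counts every letter read from the reader, ignoring case.
    static Map<Character, Integer> count(Reader reader) throws IOException {
        Map<Character, Integer> counts = new TreeMap<Character, Integer>();
        int nextChar;
        while ((nextChar = reader.read()) != -1) {
            char ch = Character.toLowerCase((char) nextChar);
            if (!Character.isLetter(ch)) {
                continue; // skip digits, spaces and punctuation
            }
            Integer old = counts.get(ch);
            counts.put(ch, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // "Rød grøt med fløte", with ø written as a Unicode escape
        Reader reader = new StringReader("R\u00f8d gr\u00f8t med fl\u00f8te");
        Map<Character, Integer> counts = count(reader);
        reader.close();

        System.out.println("Letter Frequency");
        for (Map.Entry<Character, Integer> entry : counts.entrySet()) {
            System.out.printf("%c%7d%n", entry.getKey(), entry.getValue());
        }
    }
}
```

When reading a real file, keep in mind that FileReader decodes bytes with the default charset, so letters such as æ, ø and å are only read correctly if that charset matches the file's encoding.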