Question

Read a file in this format:

japan
usa
japan
russia
usa
japan
japan
australia

Print the output in the following format:

<country> : <count>

So for the file above, the output would be:

japan : 4
usa : 2
australia : 1
russia : 1

Note that since australia and russia both have a count of 1, the names are sorted, so 'a' comes before 'r'. Do this in the most efficient way.

Here is my attempt:

Read the entire file and insert into a HashMap.
We will have pairs like <japan, 4> in there.
Now read the HashMap and insert in another TreeMap<Integer, List<String>>
Iterate over the TreeMap using a Comparator, so that it iterates in reverse-sorted order.
Sort value (which will be a List<String>) and print the result.
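
The steps above can be sketched in Java roughly as follows. This is only a minimal sketch of the attempt as described: the class name CountByTreeMap and the method report are made up for illustration, and the input is passed as a list of lines rather than read from a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name, used only for this illustration.
class CountByTreeMap {
    // Returns lines of the form "<country> : <count>", highest count first,
    // with ties broken alphabetically.
    static List<String> report(List<String> countries) {
        // Step 1: count occurrences in a HashMap.
        Map<String, Integer> counts = new HashMap<>();
        for (String country : countries) {
            counts.merge(country, 1, Integer::sum);
        }

        // Step 2: group names by count. The reverse-order comparator makes
        // the TreeMap iterate from the largest count to the smallest.
        TreeMap<Integer, List<String>> byCount = new TreeMap<>(Collections.reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }

        // Step 3: sort each group of names and emit the lines.
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : byCount.entrySet()) {
            List<String> names = e.getValue();
            Collections.sort(names);
            for (String name : names) {
                lines.add(name + " : " + e.getKey());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "japan", "usa", "japan", "russia", "usa", "japan", "japan", "australia");
        for (String line : report(input)) {
            System.out.println(line);
        }
    }
}
```

In a real program the list could come from something like Files.readAllLines on the input file.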

Answer 1

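This can be done in O(n * S), where n is the number of input strings and S is the maximum string size. I'll give you a general algorithm in pseudocode; Java would be a bit messy...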

arr <- HashSet<String>[NumberOfElements + 1]   // arr[c] = countries seen exactly c times so far
map <- HashMap<String,int>
for each country:
   if country in map.keySet():
        count <- map.get(country)
        arr[count].del(country)
        map.delete(country)
        count <- count + 1
   else:
        count <- 1
   arr[count].add(country)
   map.put(country,count)
for i=arr.length-1;i>=1;i--:   // index 0 is never used
   sorted <- radixSort(arr[i])
   for each country in sorted:
      print country, i

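Here arr acts as a "histogram" indexed by count: since each iteration increases a country's count by at most 1, we can use it to store the data.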

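Complexity explanation: the algorithm uses radix sort, where each 'digit' is actually a character, so it is linear in n (times the string length S). Using it avoids the O(n log n) cost of other sorting algorithms or of a TreeSet. We iterate over an array of size at most n (the case where every country appears only once).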

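One tricky point is the sort inside the loop: it is still O(n), because overall you sort at most n elements (not n elements per iteration!), so it is O(2n) = O(n). We can find NumberOfElements in advance with one pass over the input.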

Overall: it is O(n * S), where n is the number of inputs (the work of filling arr) and S is the maximum string size (since we need to read the strings...).
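
For readers who want to try the idea in Java, here is a rough sketch of the bucket-by-count approach. It is not a faithful implementation of the analysis above: the JDK has no built-in radix sort for strings, so Collections.sort stands in for radixSort, which makes the per-bucket sort O(k log k) instead of linear. The class name CountByBuckets and the method report are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class name, used only for this illustration.
class CountByBuckets {
    static List<String> report(List<String> countries) {
        int n = countries.size();

        // buckets.get(c) holds the countries seen exactly c times so far.
        // Counts range from 1 to n, so n + 1 slots are needed; slot 0 stays empty.
        List<Set<String>> buckets = new ArrayList<>(n + 1);
        for (int i = 0; i <= n; i++) {
            buckets.add(new HashSet<>());
        }

        Map<String, Integer> counts = new HashMap<>();
        for (String country : countries) {
            int old = counts.getOrDefault(country, 0);
            if (old > 0) {
                buckets.get(old).remove(country); // leave the old bucket
            }
            buckets.get(old + 1).add(country);    // move up by exactly one
            counts.put(country, old + 1);
        }

        // Walk the buckets from the highest count down to 1.
        List<String> lines = new ArrayList<>();
        for (int c = n; c >= 1; c--) {
            List<String> names = new ArrayList<>(buckets.get(c));
            Collections.sort(names); // assumption: stand-in for radixSort
            for (String name : names) {
                lines.add(name + " : " + c);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "japan", "usa", "japan", "russia", "usa", "japan", "japan", "australia");
        for (String line : report(input)) {
            System.out.println(line);
        }
    }
}
```

Each country sits in exactly one bucket at the end, so the final loop touches at most n names in total across all buckets.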

Answer 2

java.util.Map should get you on the right track.
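
As one possible reading of this hint, the counts can be kept in a Map and the entries sorted once at the end. The sketch below is an assumption about what is meant, not something this answer spells out; CountWithMap and report are names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name, used only for this illustration.
class CountWithMap {
    static List<String> report(List<String> countries) {
        // Count each country; merge inserts 1 or adds 1 to the existing value.
        Map<String, Integer> counts = new HashMap<>();
        for (String country : countries) {
            counts.merge(country, 1, Integer::sum);
        }

        // Sort entries by count descending, then by name ascending.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + " : " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "japan", "usa", "japan", "russia", "usa", "japan", "japan", "australia");
        for (String line : report(input)) {
            System.out.println(line);
        }
    }
}
```

The single comparison sort costs O(k log k) for k distinct countries, which is usually small next to the cost of reading the input.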

Answer 3

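The most efficient way in terms of coding time is to forget Java and use sort | uniq -c | sort -n (incidentally, one of my favorite shell snippets). If you really need the format as shown, use awk. For large inputs the runtime will not even be that bad (since those are fairly efficient programs), but startup time will dominate for your example list. Of course, you could run it something like 10,000 times before Eclipse has even started.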

How can I get the following output in the most efficient way?
