Question

I need to sort a list of words by their frequency.

My input:

Haha, hehe, haha, haha, hehe, hehe.... , Test

For example, in my data structure I would have

Haha:3
Hehe:5
Test:10

I need the data structure to be sorted like this in the output:

Test:10
Hehe:5
Haha:3

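That way, if I pop the top of the data structure, I get the element together with its frequency.

The number of elements is not known in advance, so a fixed-size array is not an option. If I want the first few elements, I just need to access them in order. Is this possible in Java?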

Answer 1

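First, to confirm: do you have all of the words before you sort, or do they keep arriving?

(1) In the first case, you can store the words in a Set and then put them into a PriorityQueue. If you supply a Comparator, the queue keeps the highest-frequency word at its head, so repeated calls to poll() return the words in frequency order. I created a class Pair to hold the text and the frequency; see the code: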

import java.util.Queue;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.HashSet;
import java.util.Comparator;

public class PriorityQueueTest {

    // Holds a word and the number of times it occurs.
    public static class Pair {
        private String text;
        private int frequency;

        public Pair(String text, int frequency) {
            this.text = text;
            this.frequency = frequency;
        }

        // Two Pairs represent the same word when their text matches.
        // equals() must be overridden together with hashCode(),
        // otherwise the HashSet cannot detect duplicate words.
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Pair)) {
                return false;
            }
            return text.equals(((Pair) o).text);
        }

        @Override
        public int hashCode() {
            return text.hashCode();
        }

        @Override
        public String toString() {
            return text + ":" + frequency;
        }

        public String getText() {
            return text;
        }
        public void setText(String text) {
            this.text = text;
        }
        public int getFrequency() {
            return frequency;
        }
        public void setFrequency(int frequency) {
            this.frequency = frequency;
        }
    }

    // Orders Pairs by frequency, highest first, so poll() returns the most frequent word.
    public static Comparator<Pair> frequencyComparator = new Comparator<Pair>() {
        @Override
        public int compare(Pair o1, Pair o2) {
            return Integer.compare(o2.getFrequency(), o1.getFrequency());
        }
    };

    public static void main(String[] args) {
        Set<Pair> data = new HashSet<Pair>();
        data.add(new Pair("haha", 3));
        data.add(new Pair("Hehe", 5));
        data.add(new Pair("Test", 10));

        // Initial capacity 16; the comparator determines which element is at the head.
        Queue<Pair> queue = new PriorityQueue<Pair>(16, frequencyComparator);
        queue.addAll(data);

        // Check the order: poll() removes and returns the head (highest frequency).
        Pair temp;
        while ((temp = queue.poll()) != null) {
            System.out.println(temp);
        }
    }

}

(2) In the other case (the words keep arriving), you can use a TreeMap to keep them in order. See this reference: http://www.java-samples.com/showtutorial.php?tutorialid=370
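A minimal sketch of case (2), assuming the words arrive one at a time and that ties may be broken alphabetically. Note that a TreeMap<String, Integer> keyed by word would keep the words in alphabetical order, not by count, so this sketch instead keys a TreeMap by count (largest first) and keeps a HashMap from each word to its current count. The class and method names (StreamingWordCounter, add, top) are hypothetical and are not taken from the linked tutorial.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class StreamingWordCounter {
    // word -> current count, for constant-time (average) lookup
    private final Map<String, Integer> counts = new HashMap<>();
    // count -> words having that count, iterated from the largest count down
    private final TreeMap<Integer, Set<String>> byCount = new TreeMap<>(Comparator.reverseOrder());

    // Records one more occurrence of word by moving it to the next count bucket.
    public void add(String word) {
        int old = counts.getOrDefault(word, 0);
        if (old > 0) {
            Set<String> bucket = byCount.get(old);
            bucket.remove(word);
            if (bucket.isEmpty()) {
                byCount.remove(old); // drop empty buckets so iteration stays short
            }
        }
        counts.put(word, old + 1);
        byCount.computeIfAbsent(old + 1, k -> new TreeSet<>()).add(word);
    }

    // Returns up to n "word:count" strings, highest count first;
    // words with equal counts come out alphabetically (TreeSet order).
    public List<String> top(int n) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, Set<String>> e : byCount.entrySet()) {
            for (String word : e.getValue()) {
                if (result.size() == n) {
                    return result;
                }
                result.add(word + ":" + e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        StreamingWordCounter counter = new StreamingWordCounter();
        for (String w : new String[] {"Haha", "Hehe", "Test", "Hehe", "Test", "Test"}) {
            counter.add(w);
        }
        System.out.println(counter.top(2)); // prints [Test:3, Hehe:2]
    }
}
```

Each add() performs a constant number of hash-map and tree operations, so it stays logarithmic as words stream in, and reading the top few words never requires re-sorting everything.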

Answer 2

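To keep the information you need, you could create a class that holds the string and its count (e.g. Pair) and keep instances of this class in a List<Pair>. That approach makes incrementing the count for a given string inefficient, because you first have to find the element holding that string in linear time (O(N)) and only then increment it.

A better approach is to use a Map<String, Integer>: looking up a word then takes constant time (O(1) on average for a HashMap), and afterwards you can sort the entries returned by Map.entrySet().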
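A minimal Java 8+ sketch of the Map-based approach, assuming the words are already split into an array; MapEntrySort and sortedCounts are hypothetical names. Map.merge increments (or initializes) a count in one call, and because a Set has no order of its own, the entries are copied into a list before sorting by value in descending order:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapEntrySort {

    // Counts each word in a HashMap, then returns the entries sorted by count, highest first.
    public static List<Map.Entry<String, Integer>> sortedCounts(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // start at 1, or add 1 to the existing count
        }
        // entrySet() is unordered, so copy it into a list that can be sorted.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.comparingByValue(Comparator.reverseOrder()));
        return entries;
    }

    public static void main(String[] args) {
        String[] words = {"Haha", "Hehe", "Test", "Hehe", "Test", "Test"};
        for (Map.Entry<String, Integer> e : sortedCounts(words)) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
        // prints Test:3, then Hehe:2, then Haha:1 (one per line)
    }
}
```

The first element of the returned list is the most frequent word, which matches the "pop the top" access pattern from the question.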

Answer 3


I started from the following question as a reference and build on it below:

How can I count the occurrences of a list item in Python?

Now for the build-up:

>>> from collections import Counter
>>> word_list = ['blue', 'red', 'blue', 'yellow', 'blue', 'red','white','white']
>>> Counter(word_list)
Counter({'blue': 3, 'red': 2, 'white': 2, 'yellow': 1})

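Note how Counter(word_list) displays its elements as word/frequency pairs sorted by decreasing frequency. Unfortunately, extracting just the words into a list in that same order takes a bit more work:

(1) Get the "size", i.e. the number of distinct elements in the Counter object (a dict subclass).

(2) Call the "most_common" method on the Counter object to get a list of (word, frequency) pairs sorted by frequency. (Calling most_common() with no argument also returns every element.)

(3) Use a list comprehension to pull the words out of that sorted list.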

>>> size = len(Counter(word_list))
>>> size
4
>>> word_frequency_pairs = Counter(word_list).most_common(size)
>>> word_frequency_pairs
[('blue', 3), ('red', 2), ('white', 2), ('yellow', 1)]
>>> [i[0] for i in word_frequency_pairs]
['blue', 'red', 'white', 'yellow']
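Since the original question asks about Java, here is a rough Java 9+ counterpart of steps (1) to (3) using streams. It is a sketch, not an exact reimplementation of Counter; MostCommonWords and wordsByFrequency are hypothetical names. A LinkedHashMap keeps the order in which words were first seen, and Stream.sorted is stable for ordered streams, so words with equal counts keep that first-seen order:

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MostCommonWords {

    // Counts the words, sorts by count (highest first) and keeps only the words,
    // roughly like [i[0] for i in Counter(words).most_common()] in Python.
    public static List<String> wordsByFrequency(List<String> words) {
        Map<String, Long> counts = words.stream()
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> wordList = List.of("blue", "red", "blue", "yellow", "blue", "red", "white", "white");
        System.out.println(wordsByFrequency(wordList)); // prints [blue, red, white, yellow]
    }
}
```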

That's why I like Python :)
