Java word frequency sorted by length, then alphabetically

Time: 2021-01-11 20:22:42

Tags: java


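My task is to create a character reader that reads from a project file and builds a word-frequency chart sorted by word length, with words of equal length ordered alphabetically. This is my code so far, but my output contains duplicate words and no frequencies. What is wrong? I am not sure where I went wrong. Thanks! ICharacterReader and SimpleCharacterReader are the files I was given.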


import java.io.EOFException;
import java.util.Random;

public class SimpleCharacterReader implements ICharacterReader {
    private int m_Pos = 0;

    public static final char lf = '\n';

    private String m_Content = "It was the best of times, it was the worst of times," + 
    lf +
    "it was the age of wisdom, it was the age of foolishness," + 
    lf +
    "it was the epoch of belief, it was the epoch of incredulity," + 
    lf +
    "it was the season of Light, it was the season of Darkness," + 
    lf +
    "it was the spring of hope, it was the winter of despair," + 
    lf +
    "we had everything before us, we had nothing before us," + 
    lf +
    "we were all going direct to Heaven, we were all going direct" + 
    lf +
    "the other way--in short, the period was so far like the present" + 
    lf +
    "period, that some of its noisiest authorities insisted on its" + 
    lf +
    "being received, for good or for evil, in the superlative degree" + 
    lf +
    "of comparison only." + 
    lf +
    "There were a king with a large jaw and a queen with a plain face," + 
    lf +
    "on the throne of England; there were a king with a large jaw and" + 
    lf +
    "a queen with a fair face, on the throne of France.  In both" + 
    lf +
    "countries it was clearer than crystal to the lords of the State" + 
    lf +
    "preserves of loaves and fishes, that things in general were" + 
    lf +
    "settled for ever";

    Random m_Rnd = new Random();

    public char GetNextChar() throws EOFException {

        if (m_Pos >= m_Content.length()) {
            throw new EOFException();
        }

        return m_Content.charAt(m_Pos++);
    }

    public void Dispose() {
        // Do nothing
    }
}

import java.io.EOFException;

public interface ICharacterReader {

    char GetNextChar() throws EOFException;

    void Dispose();
}

import java.io.EOFException;
import java.lang.Character;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class Analyse {
    static String arr[];        /** Store unsorted words in array **/
    public static int count = 0;
    static char a, s;
    static String word = "";

    public static void main(String[] args) throws EOFException {

        SimpleCharacterReader stream = new SimpleCharacterReader();
        List<String> list = new ArrayList<>();
        String str = "";
        /** Need to know how many words we have **/
        count(stream);
        SimpleCharacterReader st = new SimpleCharacterReader();
        store(st);
        /** Store valid words in list to sort later **/
        for (int i = 0; i < arr.length; i++) {
            if (arr[i] != " " && arr[i] != null) {
                str = arr[i];
                list.add(str);
            }
        }
        sort(list); /** Sort list **/
        print(list); /** Print list **/
    }
    public static List<String> sort(List<String> list) {
        /** Compare by length first; words of equal length are compared alphabetically **/
        Collections.sort(list, new Comparator<String>() {
            public int compare(String o1, String o2) {
                if (o1.length() < o2.length()) { /** First string is shorter than the second **/
                    return -1;
                } else if (o1.length() > o2.length()) { /** First string is longer than the second **/
                    return 1;
                } else {        /** Equal length **/
                    return o1.compareTo(o2);
                }
            }
        });
        return list;
    }
    /** Simply prints the length and sorted words. Sorted by their length first, then alphabetically. **/
    public static void print(List<String> list) {
        for (int i = 0; i < arr.length; i++) {
            System.out.println(list.get(i).length() + " - " + list.get(i));
        }
    }
    /** Counts how many words **/
    public static void count(SimpleCharacterReader stream) {
        try {
            while (true) {
                a = getReader(stream);
                /** When a space, newline or tab is met, we assume a word was met. **/
                if ((a == ' ') || (a == '\n') || (a == '\t')) {
                    count++;
                }
            }
        } catch (EOFException eof) {
            /** End of input reached; stop counting **/
        }
        arr = new String[count];
    }
    /** Gets the next character from the reader **/
    public static char getReader(ICharacterReader reader) throws EOFException {
        return reader.GetNextChar();
    }
    /** Store unsorted words in array **/
    public static void store(ICharacterReader info) throws EOFException {
        int i = 0;
        while (i < count) {
            s = info.GetNextChar();
            if (Character.isLetterOrDigit(s)) {
                word += Character.toString(s);
            } else if (s == ' ' || s == '\n' || s == '\t') {
                arr[i++] = word;
                word = "";
            }
        }
    }
}

1 answer:

Answer 0 (score: 1)

You need to collect the words into a TreeMap, which keeps its keys sorted and accepts a custom comparator via its constructor: public TreeMap(Comparator<? super K> comparator)
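As a minimal sketch of how such a comparator shapes the key order, the snippet below fills a TreeMap with a few words and returns the keys in iteration order. The class name LengthThenAlphaOrder and the helper orderedKeys are hypothetical names chosen for illustration, and the comparator is written with Comparator.comparingInt rather than a hand-written lambda.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; not part of the asker's project.
class LengthThenAlphaOrder {

    // Shorter words first; words of equal length in natural (case-sensitive) order.
    static final Comparator<String> BY_LENGTH_THEN_ALPHA =
            Comparator.comparingInt(String::length)
                      .thenComparing(Comparator.naturalOrder());

    // Puts the words into a TreeMap built with the comparator and returns
    // the keys in the map's iteration order. Duplicate words collapse into one key.
    static List<String> orderedKeys(List<String> words) {
        Map<String, Integer> map = new TreeMap<>(BY_LENGTH_THEN_ALPHA);
        for (String w : words) {
            map.put(w, 0); // the value is irrelevant here; only key order matters
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // prints [it, of, age, was, best]
        System.out.println(orderedKeys(List.of("was", "it", "best", "age", "of", "it")));
    }
}
```

Because the comparator returns 0 only for equal strings, two different words of the same length are never treated as the same key.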

The frequency of each word is stored as the map value. It can be accumulated with the Map::merge function.
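To illustrate the accumulation step on its own, here is a small sketch that counts words with Map.merge in a plain HashMap. The class name MergeCount and the method count are hypothetical and chosen only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of the asker's project.
class MergeCount {

    // merge stores 1 for a key that is not yet present; otherwise it applies
    // Integer::sum to the existing value and 1 and stores the result.
    static Map<String, Integer> count(String... words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) {
            freq.merge(w, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = count("it", "was", "the", "best", "it", "was");
        System.out.println(freq.get("it"));   // 2
        System.out.println(freq.get("best")); // 1
    }
}
```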

Assuming that all the words have been read into the array arr by the method store, the code can be updated as follows:

    // class Analyse, method main (needs java.util.Map and java.util.TreeMap)
    store(st);
    /** Store valid words in a sorted map and count word frequency **/
    // create TreeMap with custom Comparator as a lambda
    Map<String, Integer> frequency = new TreeMap<>((s1, s2) ->
        Integer.compare(s1.length(), s2.length()) == 0
            ? s1.compareTo(s2)
            : Integer.compare(s1.length(), s2.length()));
    for (String word : arr) {
        if (null != word && !" ".equals(word)) {
            frequency.merge(word, 1, (acc, one) -> acc + one);
        }
    }
    print(frequency); // instead of print(list);
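If the intermediate array is not required, the counting could also be done in a single pass over the characters. The following is only a sketch under assumptions: it reads from a plain String instead of ICharacterReader, treats every non-letter, non-digit character as a word separator (so "way--in" would become two words, unlike the store method in the question), and uses the hypothetical names OnePassWordCount, countWords and flush.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; a String stands in for the character reader.
class OnePassWordCount {

    // Splits the text on any non-letter/digit character and counts each word.
    // Keys are ordered by length, then alphabetically (case-sensitive).
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> freq = new TreeMap<>(
                Comparator.comparingInt(String::length)
                          .thenComparing(Comparator.naturalOrder()));
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
            } else {
                flush(word, freq);
            }
        }
        flush(word, freq); // do not lose the last word when the input ends
        return freq;
    }

    // Records the buffered word, if there is one, and clears the buffer.
    private static void flush(StringBuilder word, Map<String, Integer> freq) {
        if (word.length() > 0) {
            freq.merge(word.toString(), 1, Integer::sum);
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> freq =
                countWords("it was the best of times, it was the worst of times");
        freq.forEach((w, n) -> System.out.printf("%d - %s: %d%n", w.length(), w, n));
    }
}
```

Flushing only non-empty buffers means that consecutive separators, such as a double space, do not produce empty words.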


Next, the print method needs to be updated to handle a map instead of a list:

    public static void print(Map<String, Integer> frequency) {
        frequency.forEach((word, freq) -> System.out.printf("%d - %s: %d%n", word.length(), word, freq));
    }


Alternatively, the Stream API can be used to sort the words with a custom comparator built as a chain of Comparator.comparingInt and thenComparing, and then collect the frequencies with Collectors.toMap into a LinkedHashMap, which maintains insertion order:

    // class Analyse, method main
    // (needs java.util.Arrays, java.util.LinkedHashMap, java.util.Map, java.util.stream.Collectors)
    store(st);
    Map<String, Integer> frequency = Arrays
            .stream(arr) // Stream<String>
            .filter(word -> null != word && !" ".equals(word))
            .sorted( // sort
                Comparator.comparingInt(String::length)
                          .thenComparing(String::compareTo) // or compareToIgnoreCase if needed
            )
            .collect(Collectors.toMap(
                word -> word,      // use word as a key
                word -> 1,         // 1 as initial value
                Integer::sum,      // merge function to count frequency
                LinkedHashMap::new // maintain insertion order (by sorted keys)
            ));
    print(frequency); // instead of print(list);
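For a version of the stream approach that can be run without the rest of the project, here is a minimal sketch using a small sample array. The class name StreamWordCount, the method count and the sample data are assumptions made for illustration; the filter also drops any blank string, which is slightly broader than the check for a single space.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the asker's project.
class StreamWordCount {

    // Sorts the words first, then counts them into a LinkedHashMap so that
    // the map iterates in the sorted order in which keys were first inserted.
    static Map<String, Integer> count(String[] words) {
        return Arrays.stream(words)
                .filter(w -> w != null && !w.trim().isEmpty()) // skip nulls and blanks
                .sorted(Comparator.comparingInt(String::length)
                                  .thenComparing(Comparator.naturalOrder()))
                .collect(Collectors.toMap(
                        Function.identity(), // the word is the key
                        w -> 1,              // first occurrence counts as 1
                        Integer::sum,        // add up repeated occurrences
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        String[] sample = {"was", "it", null, "best", "it", " ", "age", "was", "it"};
        // prints {it=3, age=1, was=2, best=1}
        System.out.println(count(sample));
    }
}
```

Compared with the TreeMap variant, the ordering here is fixed at collection time: a word added to the resulting map later would be appended at the end rather than placed in sorted position.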
