Question

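I am trying to write a program that counts how often 1-letter words, 2-letter words, and so on occur in a given text file. However, it only seems to work for small files.

I looked up some solutions involving arrays (which I don't fully understand) and merged them into my code. When I test it on a file with a few words, it works, but when I give it a large file, such as the whole of Romeo and Juliet, it gives the wrong results.

(Also, what does "for (String str : strings)" do?)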

import java.util.Scanner;
import java.io.File;
import java.io.FileNotFoundException;

class Authorship
{
    public static void main(String[] args)
    {
            try
            {
                    System.out.print("Name of input file: ");
                    Scanner in = new Scanner(System.in);
                    String name = in.nextLine();
                    File text = new File(name);
                    Scanner in2 = new Scanner(text);
                    String line = in2.nextLine();
                    String[] strings = line.split(" ");
                    int[] counts = new int[14];
                    for(String str : strings)
                    {
                            if (str.length() < counts.length)
                                    counts [str.length()] += 1;
                    }
                    for (int i = 1; i <= 13; i++)
                    {
                            System.out.print("Proportion of " + i + "-letter words: ");
                            System.out.println("( " + counts[i] + " words )");
                    }
            }
            catch (Exception FileNotFoundException)
            {
                    System.out.println("File not found");
            }
    }
}

Thanks in advance.

Answer 1

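The enhanced for loop is covered in JLS Chapter 14, Blocks and Statements (jls-14.14.2), which says (in part):

The meaning of the enhanced for statement is given by translation into a basic for statement, as follows:   ...   Otherwise, the Expression necessarily has an array type, T[].   Let L1 ... Lm be the (possibly empty) sequence of labels immediately preceding the enhanced for statement.

The enhanced for statement is equivalent to a basic for statement of the form: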
T[] #a = Expression;
L1: L2: ... Lm:
for (int #i = 0; #i < #a.length; #i++) {
    {VariableModifier} TargetType Identifier = #a[#i];
    Statement
}
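
Applied to the loop in the question, that translation can be sketched as follows. The class and method names here are made up for illustration; both methods walk the array in order and should produce the same result.

```java
import java.util.ArrayList;
import java.util.List;

class EnhancedForDemo {
    // Collects word lengths using the enhanced for statement.
    static List<Integer> lengthsEnhanced(String[] strings) {
        List<Integer> lengths = new ArrayList<>();
        for (String str : strings) {
            lengths.add(str.length());
        }
        return lengths;
    }

    // The same loop written as the basic for statement it is equivalent to.
    static List<Integer> lengthsBasic(String[] strings) {
        List<Integer> lengths = new ArrayList<>();
        String[] a = strings;                  // T[] #a = Expression;
        for (int i = 0; i < a.length; i++) {
            String str = a[i];                 // TargetType Identifier = #a[#i];
            lengths.add(str.length());         // Statement
        }
        return lengths;
    }

    public static void main(String[] args) {
        String[] strings = "But soft what light".split(" ");
        System.out.println(lengthsEnhanced(strings)); // [3, 4, 4, 5]
        System.out.println(lengthsBasic(strings));    // [3, 4, 4, 5]
    }
}
```

So "for (String str : strings)" simply means "run the body once for each element of strings, with str set to that element".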

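Also, in your catch clause FileNotFoundException is only the name of the variable; the type being caught is Exception, so every failure is reported as "File not found". Print the actual exception instead: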

 catch (Exception e) // FileNotFoundException)
 {
   System.out.println("Exception: " + e.getMessage());
   e.printStackTrace();
 }
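
To see why this matters, here is a small sketch (the class and method names are hypothetical) in which no file is involved at all, yet a catch clause written like the one in the question still fires. Calling nextLine() on an empty input throws NoSuchElementException, and the variable that happens to be called FileNotFoundException receives it.

```java
import java.util.Scanner;

class CatchNameDemo {
    // Returns the first line of text, or the simple name of whatever
    // exception was thrown while trying to read it.
    static String firstLineOrError(String text) {
        try {
            Scanner in = new Scanner(text);
            return in.nextLine();
        } catch (Exception FileNotFoundException) {
            // FileNotFoundException is just a variable name here;
            // the declared type is Exception, so anything is caught.
            return FileNotFoundException.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLineOrError("hello")); // hello
        System.out.println(firstLineOrError(""));      // NoSuchElementException
    }
}
```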

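Finally, your program only processes a single line of the file. If you want to process every line, you need to move the output loop after the catch block and declare int[] counts = new int[14]; before the try.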

int[] counts = new int[14];
try {
  // ...
} catch (Exception e) {
   System.out.println("Exception: " + e.getMessage());
   e.printStackTrace();
}
// counts[i] holds the number of i-letter words, so start at 1.
for (int i = 1; i < counts.length; i++) {
  System.out.print("Proportion of " + i + "-letter words: ");
  System.out.println("( " + counts[i] + " words )");
}

Edit

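Scanner in2 = new Scanner(text);
// nextLine() throws NoSuchElementException at end of input rather than
// returning null, so test hasNextLine() first.
while (in2.hasNextLine()) { // <-- read all the lines
  String line = in2.nextLine();
  String[] strings = line.split(" ");
  for (String str : strings) {
    if (str.length() < counts.length) {
      counts[str.length()]++;
    }
  }
}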
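
Putting the pieces together, a minimal self-contained sketch might look like the following. The class name WordLengthCounter and the method countWordLengths are invented for this example, and it reads from an in-memory string so it can be run without a file; passing new Scanner(new File(name)) instead should work the same way. It also assumes that splitting on runs of whitespace is acceptable and that punctuation attached to a word counts toward its length, which the question does not specify.

```java
import java.util.Scanner;

class WordLengthCounter {
    // Counts words by length: index i holds the number of i-letter words.
    // Words with size letters or more are ignored, as in the question's code.
    static int[] countWordLengths(Scanner in, int size) {
        int[] counts = new int[size];
        while (in.hasNextLine()) {            // visit every line, not just the first
            String line = in.nextLine();
            // Split on runs of whitespace; a blank line yields one empty
            // string, which lands in counts[0] and is never printed.
            for (String str : line.trim().split("\\s+")) {
                if (str.length() < counts.length) {
                    counts[str.length()]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "Two households both alike\nin dignity";
        int[] counts = countWordLengths(new Scanner(text), 14);
        for (int i = 1; i < counts.length; i++) {
            System.out.println(i + "-letter words: " + counts[i]);
        }
    }
}
```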

Answer 2

You can use the Apache Commons countMatches method:

 StringUtils.countMatches(String string, String subStringToCount).

For example:

System.out.println(StringUtils.countMatches("String string".toUpperCase(), "S"));

gives the output 2.
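
If Apache Commons is not on the classpath, a rough stand-in can be written with String.indexOf. This is only a sketch of the idea of counting non-overlapping substring occurrences, not the library's implementation, and the helper name countOccurrences is hypothetical. Note that it counts substrings rather than words of a given length, so on its own it does not answer the question.

```java
class CountMatchesSketch {
    // Hypothetical helper: counts non-overlapping occurrences of sub in str.
    static int countOccurrences(String str, String sub) {
        if (str.isEmpty() || sub.isEmpty()) {
            return 0;                          // nothing to search for
        }
        int count = 0;
        int from = 0;
        while ((from = str.indexOf(sub, from)) != -1) {
            count++;
            from += sub.length();              // skip past this match
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("String string".toUpperCase(), "S")); // 2
    }
}
```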

Frequency of X-letter words in a text
