Only the first string is processed

Time: 2011-03-12 10:50:06

Tags: java

// Calculating term frequency
int filename = 11;
String[] fileName = new String[filename];
int a = 0;
int totalCount = 0;
int wordCount = 0;


// Count inverse document frequency

System.out.println("Please enter the required word  :");
Scanner scan2 = new Scanner(System.in);
String word2 = scan2.nextLine();
String[] array2 = word2.split(" ");
int numofDoc;

for (int b = 0; b < array2.length; b++) {

    numofDoc = 0;

    for (int i = 0; i < filename; i++) {

        try {

            BufferedReader in = new BufferedReader(new FileReader(
                           "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc"
                           + i + ".txt"));

            int matchedWord = 0;

            Scanner s2 = new Scanner(in);

            {

                while (s2.hasNext()) {
                    if (s2.next().equals(array2[b]))
                        matchedWord++;
                }

            }
            if (matchedWord > 0)
                numofDoc++;

        } catch (IOException e) {
            System.out.println("File not found.");
        }

    }
    System.out.println(array2[b]
                       + " --> This number of files that contain the term  "
                       + numofDoc);


    //calculate TF-IDF
    for (a = 0; a < filename; a++) {

        try {
            System.out.println("The word inputted : " + word2);
            File file =
                new File("C:\\Users\\user\\fypworkspace\\TextRenderer\\abc"
                         + a + ".txt");
            System.out.println(" _________________");

            System.out.print("| File = abc" + a + ".txt | \t\t \n");

            for (int i = 0; i < array2.length; i++) {

                totalCount = 0;
                wordCount = 0;

                Scanner s = new Scanner(file);
                {
                    while (s.hasNext()) {
                        totalCount++;
                        if (s.next().equals(array2[i]))
                            wordCount++;

                    }

                    System.out.print(array2[i] + " --> Word count =  "
                                     + "\t\t " + "|" + wordCount + "|");
                    System.out.print("  Total count = " + "\t\t " + "|"
                                     + totalCount + "|");
                    System.out.printf("  Term Frequency =  | %8.4f |",
                                      (double) wordCount / totalCount);

                    System.out.println("\t ");

                    double inverseTF = Math.log10((float) numDoc / numofDoc);
                    System.out.println("    --> IDF " +  inverseTF );

                    double TFIDF = (((double) wordCount / totalCount) * inverseTF );
                    System.out.println("    --> TF/IDF " + TFIDF);
                }
            }
        } catch (FileNotFoundException e) { 
            System.out.println("File is not found");
        }
    }
}

When I enter a single string, say 'how', the code counts the files that contain the string 'how'.

Example output:

The number of files containing 'how' is 5.

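The code then goes on to compute the term frequency-inverse document frequency (TF-IDF).

When I enter 3 strings, for example 'how are you', only the string 'how' is processed.

Sample output: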

Please enter the required word  :
you

you --> This number of files that contain the term  6

The word inputted : you

 _________________
| File = abc0.txt |          
you --> Word count =         |3|  Total count =          |150|  Term Frequency =  |   0.0200 |   
    --> IDF 0.2632414441876607
    --> TF/IDF 0.005264828883753215

The word inputted : you

If I enter 3 strings, 'how are you':

Please enter the required word  :
how are you
how --> This number of files that contain the term  6

<--- it only processes the first string 'how'

The word inputted : how are you
 _________________
| File = abc0.txt |          
how --> Word count =         |0|  Total count =          |150|  Term Frequency =  |   0.0000 |   
    --> IDF Infinity
    --> TF/IDF NaN

are --> Word count =         |0|  Total count =          |150|  Term Frequency =  |   0.0000 |   
    --> IDF Infinity
    --> TF/IDF NaN

you --> Word count =         |3|  Total count =          |150|  Term Frequency =  |   0.0200 |   
    --> IDF Infinity
    --> TF/IDF Infinity

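The remaining strings then just use a file count of 0. Each string is supposed to have its own file count.

How can I make the code keep 3 separate file counts?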

1 answer:

Answer 0 (score: 3)

To count the documents for each search term, you can use an int array to hold the counts:

String[] array2 = word2.split(" ");
int[] numofDoc = new int[array2.length];

for (int b = 0; b < array2.length; b++) {

    numofDoc[b] = 0;

Use the array element when counting:

            if (matchedWord > 0) {
                numofDoc[b]++;
            }

Later, use the array element in the calculation:

            double inverseTF = Math.log10((float) numDoc / numofDoc[i]);
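Putting the three fragments together, here is a minimal, self-contained sketch of the per-term counting idea. It is not the asker's full program, and it makes several assumptions: the documents are in-memory strings instead of the abc0.txt ... abc10.txt files, so it runs without any files on disk; the class name `TfIdfSketch` and the helper methods `documentCounts`, `termCount`, `tokens`, `idf`, and `tfIdf` are hypothetical names introduced here; and the total document count is taken as `docs.size()`, since `numDoc` is not defined in the posted code. One deliberate departure from the original: `idf` returns 0 when no document contains the term, which avoids the `Infinity` and `NaN` values shown in the question's output. Whether 0 is the right fallback depends on your use case.

```java
import java.util.Arrays;
import java.util.List;

class TfIdfSketch {

    // For each search term, count how many documents contain it at least once.
    // numofDoc[b] belongs to terms[b], so every term keeps its own count.
    public static int[] documentCounts(String[] terms, List<String> docs) {
        int[] numofDoc = new int[terms.length]; // Java initialises every slot to 0
        for (int b = 0; b < terms.length; b++) {
            for (String doc : docs) {
                if (termCount(terms[b], doc) > 0) {
                    numofDoc[b]++;
                }
            }
        }
        return numofDoc;
    }

    // Split a document into whitespace-separated tokens (empty array for blank text).
    public static String[] tokens(String doc) {
        String trimmed = doc.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Number of tokens in the document that are exactly equal to the term.
    public static int termCount(String term, String doc) {
        int count = 0;
        for (String token : tokens(doc)) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // log10(totalDocs / docsWithTerm).
    // Assumption: return 0 when no document contains the term, instead of Infinity.
    public static double idf(int totalDocs, int docsWithTerm) {
        if (docsWithTerm == 0) {
            return 0.0;
        }
        return Math.log10((double) totalDocs / docsWithTerm);
    }

    // Term frequency (matches / total tokens) multiplied by the term's own IDF.
    public static double tfIdf(String term, String doc, int totalDocs, int docsWithTerm) {
        int totalCount = tokens(doc).length;
        if (totalCount == 0) {
            return 0.0; // empty document: avoid dividing by zero
        }
        double tf = (double) termCount(term, doc) / totalCount;
        return tf * idf(totalDocs, docsWithTerm);
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the contents of the text files.
        List<String> docs = Arrays.asList(
                "how are you today",
                "you are here",
                "where are you");
        String[] terms = "how are you".split(" ");

        // First pass: one document count per term.
        int[] numofDoc = documentCounts(terms, docs);

        // Second pass: per document, use the count that matches each term (index i).
        for (int d = 0; d < docs.size(); d++) {
            System.out.println("| Document " + d + " |");
            for (int i = 0; i < terms.length; i++) {
                System.out.printf("%s --> files containing term = %d, TF-IDF = %.4f%n",
                        terms[i], numofDoc[i],
                        tfIdf(terms[i], docs.get(d), docs.size(), numofDoc[i]));
            }
        }
    }
}
```

The point to notice is that all document counts are computed before the TF-IDF loop starts, and the TF-IDF loop indexes `numofDoc` with the same index it uses for the term. In the posted code the TF-IDF loop sits inside the `b` loop and reads a single `numofDoc` variable, so every term is scored with whichever count was computed last.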