Naive Bayes classifier in Java

Time: 2017-02-26 19:25:08

Tags: java naivebayes

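Below is a method that classifies fortune cookie messages as either predictive or wise.

The database my code builds appears to be correct, so I'm not worried about that part. I built it from a set of 322 fortune cookie training messages.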

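The calculation I'm trying to implement below is:

[(probability that the message is wise) * (probability of finding "word" given that it appears in a wise quote)] / (probability of finding that word)

done in log space.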
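As a minimal sketch of that calculation for a single word: in log space the product becomes a sum and the division becomes a subtraction. The class `LogSpaceBayes`, the method `logPosterior`, and the word counts 12 and 20 are made up for illustration; only the 170 and 322 come from the training set described here.

```java
// Hypothetical helper for illustration; not part of the code in this question.
final class LogSpaceBayes {

    // log P(class | word) = log P(class) + log P(word | class) - log P(word)
    static double logPosterior(double prior, double likelihood, double evidence) {
        return Math.log(prior) + Math.log(likelihood) - Math.log(evidence);
    }

    public static void main(String[] args) {
        double prior = 170.0 / 322.0;     // P(wise): 170 wise messages out of 322
        double likelihood = 12.0 / 170.0; // P(word | wise): assumed count of 12
        double evidence = 20.0 / 322.0;   // P(word): assumed count of 20

        double logPost = logPosterior(prior, likelihood, evidence);
        System.out.println(logPost);           // the log-space score
        System.out.println(Math.exp(logPost)); // back in probability space, close to 0.6
    }
}
```

Because the logarithm is monotonic, two log-space scores can be compared directly with `>` and `<` without converting them back.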

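I'm finding that, for some reason, some of the probabilities come out equal (when they definitely shouldn't; I've done some of the calculations by hand). TestLabels is a file containing the correct answers: 0 for a wise quote and 1 for a predictive one, each on its own line.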

 public void classifyOne (Words[] db) throws IOException {

    int right = 0;
    int wrong = 0;
    int equal = 0;
    Scanner inFile1 = new Scanner(new File("testdata.txt"));
    Scanner inFile2 = new Scanner(new File("testlabels.txt"));
    int[] label = new int[101];
    String[] allmessages = new String[101];

    for (int k = 0; k < allmessages.length; k++) {

        String onemessage = inFile1.nextLine();
        allmessages[k] = onemessage;
        String[] wordsfrommsg = onemessage.split("\\s");

        //read in the class of fortune cookie message, either 1 or 0.
        label[k] = Integer.parseInt(inFile2.nextLine());

        //probability based on training data that a quote is either wise or predictive
        double wiseprob = Math.log(170.0)-Math.log(322.0);
        double predictprob = Math.log(152.0)-Math.log(322.0);

        for (int j = 0; j < db.length; j++) {

            for (int i = 0; i < wordsfrommsg.length; i++) {

                //if you find a word in the onemessage that matches a word from the database...
                if (wordsfrommsg[i].equals(db[j].word)) {

                    //multiply by the probability of finding that word, given that it's a prediction quote
                    predictprob += Math.log ((db[j].predcount) / 152.0);
                    //and divide by the probability that I find that word at all.
                    predictprob -= Math.log(((db[j].wisecount + db[j].predcount) / 704.0));
                    wiseprob += Math.log(((db[j].wisecount) / 170.0));
                    wiseprob -= Math.log(((db[j].wisecount + db[j].predcount) / 704.0));
                }
            }
        }

        //there are messages for which the probabilities are equal.
        if(Math.pow(predictprob, 10)==Math.pow(wiseprob, 10)){equal+=1; System.out.println(allmessages[k]);}

        if (Math.pow(predictprob, 10)>Math.pow(wiseprob, 10)) {
            if (label[k] == 1) {
                right++;
            } else {
                wrong++;
            }
        }

        else if (Math.pow(predictprob, 10)<Math.pow(wiseprob, 10)){
            if (label[k] == 1) {
                wrong++;
            } else {
                right++;
            }
        }
    }

    System.out.println(right);
    System.out.println(wrong);
    System.out.println(equal);
}
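The sketch below isolates two pieces of standard Java floating-point behavior that touch the comparison in the method above. It is not a confirmed diagnosis: whether either one applies depends on the counts stored in `db`, which are not shown. The class `LogScorePitfalls` and the method `poweredScore` are hypothetical names.

```java
// Hypothetical demonstration class; not part of the code in this question.
final class LogScorePitfalls {

    // Mirrors the comparison above: raise a log score to the 10th power.
    static double poweredScore(double logScore) {
        return Math.pow(logScore, 10);
    }

    public static void main(String[] args) {
        // A word count of zero gives log(0), which is negative infinity in Java.
        double zeroCountTerm = Math.log(0 / 152.0);
        System.out.println(zeroCountTerm); // -Infinity

        // Once each score has picked up such a term, both are -Infinity
        // and compare as equal, whatever the other terms were.
        double wise = Math.log(170.0 / 322.0) + zeroCountTerm;
        double predict = Math.log(152.0 / 322.0) + zeroCountTerm;
        System.out.println(wise == predict); // true

        // An even power discards the sign, so for negative log scores the
        // ordering after Math.pow(x, 10) is the reverse of the real ordering.
        System.out.println(poweredScore(-2.0) > poweredScore(-1.0)); // true
        System.out.println(-2.0 > -1.0);                             // false
    }
}
```

If zero counts do occur in the database, additive (Laplace) smoothing of the counts is one common way to keep every term finite. Comparing `predictprob` and `wiseprob` directly would avoid the sign issue.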

0 answers:

No answers