Naive Bayes classification score as a percentage

Date: 2014-12-25 03:57:55

Tags: c# classification bayesian

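I need a solution that classifies text into multiple categories. This approach seems to work well: http://www.codeproject.com/Articles/14270/A-Naive-Bayesian-Classifier-in-C

My only problem is with the scores it returns. Currently, the category with the highest score is the best match.

But I would like to get a percentage value for each category.

This is the part that calculates the scores: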

/// <summary>
/// Classifies a text.
/// </summary>
/// <returns>
/// Classification values for the text; the higher the value, the better the match.
/// </returns>
public Dictionary<string, double> Classify(System.IO.StreamReader tr)
{
    Dictionary<string, double> score = new Dictionary<string, double>();
    foreach (KeyValuePair<string, ICategory> cat in m_Categories)
    {
        score.Add(cat.Value.Name, 0.0);
    }

    EnumerableCategory words_in_file = new EnumerableCategory("", m_ExcludedWords);
    words_in_file.TeachCategory(tr);

    foreach (KeyValuePair<string, PhraseCount> kvp1 in words_in_file)
    {
        PhraseCount pc_in_file = kvp1.Value;
        foreach (KeyValuePair<string, ICategory> kvp in m_Categories)
        {
            ICategory cat = kvp.Value;
            int count = cat.GetPhraseCount(pc_in_file.RawPhrase);
            if (0 < count)
            {
                score[cat.Name] += System.Math.Log((double)count / (double)cat.TotalWords);
            }
            else
            {
                score[cat.Name] += System.Math.Log(0.01 / (double)cat.TotalWords);
            }
            System.Diagnostics.Trace.WriteLine(pc_in_file.RawPhrase.ToString() + "(" +
                cat.Name + ")" + score[cat.Name]);
        }


    }
    foreach (KeyValuePair<string, ICategory> kvp in m_Categories)
    {
        ICategory cat = kvp.Value;
        score[cat.Name] += System.Math.Log((double)cat.TotalWords / (double)this.CountTotalWordsInCategories());
    }
    return score;
}

Thanks for your help!

1 answer:

Answer 0 (score: 1)

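If I understand correctly, you need to sum all the Values in the Dictionary, which gives you the 100%, and then divide each Value by that sum. Note that this only produces meaningful percentages when the values are non-negative weights; the scores above are sums of logarithms, so they would first have to be converted back with Math.Exp. Insert this code before return score;: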

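// Requires "using System.Linq;" for Sum() and ToList().
double sum = score.Values.Sum();
// Copy the keys first: changing a Dictionary while enumerating it
// can throw an InvalidOperationException.
foreach (var name in score.Keys.ToList())
{
    score[name] /= sum;
}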
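
Because Classify accumulates Math.Log values, its scores are negative log-probabilities rather than probabilities, and dividing them by their sum would give the worst category the largest share. One way to turn them into percentages is to exponentiate them and then normalize, subtracting the largest score first so that Math.Exp does not underflow for long texts. The following is only a minimal sketch under that assumption; ScoreNormalizer and ToPercentages are hypothetical names, not part of the CodeProject classifier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the CodeProject classifier.
public static class ScoreNormalizer
{
    // Converts log scores (as returned by Classify) into percentages
    // that add up to 100.
    public static Dictionary<string, double> ToPercentages(
        Dictionary<string, double> logScores)
    {
        if (logScores.Count == 0)
        {
            return new Dictionary<string, double>();
        }

        // Shift by the largest log score so the best category maps to
        // Exp(0) = 1 and the others to values between 0 and 1.
        double max = logScores.Values.Max();
        Dictionary<string, double> weights = logScores.ToDictionary(
            kvp => kvp.Key,
            kvp => Math.Exp(kvp.Value - max));

        // Normalize the weights so that they sum to 100.
        double sum = weights.Values.Sum();
        return weights.ToDictionary(
            kvp => kvp.Key,
            kvp => 100.0 * kvp.Value / sum);
    }
}
```

It could then be called as ScoreNormalizer.ToPercentages(classifier.Classify(reader)), where classifier and reader stand for whatever instances you already have. Because naive Bayes treats words as independent, the resulting values are often very close to 0% or 100%, so they are better read as a relative ranking than as calibrated probabilities.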