Redundant features in a naive Bayes classifier cause overconfidence

时间:2018-12-17 07:16:47

标签: artificial-intelligence classification naivebayes

This is a practice question:

You have classification data with classes Y ∈ {+1, −1} and features Fi ∈ {+1, −1} for
i ∈ {1, . . . , K}. In an attempt to turbocharge your classifier, you duplicate each feature, so now each example
has 2K features, with FK+i = Fi for i ∈ {1, . . . , K}. The following questions compare the original feature set
with the doubled one. You may assume that in the case of ties, class +1 is always chosen. Assume that there
are equal numbers of training examples in each class.

The solution states that this leads to overconfidence. But how?

In naive Bayes, we assume that each feature is conditionally independent of the others given the class label.

Suppose one of the examples has features {1, -1}.

P(y = -1 | x_1 = 1, x_2 = -1) ∝ P(y = -1) × P(x_1 = 1 | y = -1) × P(x_2 = -1 | y = -1)

If we duplicate the features, we would instead write:

P(y = -1 | x_1 = 1, x_2 = 1, x_3 = -1, x_4 = -1) ∝ P(y = -1) × P(x_1 = 1 | y = -1) × P(x_2 = 1 | y = -1) × P(x_3 = -1 | y = -1) × P(x_4 = -1 | y = -1)

Each of these probabilities is less than 1, so in the doubled-feature case, wouldn't multiplying more fractions together yield a smaller probability (and therefore a lower-confidence classification)?
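One way to see what duplication does is to compare the *normalized* posteriors rather than the raw joint scores: both classes' joint probabilities shrink, but the larger one shrinks proportionally less, so after normalization the winning class looks more certain. Below is a minimal numeric sketch; the per-feature likelihoods (0.8, 0.6 for class +1 and 0.5, 0.5 for class -1) are made-up values for illustration, and the priors are equal as the problem assumes.

```python
# Hypothetical per-feature likelihoods for one example with K=2 features.
# These numbers are invented for illustration only.
lik_pos = 0.8 * 0.6   # P(x1=1 | y=+1) * P(x2=-1 | y=+1)
lik_neg = 0.5 * 0.5   # P(x1=1 | y=-1) * P(x2=-1 | y=-1)

def posterior_pos(lik_pos, lik_neg, prior=0.5):
    """Posterior P(y=+1 | x) via Bayes' rule with equal class priors."""
    joint_pos = prior * lik_pos
    joint_neg = prior * lik_neg
    return joint_pos / (joint_pos + joint_neg)

# Original feature set.
orig = posterior_pos(lik_pos, lik_neg)

# Duplicated feature set: every likelihood factor appears twice,
# so each class's likelihood is squared. Both joints get smaller,
# but the ratio between them grows.
doubled = posterior_pos(lik_pos ** 2, lik_neg ** 2)

print(f"original posterior:  {orig:.4f}")
print(f"doubled posterior:   {doubled:.4f}")
```

The raw products do get smaller, exactly as the question observes, but a classifier's confidence is the normalized posterior, and squaring the likelihoods pushes that posterior toward 0 or 1. That is the overconfidence the solution refers to.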
