Redundant features in a naive Bayes classifier cause overconfidence

时间:2018-12-17 07:16:47

标签: artificial-intelligence classification naivebayes

This is a practice question:

You have classification data with classes Y ∈ {+1, −1} and features Fi ∈ {+1, −1} for
i ∈ {1, . . . , K}. In an attempt to turbocharge your classifier, you duplicate each feature, so now each example
has 2K features, with FK+i = Fi for i ∈ {1, . . . , K}. The following questions compare the original feature set
with the doubled one. You may assume that in the case of ties, class +1 is always chosen. Assume that there
are equal numbers of training examples in each class.

The solution states that this leads to overconfidence. But how?

In naive Bayes, we assume that each feature is conditionally independent of the others given the class label.

Suppose one of the examples has features {1, -1}.

P(y = -1 | x_1 = 1, x_2 = -1) ∝ P(y = -1) × P(x_1 = 1 | y = -1) × P(x_2 = -1 | y = -1)

If we duplicate the features, we would instead write:

P(y = -1 | x_1 = 1, x_2 = 1, x_3 = -1, x_4 = -1) ∝ P(y = -1) × P(x_1 = 1 | y = -1) × P(x_2 = 1 | y = -1) × P(x_3 = -1 | y = -1) × P(x_4 = -1 | y = -1)

Each of these probabilities is less than 1, so in the doubled-feature case, wouldn't multiplying more fractions together yield a smaller probability (and therefore a lower-confidence classification)?
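One way to see what duplication does is to compare the *normalized* posteriors rather than the raw joint scores: both classes' joint probabilities shrink, but the larger one shrinks proportionally less, so after normalization the winning class looks more certain. Below is a minimal numeric sketch; the per-feature likelihoods (0.8, 0.6 for class +1 and 0.5, 0.5 for class -1) are made-up values for illustration, and the priors are equal as the problem assumes.

```python
# Hypothetical per-feature likelihoods for one example with K=2 features.
# These numbers are invented for illustration only.
lik_pos = 0.8 * 0.6   # P(x1=1 | y=+1) * P(x2=-1 | y=+1)
lik_neg = 0.5 * 0.5   # P(x1=1 | y=-1) * P(x2=-1 | y=-1)

def posterior_pos(lik_pos, lik_neg, prior=0.5):
    """Posterior P(y=+1 | x) via Bayes' rule with equal class priors."""
    joint_pos = prior * lik_pos
    joint_neg = prior * lik_neg
    return joint_pos / (joint_pos + joint_neg)

# Original feature set.
orig = posterior_pos(lik_pos, lik_neg)

# Duplicated feature set: every likelihood factor appears twice,
# so each class's likelihood is squared. Both joints get smaller,
# but the ratio between them grows.
doubled = posterior_pos(lik_pos ** 2, lik_neg ** 2)

print(f"original posterior:  {orig:.4f}")
print(f"doubled posterior:   {doubled:.4f}")
```

The raw products do get smaller, exactly as the question observes, but a classifier's confidence is the normalized posterior, and squaring the likelihoods pushes that posterior toward 0 or 1. That is the overconfidence the solution refers to.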
