Question

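So I'm trying to code the k-nearest-neighbors algorithm. The input to my function would be a set of data and a sample to classify. I'm just trying to understand how the algorithm works. Can you tell me whether this "pseudocode" of what I'm trying to do is correct?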

kNN (dataset, sample){

   1. Go through each item in my dataset, and calculate the "distance" from that data item to my specific sample.
   2. Out of those samples I pick the "k" ones that are most close to my sample, maybe in a premade array of "k" items?

}

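The part I'm confused about is where I say "go through each item in my dataset". Should I instead go through each CLASS in my dataset and find the k nearest neighbors within it? And then, from there, find which one is closest to my sample, which tells me the class?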

Part 2 of the question (ish): when using this algorithm but without a sample, how do I calculate the "accuracy" of the dataset?

I'm really looking for broad answers in words rather than specifics, but anything that helps me understand is appreciated. I'm implementing this in R.

Thanks

Answer 1

Your pseudocode should be changed this way:

kNN (dataset, sample){
   1. Go through each item in my dataset, and calculate the "distance" 
   from that data item to my specific sample.
   2. Classify the sample as the majority class among the K samples in 
   the dataset having minimum distance to the sample.
}
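
The two steps above can be sketched in base R roughly as follows. This is a minimal illustration, not a definitive implementation: the function name `knn_classify`, its arguments, the use of Euclidean distance, and the toy data are all assumptions made for the example. Note that the neighbors are taken from the whole dataset at once, not class by class.

```r
# Minimal k-nearest-neighbors classifier (illustrative sketch).
# train:  numeric matrix, one row per item in the dataset
# labels: class label of each row of train
# sample: numeric vector with one value per column of train
# k:      number of neighbors to consult
knn_classify <- function(train, labels, sample, k = 3) {
  # Step 1: distance from every item to the sample
  # (Euclidean distance is an assumption; other metrics work too)
  diffs <- sweep(train, 2, sample)   # subtract the sample from each row
  dists <- sqrt(rowSums(diffs^2))

  # Step 2a: indices of the k items with the smallest distance,
  # searched across the whole dataset regardless of class
  nearest <- order(dists)[seq_len(k)]

  # Step 2b: majority vote among the labels of those k items
  # (on a tie, which.max() returns the first label in table order)
  votes <- table(labels[nearest])
  names(votes)[which.max(votes)]
}

# Made-up toy data: three items of class A and three of class B
train <- rbind(c(1, 1), c(1, 2), c(2, 1),
               c(6, 6), c(7, 6), c(6, 7))
labels <- c("A", "A", "A", "B", "B", "B")

knn_classify(train, labels, sample = c(2, 2), k = 3)
```

With this toy data the three items closest to `c(2, 2)` all belong to class A, so the call returns `"A"`.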

This pseudocode is illustrated in the figure below.

[Figure: a two-class dataset with two test samples, as described in the next paragraph]

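Suppose the dataset consists of two classes, A and B, shown in red and blue respectively, and we want to apply KNN with K = 5 to the samples shown as the green and purple stars.
KNN computes the distance from each test sample to all the samples in the dataset, finds the five neighbors with the smallest distance to the test sample, and assigns the majority class among those five to the test sample.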

Accuracy: 1 - (number of misclassified test samples / number of test samples)
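
As a small worked example of this formula in R, assuming you already have the predicted and the true labels for a set of test samples (the two vectors below are made-up values for illustration only):

```r
# True labels of ten test samples and the labels a classifier assigned
# to them (hypothetical values)
actual    <- c("A", "A", "B", "B", "A", "B", "B", "A", "A", "B")
predicted <- c("A", "B", "B", "B", "A", "B", "A", "A", "A", "B")

# Count the test samples whose predicted label differs from the true one
misclassified <- sum(predicted != actual)

# Accuracy as defined above
accuracy <- 1 - misclassified / length(actual)
accuracy

# Equivalent shortcut: the fraction of labels that match
mean(predicted == actual)
```

Here two of the ten samples are misclassified, so the accuracy is 1 - 2/10 = 0.8.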

For an implementation in "R", you may look at this or this.
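
One ready-made option, offered here as an illustration and not necessarily what the links above pointed to, is the `knn()` function from the `class` package. The sketch below also shows one possible way to get an accuracy figure when you have no separate sample to classify: hold out part of the dataset as test samples. The built-in `iris` data, the 30-row hold-out, the seed, and K = 5 are all arbitrary choices for the example.

```r
library(class)  # provides knn(); assumed to be installed (it ships with most R distributions)

set.seed(1)                           # arbitrary seed so the split is repeatable
test_idx <- sample(nrow(iris), 30)    # hold out 30 of the 150 rows as test samples

train_x <- iris[-test_idx, 1:4]       # the four numeric measurement columns
test_x  <- iris[test_idx, 1:4]
train_y <- iris$Species[-test_idx]    # class labels of the training rows
test_y  <- iris$Species[test_idx]     # true labels of the held-out rows

# Classify every held-out row by majority vote among its 5 nearest training rows
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)

# Same quantity as 1 - (misclassified test samples / test samples)
accuracy <- mean(pred == test_y)
accuracy
```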

K-nearest-neighbor pseudocode?
