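R: Using knncat with categorical variables

Time: 2016-02-10 15:51:14

Tags: r classification

I have a dataset with 4 columns: 2 are numeric, 1 is categorical, and 1 is the label. The label has 13 levels (A to M). I am trying to use the knncat package in R for classification, but every time I run the code I get the following error message: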

Error in `[<-.data.frame`(`*tmp*`, factor.vars, value = c("M", "J", "K",  : 
 replacement has 45500 rows, data has 1

Here is the code I used:

data <- read.csv('mosaic_data2.csv', header = T)
num <- dim(data)[1]

library(sampling)
set.seed(1234)
train_index <- sample(seq(1,num,1), floor(num * 0.7), replace = F)
test_index <- setdiff(seq(1,num,1), train_index)

train_data <- data[train_index,]
test_data <- data[test_index,]

library(knncat)
model <- knncat(train_data, classcol = 2)

Could anyone take a look at the code and suggest how to get rid of this error? Thank you very much!

The output of dput(head(data,100)) is as follows:

structure(list(latitude = c(52.7326028, 52.74287543, 52.82107841, 
52.82025363, 52.81980596, 52.81721897, 52.81274172, 52.81274172, 
52.8089586, 52.81424219, 52.8089586, 52.74007929, 52.77394023, 
52.73659034, 52.73672518, 52.73764626, 52.73753744, 52.73659034, 
52.73815233, 52.73679388, 52.73890319, 52.71697237, 52.63730282, 
52.62720385, 52.63730282, 52.63543017, 52.63768035, 52.63510366, 
52.6346578, 52.6346578, 52.6346578, 52.63447454, 52.63576418, 
52.63447454, 52.6346578, 52.63447454, 52.69820719, 52.69603926, 
52.68246919, 52.54600173, 52.54210198, 52.60628983, 52.61003275, 
52.60278236, 52.60239604, 52.60348688, 52.60239604, 52.60382146, 
52.60315644, 52.86047938, 52.86576353, 52.86954228, 52.81039471, 
52.82094872, 52.82395073, 52.82444705, 52.88098384, 52.88469208, 
52.88469208, 52.84979201, 52.84720159, 52.84831759, 52.82435938, 
52.82319493, 52.82168337, 52.8230402, 52.8230402, 52.82513486, 
52.82472379, 52.82756385, 52.82475438, 52.82434902, 52.82166611, 
52.823712, 52.82401481, 52.82483489, 52.82103704, 52.82060763, 
52.8208682, 52.82211317, 52.81868547, 52.8198332, 52.82023595, 
52.81989134, 52.8196971, 52.82051066, 52.82463338, 52.82539131, 
52.82580625, 52.82509199, 52.83759415, 52.83946254, 52.83946254, 
52.83891871, 52.83821538, 52.84757879, 52.84663773, 52.8449371, 
52.84592185, 52.84331619), longitude = c(-6.892397941, -6.915346343, 
-6.922554014, -6.924997835, -6.926099967, -6.883340697, -6.897757597, 
-6.897757597, -6.895500952, -6.883129556, -6.895500952, -6.703781864, 
-6.680851783, -6.771845364, -6.773301282, -6.772958488, -6.77484647, 
-6.771845364, -6.773422218, -6.772164896, -6.770622695, -6.784187251, 
-6.901922588, -6.905109015, -6.901922588, -6.976679508, -6.973114498, 
-6.974753462, -6.947990431, -6.947990431, -6.947990431, -6.976921427, 
-6.958295227, -6.976921427, -6.947990431, -6.976921427, -6.902010609, 
-6.915233457, -6.871160885, -6.832461149, -6.862126342, -6.943925285, 
-6.93813643, -6.925128034, -6.932247524, -6.93461305, -6.932247524, 
-6.934657053, -6.929283954, -6.845259603, -6.861188287, -6.866476268, 
-6.940851164, -6.939203401, -6.930506188, -6.933317462, -6.929441954, 
-6.922589037, -6.922589037, -6.926037258, -6.929423169, -6.917829279, 
-6.938211918, -6.940658091, -6.940651748, -6.940107883, -6.940107883, 
-6.938704642, -6.939084526, -6.933331264, -6.937496468, -6.937678962, 
-6.940276221, -6.94018054, -6.939876475, -6.938983181, -6.934235666, 
-6.93387209, -6.933134226, -6.934193569, -6.934383596, -6.933832641, 
-6.937454656, -6.933818238, -6.93443811, -6.936913947, -6.920030341, 
-6.920400963, -6.92215006, -6.910771124, -6.901500591, -6.899018998, 
-6.899018998, -6.903007684, -6.90119821, -6.91063672, -6.909935672, 
-6.90240965, -6.900066763, -6.901411136), mosaic_group = structure(c(10L, 
10L, 8L, 8L, 8L, 7L, 7L, 7L, 7L, 7L, 7L, 10L, 10L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
12L, 12L, 12L, 12L, 12L, 12L, 10L, 10L, 10L, 13L, 13L, 13L, 13L, 
9L, 6L, 6L, 6L, 6L, 6L, 10L, 8L, 8L, 9L, 9L, 9L, 9L, 7L, 7L, 
7L, 9L, 9L, 9L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 8L, 8L, 8L, 8L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 8L, 
6L, 6L, 6L, 6L, 6L, 8L, 8L, 10L, 10L, 10L), .Label = c("A", "B", 
"C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M"), class = "factor"), 
small_code = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
4L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 8L, 8L, 8L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 
11L, 12L, 12L, 13L, 14L, 14L, 14L, 14L, 14L, 15L, 16L, 16L, 
17L, 17L, 18L, 18L, 19L, 19L, 19L, 20L, 20L, 20L, 21L, 21L, 
21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 
22L, 22L, 22L, 22L, 23L, 23L, 23L, 23L, 23L, 23L, 24L, 24L, 
24L, 25L, 26L, 26L, 26L, 26L, 26L, 27L, 27L, 28L, 28L, 28L
)), .Names = c("latitude", "longitude", "mosaic_group", "small_code"
), row.names = c(NA, 100L), class = "data.frame")    

1 answer:

Answer 0 (score: 1)

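The function knncat::knncat takes an argument classcol, which is documented as:

  The column containing the classification. Default: 1.

Your dataset has this structure: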

  latitude longitude mosaic_group small_code
1 52.73260 -6.892398            J          1
2 52.74288 -6.915346            J          1
3 52.82108 -6.922554            H          2
4 52.82025 -6.924998            H          2
5 52.81981 -6.926100            H          2
6 52.81722 -6.883341            G          3

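So your argument should be classcol = 3 (or 4), I assume. Either way, we can see that it should definitely not be classcol = 2, which points at the numeric longitude column.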
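One way to avoid hard-coding the wrong position is to look the label column up by name. The following is a minimal sketch, not a tested fix: it rebuilds a toy data frame from the six rows printed above, and it assumes that mosaic_group (the factor with levels A to M) is the label. The names col_classes and class_col are introduced here only for illustration.

```r
# Toy data frame with the same column layout as the question's data
# (values copied from the six rows printed above).
data <- data.frame(
  latitude     = c(52.73260, 52.74288, 52.82108, 52.82025, 52.81981, 52.81722),
  longitude    = c(-6.892398, -6.915346, -6.922554, -6.924998, -6.926100, -6.883341),
  mosaic_group = factor(c("J", "J", "H", "H", "H", "G"), levels = LETTERS[1:13]),
  small_code   = c(1L, 1L, 2L, 2L, 2L, 3L)
)

# Inspect the class of every column; the assumed label is the factor column.
col_classes <- sapply(data, class)
print(col_classes)

# Look the label column up by name instead of hard-coding a position.
class_col <- which(names(data) == "mosaic_group")
print(class_col)

# With the real training data, the call from the question would then become
# (not run here: it needs the knncat package and the full data set):
# model <- knncat(train_data, classcol = class_col)
```

If small_code is the intended label instead, the same lookup works with that column name.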
