'train' and 'class' have different lengths

Time: 2017-12-10 07:54:44

Tags: r knn predict

I am new to R and have a very simple data set, but I can't figure out why I get the error "'train' and 'class' have different lengths". Please advise.


library(class)
file_4 <- file_4[,-1]
data_norm <- function(x) { ((x - min(x))/ (max(x)- min(x)))}
file_4_norm <- as.data.frame(lapply(file_4[,-4], data_norm))
summary(file_4[,1:3])
summary(file_4_norm[,1:3])

data_tr <- file_4_norm[1:4,]
data_ts <- file_4_norm[5:6,]

dim(data_tr)
dim(data_ts)
dim( file_4[1:4,4])

data_pred <- knn(data_tr, data_ts,  file_4[1:4,4], k=1)

1 Answer:

Answer 0: (score: 0)

I just built on your code. Hope this helps!

#sample data
df <- data.frame(X1=c(0,0,0,0,-1,1),
                 X2=c(3,0,1,1,0,1),
                 X3=c(0,0,3,2,1,1),
                 Y=c('Red','Red','Red','Green','Green','Red'))

#standardize attributes
color <- df[,ncol(df)]    # save 'Y' in a separate variable as we don't want to standardize it
df_minus_Y <- df[,-ncol(df)] 
maxs <- apply(df_minus_Y, 2, max)   #maximum of each column 
mins <- apply(df_minus_Y, 2, min)   #minimum of each column
standardized.df_minus_Y <- as.data.frame(scale(df_minus_Y, scale = maxs - mins, center = mins))

#split data in train/ test (in reality it should be done at random but here I just tried to imitate your example)
train_idx = 1:4
#train dataset
train_data <- standardized.df_minus_Y[train_idx,]
train_color <- color[train_idx]
#test dataset
test_data <- standardized.df_minus_Y[-train_idx,]
test_color <- color[-train_idx]

#knn model
library(class)
set.seed(123)
predicted_color <- knn(train_data,test_data,train_color,k=1)

#accuracy
mean(test_color == predicted_color)
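
One possible cause of the error in the question, offered as an assumption since the post does not show how `file_4` was loaded: if `file_4` is a tibble (for example, read with readr or readxl), `file_4[1:4,4]` stays a one-column table, so `length()` reports 1 instead of 4 and `knn()` rejects it. The sketch below mimics that situation in base R by using `drop = FALSE` on the sample data from this answer; the names `cl_df`, `cl_vec`, `msg` and `pred` are illustrative only.

```r
library(class)

# sample data from the answer above
df <- data.frame(X1 = c(0, 0, 0, 0, -1, 1),
                 X2 = c(3, 0, 1, 1, 0, 1),
                 X3 = c(0, 0, 3, 2, 1, 1),
                 Y  = c('Red', 'Red', 'Red', 'Green', 'Green', 'Red'))

train <- df[1:4, 1:3]
test  <- df[5:6, 1:3]

# drop = FALSE keeps a one-column data frame, mimicking (by assumption)
# what single-column subsetting of a tibble returns
cl_df  <- df[1:4, "Y", drop = FALSE]
cl_vec <- df[1:4, "Y"]          # plain vector of labels

length(cl_df)    # counts the columns of the data frame
length(cl_vec)   # counts the labels

# knn() compares length(cl) with nrow(train), so the data frame is rejected;
# tryCatch captures the error message instead of stopping the script
msg <- tryCatch({ knn(train, test, cl_df, k = 1); "no error" },
                error = function(e) conditionMessage(e))
msg

# passing the labels as a factor with one entry per training row works
pred <- knn(train, test, factor(cl_vec), k = 1)
length(pred) == nrow(test)
```

In the question's own code, `dim(file_4[1:4,4])` gives a quick check: it returns `NULL` when the subset is a plain vector and a two-element dimension when it is still a table. If `file_4` does turn out to be a tibble, extracting the column with `file_4[[4]][1:4]` should yield a vector of the expected length; this is a suggestion under that assumption, not a confirmed diagnosis.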