Recoding multiple levels into 2 factor labels

Posted: 2014-02-13 16:58:51

Tags: r

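I have a data frame with some columns that:

  • I want to convert to factors,
  • whose levels are coded as -2, -1, 0, 1, 2, 3, 4, and
  • whose levels I want to relabel as 0 or 1 according to this convention: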

    -2 = 1
    -1 = 1
     0 = 0
     1 = 1
     2 = 1
     3 = 1
     4 = 0
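The convention above is effectively a lookup table from seven codes to two labels. As a sketch of that idea, using a plain named-vector lookup (a different technique from the factor() call below; recode_map and x are illustrative names, not from the original post), it could be written as:

```r
# Hypothetical lookup vector: names are the original codes,
# values are the 0/1 labels from the convention above.
recode_map <- c("-2" = "1", "-1" = "1", "0" = "0", "1" = "1",
                "2" = "1", "3" = "1", "4" = "0")

x <- c(-2, 0, 4, 3, -1)

# Look up each code by its character form, drop the names, and build a
# factor with exactly the two levels "0" and "1".
recoded <- factor(unname(recode_map[as.character(x)]), levels = c("0", "1"))
as.character(recoded)  # "1" "0" "0" "1" "1"
```

Any code missing from the table would come back as NA, which makes unexpected values easy to spot.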
    

I have the following code:

#Convert to factor
dat[idx] <- lapply(dat[idx], factor, levels = -2:4, labels = c(1, 1, 0, 1, 1, 1, 0))

#Drop unused factor levels
dat <- droplevels(dat)

This works, but it gives me the following warning:

In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
duplicated levels in factors are deprecated
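For what it's worth, this warning comes from older R releases. As far as I know, R 3.4.0 changed factor() so that duplicated labels (not duplicated levels) are accepted and merged into a single level, so the call above should run without a warning on current R. A quick check, assuming R >= 3.4.0 (f_raw, f and x are illustrative names):

```r
# Assumes R >= 3.4.0, where factor() merges duplicated labels.
x <- c(-2, 0, 4, 1)
f_raw <- factor(x, levels = -2:4, labels = c(1, 1, 0, 1, 1, 1, 0))

levels(f_raw)        # "1" "0": unique labels, in order of first appearance
as.character(f_raw)  # "1" "0" "0" "1"

# Optional: put "0" first if that ordering matters downstream.
f <- factor(f_raw, levels = c("0", "1"))
```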

I tried the following code (based on Ananda Mahto's suggestion), but with no luck:

levels(dat[idx]) <- list(`0` = c(0, 4), `1` = c(-2, -1, 1, 2, 3))

I think there must be a better way to do this. Any suggestions?

Here is my data:

structure(list(Timestamp = structure(c(1380945601, 1380945603, 
1380945605, 1380945607, 1380945609, 1380945611, 1380945613, 1380945615, 
1380945617, 1380945619), class = c("POSIXct", "POSIXt"), tzone = ""), 
FCB2C01 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), RCB2C01 = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), FCB2C02 = c(1, 1, 1, 1, 1, 1, 
1, 1, 1, 1), RCB2C02 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), FCB2C03 = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), RCB2C03 = c(0, 0, 0, 0, 0, 0, 
0, 0, 0, 0), FCB2C04 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), RCB2C04 = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), FCB2C05 = c(1, 1, 1, 1, 1, 1, 
1, 1, 1, 1), RCB2C05 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), FCB2C06 = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1), RCB2C06 = c(0, 0, 0, 0, 0, 0, 
0, 0, 0, 0), FCB2C07 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), RCB2C07 = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), FCB2C08 = c(1, 1, 1, 1, 1, 1, 
1, 1, 1, 1), RCB2C08 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), FCB2C09 = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1), RCB2C09 = c(0, 0, 0, 0, 0, 0, 
0, 0, 0, 0), FCB2C10 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), RCB2C10 = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("Timestamp", "FCB2C01", 
"RCB2C01", "FCB2C02", "RCB2C02", "FCB2C03", "RCB2C03", "FCB2C04", 
"RCB2C04", "FCB2C05", "RCB2C05", "FCB2C06", "RCB2C06", "FCB2C07", 
"RCB2C07", "FCB2C08", "RCB2C08", "FCB2C09", "RCB2C09", "FCB2C10", 
"RCB2C10"), row.names = c(NA, 10L), class = "data.frame")

Column indices:

    idx <- seq(2,21,2)

1 Answer

Answer (score: 4)

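If I understand correctly what you are trying to do, the "right" way is to assign a named list with levels() to specify which old levels map to each new level. Compare the following: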

set.seed(1)
x <- sample(-2:4, 10, replace = TRUE)

YourApproach <- factor(x, levels = -2:4, labels = c(1, 1, 0, 1, 1, 1, 0))
# Warning message:
# In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
#   duplicated levels in factors are deprecated
YourApproach
#  [1] 1 0 1 0 1 0 0 1 1 1
# Levels: 1 1 0 1 1 1 0

xFac <- factor(x, levels = -2:4)
levels(xFac) <- list(`0` = c(0, 4), `1` = c(-2, -1, 1, 2, 3))
xFac
#  [1] 1 0 1 0 1 0 0 1 1 1
# Levels: 0 1

Note the difference in the "Levels" line of each result. This also means that the underlying numeric representations differ:

> as.numeric(YourApproach)
 [1] 2 3 5 7 2 7 7 5 5 1
> as.numeric(xFac)
 [1] 2 1 2 1 2 1 1 2 2 2
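Applied to the question's data, the levels() assignment has to run column by column. As far as I can tell, the attempted levels(dat[idx]) <- list(...) does nothing useful because dat[idx] is a data frame of numeric columns, not a factor. A minimal sketch, using a small stand-in data frame and a hypothetical helper recode01() (neither is from the original post):

```r
# Stand-in for the question's dat (the real one comes from dput()).
dat <- data.frame(A = c(-2, 0, 4, 1), B = c(3, -1, 0, 2))
idx <- 1:2  # in the question: idx <- seq(2, 21, 2)

# Hypothetical helper: recode one column with the levels() list assignment.
recode01 <- function(v) {
  f <- factor(v, levels = -2:4)
  levels(f) <- list(`0` = c(0, 4), `1` = c(-2, -1, 1, 2, 3))
  f
}

# Apply the helper to each selected column, not to dat[idx] as a whole.
dat[idx] <- lapply(dat[idx], recode01)

# The integer codes are 1/2; go through the labels to get numeric 0/1.
as.integer(as.character(dat$A))  # 1 0 0 1
```

With the real data, the same lapply() line with idx <- seq(2, 21, 2) should work unchanged. Because the levels are already exactly "0" and "1", the droplevels() step is no longer needed.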