Modifying levels in a factor variable using ifelse

Time: 2018-03-14 18:37:20

Tags: r tidyverse r-factor

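I wanted to modify the levels of my factor variable by grouping several levels into one, and ran into this odd situation. My new level is created, but all the remaining levels appear to be shifted up by one. Here are my sample data, the code I used, and the output.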

library(tidyverse) 
data <- structure(list(factor1 = structure(c(1L, 1L, 2L, 3L, 1L, 2L, 
        1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
        1L, 1L, 1L, 3L, 1L, 1L, 1L, 4L), .Label = c("0", "1", "2", "3", 
        "4", "5", "6", "7"), class = "factor")), row.names = c(NA, -30L
        ), class = c("tbl_df", "tbl", "data.frame"), .Names = "factor1")
data_out <- data %>% mutate(factor1 = ifelse(factor1 %in% c('0', '1'), 
                                             factor1, '>1'))
# Output of dput(data_out):
structure(list(factor1 = c("1", "1", "2", ">1", "1", "2", "1", 
"1", "2", "2", "2", "2", "2", "1", "2", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", ">1", "1", "1", "1", ">1")), .Names = "factor1", 
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -30L))

Is this the intended behavior? It is certainly not what I want here. How can it be explained, and how do I correct it?

2 Answers:

Answer 0: (score: 3)

I'm guessing the question revolves around how factors are constructed. How the levels {"0", "1"} got converted to {"1", "2", ">1"} inside mutate is not obvious at first.

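R factors are really 1-based integer vectors with an attribute holding their levels. So your "0" level is actually the integer 1 and your "1" level is the integer 2. At first glance it looks as if mutate created a new column with an extra value that prints as ">1", while also turning "0" into "1" and "1" into "2". That would be dangerous behavior on mutate's part. I would expect it either to give you a new factor with levels "0", "1", ">1" or to throw an error.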
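To make the point about integer codes concrete, here is a minimal base R sketch. The vector `f` is a small made-up example rather than the asker's data; it only shows that the label "0" is stored as code 1 and the label "1" as code 2.

```r
# Hypothetical toy factor: labels start at "0", but the codes start at 1
f <- factor(c("0", "0", "1", "2"), levels = c("0", "1", "2", "3"))

lvls   <- levels(f)        # the level labels
codes  <- as.integer(f)    # the underlying 1-based integer codes
labels <- as.character(f)  # the labels each element maps to

print(lvls)
print(codes)
print(labels)
```

Comparing `codes` with `labels` shows the off-by-one shift that appears in the question's output.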

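The error actually comes from ifelse, not from mutate. ifelse drops the factor attributes, so the "yes" branch returns the underlying integer codes, and these are then coerced to character when combined with '>1'. The result is a character column, not a factor. If you coerce data to a plain data frame and run the same ifelse in base R, you see this: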

data <- as.data.frame(data)  # coerce the tibble to a plain data frame
data$factor2 <- ifelse(data$factor1 %in% c('0', '1'),
                       data$factor1, '>1')
data
#-------- same issue except
   factor1 factor2
1        0       1
2        0       1
3        1       2
4        2      >1
.... (remaining 26 rows omitted)
> str(data)
'data.frame':   30 obs. of  2 variables:
 $ factor1: Factor w/ 8 levels "0","1","2","3",..: 1 1 2 3 1 2 1 1 2 2 ...
 $ factor2: chr  "1" "1" "2" ">1" ...
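The behavior shown above can be reproduced on a small vector, along with one possible fix. This is only a sketch in base R: `f` is a made-up toy factor, and converting to character before calling ifelse is one way among several to keep the labels instead of the codes.

```r
# Hypothetical toy factor with the same levels as in the question
f <- factor(c("0", "0", "1", "2", "0", "3"), levels = as.character(0:7))

# Problem: ifelse() drops the factor attributes, so the integer codes leak out
bad <- ifelse(f %in% c("0", "1"), f, ">1")

# Possible fix: work on the labels, then rebuild a factor with explicit levels
good  <- ifelse(f %in% c("0", "1"), as.character(f), ">1")
fixed <- factor(good, levels = c("0", "1", ">1"))

print(bad)
print(fixed)
```

Setting the levels explicitly in the final `factor()` call keeps the order "0", "1", ">1" rather than relying on alphabetical sorting.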

This lets you stay within the dplyr package:

recode_factor(data$factor1, `0` = "0", `1` = "1", .default=">1")
 [1] 0  0  1  >1 0  1  0  0  1  1  1  1  1  0  1  0  0  0  0  0  0  0  0  0  0  >1 0  0  0  >1
Levels: 0 1 >1

Answer 1: (score: 3)

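In case someone runs into a similar problem in the future and is looking for a simple way to group factor levels without the remaining ones being reassigned: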

fct_collapse(data$factor1, '>1' = c('2', '3')) 
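Note that the call above names only levels '2' and '3', so levels '4' through '7' stay separate. If every level other than "0" and "1" should be folded into ">1", a base R sketch along the following lines may help. The vector `f` is a made-up example, and the approach relies on `levels<-` merging levels that are given the same name. forcats also provides `fct_other()` for this kind of grouping.

```r
# Hypothetical toy factor with the same levels as in the question
f <- factor(c("0", "0", "1", "2", "0", "3"), levels = as.character(0:7))

grouped <- f
# Rename every level except "0" and "1" to ">1"; duplicate names are merged
levels(grouped)[!levels(grouped) %in% c("0", "1")] <- ">1"

print(grouped)
print(levels(grouped))
```

Because only the level labels are rewritten, the elements labelled "0" and "1" keep their labels.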