Factor replacement converts columns to character in R

Time: 2018-03-25 15:20:09

Tags: r sapply

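I want to replace the NAs in selected columns with the last of each column's levels, but the columns keep getting converted to character: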

table(sapply(cop2014, class))

factor   numeric
400      116

varToCat = c("V21A","A3","Escolari","A17","B8","C5B","RamaEmpPri","C11","C16B",
         "C16C","D4B","D4C","RamaEmpSec","RamaUltEmpCesant","G12",
         "RamaFuerzaTrab","OcupFuerzaTrab","ActNoMer")

cop2014[,varToCat] = sapply(cop2014[,varToCat], 
        function(col) replace(col, is.na(col), last(levels(col))))

When I look at the classes of the variables, I can see that they have changed.

table(sapply(cop2014, class))

character   factor   numeric
18          382      116

Any hints as to why this happens? I just want to replace the NAs with a valid factor level (in this case, the last of the levels).

1 Answer:

Answer 0: (score: 1)

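sapply simplifies its result to a matrix, and a matrix can hold only one class. So use lapply instead of sapply:

# last() is not base R; dplyr and data.table each provide one
df1[] <- lapply(df1, function(x) replace(x, is.na(x), last(levels(x))))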
str(df1)
#'data.frame':   10 obs. of  2 variables:
#$ v1: Factor w/ 3 levels "B","D","E": 1 1 3 2 2 3 1 3 3 1
#$ v2: Factor w/ 5 levels "A","B","C","D",..: 4 3 5 5 2 5 2 1 4 1
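
To apply the same fix to only some columns, as in the question, the assignment can target just those columns. The following is a minimal sketch, assuming a small made-up data frame `toy` and a column vector `cols` (both hypothetical, standing in for `cop2014` and `varToCat`). It uses base R's `tail(levels(x), 1)` in place of `last()` so that no package is required:

```r
# Hypothetical toy data: two factor columns with NAs and one numeric column
toy <- data.frame(
  a = factor(c("x", NA, "y")),
  b = factor(c("p", "q", NA)),
  n = c(1, 2, 3)
)
cols <- c("a", "b")  # hypothetical stand-in for varToCat

# lapply returns a list, so each column keeps its factor class on assignment
toy[cols] <- lapply(toy[cols], function(x) {
  replace(x, is.na(x), tail(levels(x), 1))
})

sapply(toy, class)
```

Because the replacement value is already one of the column's levels, the assignment does not introduce new NAs or change the set of levels.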

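If we look at the output of sapply, it is a matrix, which can hold only one class. During the conversion to a matrix, the factor attributes are lost and the values are converted to character: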

sapply(df1, function(x) replace(x, is.na(x), last(levels(x))))
#      v1  v2 
# [1,] "B" "D"
# [2,] "B" "C"
# [3,] "E" "E"
# [4,] "D" "E"
# [5,] "D" "B"
# [6,] "E" "E"
# [7,] "B" "B"
# [8,] "E" "A"
# [9,] "E" "D"
#[10,] "B" "A"
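
The difference can also be checked directly by inspecting the two return values. This is a small sketch, assuming a made-up data frame `chk` and helper `fill_last` (both hypothetical names, not part of the original answer):

```r
# Hypothetical toy data with NAs in two factor columns
chk <- data.frame(
  a = factor(c("x", NA, "y")),
  b = factor(c("p", "q", NA))
)

# Fill NAs with the last level, using base R only
fill_last <- function(x) replace(x, is.na(x), tail(levels(x), 1))

as_matrix <- sapply(chk, fill_last)  # simplified to a single matrix
as_list   <- lapply(chk, fill_last)  # one element per column, classes intact

is.matrix(as_matrix)         # TRUE
typeof(as_matrix)            # "character"
sapply(as_list, is.factor)   # TRUE for both columns
```

Assigning `as_matrix` back into the data frame is what yields character columns, whereas assigning `as_list` keeps them as factors.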

Besides lapply, we can also use mutate_at from the tidyverse:

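library(dplyr)
# A formula lambda is used here because funs() is deprecated in newer dplyr versions.
# The result is returned, not modified in place; assign it back to keep it.
cop2014 %>%
  mutate_at(vars(varToCat), ~ replace(., is.na(.), last(levels(.))))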
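
On dplyr 1.0.0 and later, the same idea is usually written with `across()`, which supersedes `mutate_at()`. The following is a sketch under that version assumption, again with a made-up data frame `toy2` and column vector `cols2` (hypothetical names) rather than the asker's data:

```r
library(dplyr)

# Hypothetical toy data: two factor columns with NAs and one numeric column
toy2 <- data.frame(
  a = factor(c("x", NA, "y")),
  b = factor(c("p", "q", NA)),
  n = c(1, 2, 3)
)
cols2 <- c("a", "b")  # hypothetical stand-in for varToCat

# all_of() selects the columns named in the character vector;
# each selected column stays a factor because mutate() works column by column
toy2_filled <- toy2 %>%
  mutate(across(all_of(cols2), ~ replace(.x, is.na(.x), last(levels(.x)))))

sapply(toy2_filled, class)
```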

Data

f1 <- function(n) sample(c(LETTERS[1:5], NA), n, replace = TRUE)
set.seed(24)
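# stringsAsFactors = TRUE is needed on R >= 4.0.0, where the default changed to FALSE
df1 <- data.frame(v1 = f1(10), v2 = f1(10), stringsAsFactors = TRUE)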