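Merging and changing NAs separately in R

Time: 2016-09-01 04:14:47

Tags: r dataframe

My goal is to merge two data sets while keeping track of the NAs. After merging df1 and df2, I want the NAs introduced by the merge to stay NA, while the NAs that were already in df1 are assigned something like 9999. The problem is that my data has many variables of different types (dates, numbers, characters, ...), so when I set df1's NAs with df1[is.na(df1)] <- 9999, it only works for the numeric columns. Is there any way to keep the two kinds of NA apart?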

df1 <- data.frame(ID= c(1:10), 
              Value=c(3,NA,7,2:8),
              Group = c("A",NA,"C","D",NA,"B",NA,"C","D",NA))

df2 <- data.frame(ID = c(5:14),Count =c(1:9,NA),
                  School = c("A",NA,"C","D",NA,"B","NA","C","D",NA))

df1[is.na(df1)] <- 9999

data <- merge(df1,df2,all = TRUE,by= "ID")

   ID Value  Group Count School
1   1     3      A    NA   <NA>
2   2  9999  <NA>*    NA   <NA>
3   3     7      C    NA   <NA>
4   4     2      D    NA   <NA>
5   5     3  <NA>*     1      A
6   6     4      B     2   <NA>
7   7     5  <NA>*     3      C
8   8     6      C     4      D
9   9     7      D     5   <NA>
10 10     8  <NA>*     6      B
11 11    NA   <NA>     7     NA
12 12    NA   <NA>     8      C
13 13    NA   <NA>     9      D
14 14    NA   <NA>    NA   <NA>

* assumed to be 9999

2 answers:

Answer 0: (score: 1)

You can try replacing the NA values in df1$Group before performing the merge:

df1$Group <- as.character(df1$Group)
df1$Group[is.na(df1$Group)] <- 9999

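I have a feeling you already knew this but were thrown off because df1$Group is a factor, which means the code above will not work as intended unless you first cast the column with as.character. You could also do this replacement after the merge.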
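To make the factor pitfall concrete, here is a minimal sketch. It assumes the strings were read in as factors, which was the data.frame default before R 4.0; stringsAsFactors = TRUE is passed explicitly so the behaviour is the same on newer versions. The names broken and fixed are only illustrative.

```r
# Rebuild df1 with factor columns (the default before R 4.0).
df1 <- data.frame(ID = 1:10,
                  Value = c(3, NA, 7, 2:8),
                  Group = c("A", NA, "C", "D", NA, "B", NA, "C", "D", NA),
                  stringsAsFactors = TRUE)

# Assigning a value that is not an existing level does not work on a factor.
# Without suppressWarnings(), R warns "invalid factor level, NA generated".
broken <- df1$Group
suppressWarnings(broken[is.na(broken)] <- "9999")
sum(is.na(broken))   # the four NAs are still there

# Casting to character first lets the assignment go through.
fixed <- as.character(df1$Group)
fixed[is.na(fixed)] <- "9999"
sum(is.na(fixed))    # no NAs left
```

Another option, if the column should stay a factor, is to add "9999" to its levels before assigning it.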

Answer 1: (score: 1)

I'd like to contribute a bit more to this question. If you have, say, 100 columns in various classes and try to replace all NAs, you could try the following. The idea is that you convert all columns to character and replace all NAs with 9999. Then, you want to convert the classes of the columns back to the original classes. Finally, you merge df1 and df2.

library(dplyr)

# Save original classes.
original <- unlist(lapply(df1, class))

# Convert all columns to character and replace NAs with "9999".
# mutate_each() and funs() are deprecated, so mutate_all() with a lambda is used instead.
df1 <- df1 %>%
  mutate_all(as.character) %>%
  mutate_all(~ coalesce(., "9999"))

# http://stackoverflow.com/questions/7680959/convert-type-of-multiple-columns-of-a-dataframe-at-once
# Credit to joran for this function.

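convert.magic <- function(obj, types) {
  # Convert each column back to the class recorded in `types`.
  # Classes not listed below (for example Date) are not handled and raise an error.
  for (i in seq_along(obj)) {
    FUN <- switch(types[[i]],
                  character = as.character,
                  numeric   = as.numeric,
                  factor    = as.factor,
                  integer   = as.integer,
                  logical   = as.logical)
    obj[, i] <- FUN(obj[, i])
  }
  obj
}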

out <- convert.magic(df1, original) %>%
       full_join(df2, by = "ID")

out

#   ID Value Group Count School
#1   1     3     A    NA   <NA>
#2   2  9999  9999    NA   <NA>
#3   3     7     C    NA   <NA>
#4   4     2     D    NA   <NA>
#5   5     3  9999     1      A
#6   6     4     B     2   <NA>
#7   7     5  9999     3      C
#8   8     6     C     4      D
#9   9     7     D     5   <NA>
#10 10     8  9999     6      B
#11 11    NA  <NA>     7     NA
#12 12    NA  <NA>     8      C
#13 13    NA  <NA>     9      D
#14 14    NA  <NA>    NA   <NA>
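
As an alternative to the character round trip above, the replacement can be done column by column in base R, choosing the sentinel form by column type. This is only a sketch: fill_na is a hypothetical helper, not part of any package, and it deliberately leaves classes such as Date or logical untouched, because a sensible sentinel for those depends on the data.

```r
# Hypothetical helper: replace NA in one column with a sentinel suited to its type.
fill_na <- function(x, sentinel = 9999) {
  if (!anyNA(x)) return(x)                     # nothing to do, keep the type as is
  if (is.factor(x)) {
    # The sentinel must be a level before it can be assigned.
    levels(x) <- union(levels(x), as.character(sentinel))
    x[is.na(x)] <- as.character(sentinel)
  } else if (is.character(x)) {
    x[is.na(x)] <- as.character(sentinel)
  } else if (is.numeric(x)) {
    x[is.na(x)] <- sentinel
  }
  # Any other class (Date, logical, ...) is returned unchanged.
  x
}

df1 <- data.frame(ID = 1:10,
                  Value = c(3, NA, 7, 2:8),
                  Group = c("A", NA, "C", "D", NA, "B", NA, "C", "D", NA))
df2 <- data.frame(ID = 5:14, Count = c(1:9, NA),
                  School = c("A", NA, "C", "D", NA, "B", "NA", "C", "D", NA))

# df1[] keeps the data frame structure while every column is replaced.
df1[] <- lapply(df1, fill_na)

# NAs that appear after this point come from the merge, not from df1.
out <- merge(df1, df2, by = "ID", all = TRUE)
```

Because each column keeps its class, no conversion back is needed, and rows that exist only in df2 still show NA for Value and Group.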