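Conditionally replace NA with values from other rows

Time: 2021-02-08 15:44:41

Tags: r tidyverse

I have a large data set in which one variable has a relatively large number of missing values. Because I know that this variable depends on a temporal and a spatial aspect, I can easily impute the missing values by taking the value from another row that has exactly matching temporal and spatial values. Suppose the data are generated as follows: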

temporal <- c("Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday","Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday")
spatial <- c("North", "South","North", "South","North", "South","North", "South","North", "South", "North", "South","North", "South","North", "South","North", "South","North", "South")
value <- c(NA,2,3,4,5,6,7,NA,9,10,1,NA,3,4,5,6,7,8,9,NA)

df <- as.data.frame(cbind(temporal, spatial, value))

This gives the following data frame:

    temporal spatial value
1     Monday   North    NA
2     Monday   South     2
3    Tuesday   North     3
4    Tuesday   South     4
5  Wednesday   North     5
6  Wednesday   South     6
7   Thursday   North     7
8   Thursday   South    NA
9     Friday   North     9
10    Friday   South    10
11    Monday   North     1
12    Monday   South    NA
13   Tuesday   North     3
14   Tuesday   South     4
15 Wednesday   North     5
16 Wednesday   South     6
17  Thursday   North     7
18  Thursday   South     8
19    Friday   North     9
20    Friday   South    NA
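
A side note on the construction, in case it matters downstream: cbind() combines the three vectors into a character matrix, so value is stored as text ("2", "3", ...) rather than as numbers. A minimal sketch of an alternative that keeps the column numeric, using data.frame() directly (the names df_chr and df_num are only for illustration):

```r
temporal <- rep(rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"), each = 2), times = 2)
spatial  <- rep(c("North", "South"), times = 10)
value    <- c(NA, 2, 3, 4, 5, 6, 7, NA, 9, 10, 1, NA, 3, 4, 5, 6, 7, 8, 9, NA)

# cbind() coerces all three vectors to one common type (character here)
df_chr <- as.data.frame(cbind(temporal, spatial, value))
is.numeric(df_chr$value)   # FALSE

# data.frame() keeps each column's own type, so value stays numeric
df_num <- data.frame(temporal, spatial, value)
is.numeric(df_num$value)   # TRUE
```

Missing values remain NA either way, so is.na(value) behaves the same on both versions.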

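In this case, I want to replace each value == NA with the value from another row that has matching values on spatial and temporal.

The final result should therefore look like this: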

    temporal spatial value
1     Monday   North     1
2     Monday   South     2
3    Tuesday   North     3
4    Tuesday   South     4
5  Wednesday   North     5
6  Wednesday   South     6
7   Thursday   North     7
8   Thursday   South     8
9     Friday   North     9
10    Friday   South    10
11    Monday   North     1
12    Monday   South     2
13   Tuesday   North     3
14   Tuesday   South     4
15 Wednesday   North     5
16 Wednesday   South     6
17  Thursday   North     7
18  Thursday   South     8
19    Friday   North     9
20    Friday   South    10

I tried to do this with tidyverse functions, using group_by:

library(tidyverse)
df <- df %>%
  group_by(temporal, spatial) %>%
  mutate(value, unique(value[is.na(value)]))

But I get the following error message:

Error: Problem with `mutate()` input `..2`.
x Input `..2` can't be recycled to size 2.
i Input `..2` is `unique(value[is.na(value)])`.
i Input `..2` must be size 2 or 1, not 0.
i The error occurred in group 1: temporal = "Friday", spatial = "North"
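
The "not 0" in the message points at a zero-length input, and `..2` refers to the second, unnamed argument passed to mutate(). A minimal base R sketch of what happens inside a group that has no missing values (the vector below is a hypothetical stand-in for the Friday / North group, not the real data frame):

```r
# Hypothetical stand-in for one group without missing values (Friday / North)
value <- c("9", "9")

na_part <- value[is.na(value)]  # keeps only the NA entries; none exist here
length(na_part)                 # 0, which cannot be recycled to the group size of 2

# Selecting the non-missing entries instead gives a single value for this group
unique(value[!is.na(value)])    # "9"
```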

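Am I approaching this the right way? If so, why doesn't my code work the way I believe it should? If not, what would be an appropriate approach?

Thanks! :)

2 Answers:

Answer 0: (score: 1)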

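Here is a dplyr approach. We group by temporal and spatial, then arrange by temporal, spatial and value, because arrange places NA values after any non-NA values. We then use mutate to set value to the entry in the first row of each group.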

library(dplyr)

df %>%
  group_by(temporal, spatial) %>% 
  arrange(temporal, spatial, value) %>% 
  mutate(value = value[1])
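
Because arrange() reorders the rows, the result no longer follows the original row order. A variant that leaves the rows in place could pick the first non-missing value of each group directly. This is only a sketch on a small hypothetical data frame (toy), and it assumes each group has at most one distinct non-missing value:

```r
library(dplyr)

# Hypothetical four-row example with two temporal/spatial groups
toy <- data.frame(
  temporal = c("Monday", "Monday", "Monday", "Monday"),
  spatial  = c("North", "South", "North", "South"),
  value    = c(NA, 2, 1, NA),
  stringsAsFactors = FALSE
)

filled <- toy %>%
  group_by(temporal, spatial) %>%
  # value[!is.na(value)] drops the NAs; [1] takes the first remaining entry
  mutate(value = value[!is.na(value)][1]) %>%
  ungroup()

filled$value   # 1 2 1 2
```

If a group contains only NA, the indexing returns NA, so such rows simply stay missing.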

A more concise approach uses tidyr::fill, which also preserves the original row order:

library(tidyverse)

df %>%
  group_by(temporal, spatial) %>% 
  fill(value, .direction = "downup")

# A tibble: 20 x 3
# Groups:   temporal, spatial [10]
   temporal  spatial value
   <chr>     <chr>   <chr>
 1 Monday    North   1    
 2 Monday    South   2    
 3 Tuesday   North   3    
 4 Tuesday   South   4    
 5 Wednesday North   5    
 6 Wednesday South   6    
 7 Thursday  North   7    
 8 Thursday  South   8    
 9 Friday    North   9    
10 Friday    South   10   
11 Monday    North   1    
12 Monday    South   2    
13 Tuesday   North   3    
14 Tuesday   South   4    
15 Wednesday North   5    
16 Wednesday South   6    
17 Thursday  North   7    
18 Thursday  South   8    
19 Friday    North   9    
20 Friday    South   10   
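
The printout shows value as <chr>, and the tibble is still grouped. If numbers are needed afterwards, one possible follow-up is to drop the grouping and convert the column. A sketch on a small hypothetical tibble (toy) with the same character-typed column:

```r
library(dplyr)
library(tidyr)

# Hypothetical four-row example; value is character, as in the output above
toy <- tibble(
  temporal = c("Monday", "Monday", "Monday", "Monday"),
  spatial  = c("North", "South", "North", "South"),
  value    = c(NA, "2", "1", NA)
)

result <- toy %>%
  group_by(temporal, spatial) %>%
  fill(value, .direction = "downup") %>%  # fill down first, then up, within each group
  ungroup() %>%                           # remove the grouping once filling is done
  mutate(value = as.numeric(value))       # "1", "2" -> 1, 2

result$value   # 1 2 1 2
```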

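Answer 1: (score: 1)

Your mutate won't work because you didn't assign the result to a variable. Your mutate() should look like mutate(value = unique(value[is.na(value)])). Note that the expression inside still selects the NA entries, so it would also need !is.na(value) to return the observed values. That is not how I would approach it, though. What I did below is create a lookup table of the distinct non-NA values and then join it onto the original data set. valuedis should be the value you want.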

temporal <- c("Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday","Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday")
spatial <- c("North", "South","North", "South","North", "South","North", "South","North", "South", "North", "South","North", "South","North", "South","North", "South","North", "South")
value <- c(NA,2,3,4,5,6,7,NA,9,10,1,NA,3,4,5,6,7,8,9,NA)

df <- as.data.frame(cbind(temporal, spatial, value))

library(dplyr)


dfdis <- df %>% 
          filter(!is.na(value)) %>% 
          distinct(temporal,spatial,value) %>% 
          rename(valuedis = value)

df2 <- left_join(df,dfdis, by = c("temporal","spatial"))
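
After the join, df2 still holds the original value column next to valuedis. To end up with a single completed column, one option is dplyr::coalesce(), which takes the first non-missing entry across its arguments. A sketch on a small hypothetical data frame (toy, lookup and toy2 are illustrative names); it assumes each temporal/spatial pair has exactly one distinct non-missing value, since otherwise the join would duplicate rows:

```r
library(dplyr)

# Hypothetical four-row example with character values, as produced by cbind()
toy <- data.frame(
  temporal = c("Monday", "Monday", "Monday", "Monday"),
  spatial  = c("North", "South", "North", "South"),
  value    = c(NA, "2", "1", NA),
  stringsAsFactors = FALSE
)

# Lookup table: one row per temporal/spatial pair with its observed value
lookup <- toy %>%
  filter(!is.na(value)) %>%
  distinct(temporal, spatial, value) %>%
  rename(valuedis = value)

toy2 <- toy %>%
  left_join(lookup, by = c("temporal", "spatial")) %>%
  mutate(value = coalesce(value, valuedis)) %>%  # keep value, fall back to valuedis
  select(-valuedis)

toy2$value   # "1" "2" "1" "2"
```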