Conditionally replace values in columns with values from a lookup table, matched by column name

Time: 2018-08-14 20:12:35

Tags: r

I am working with two tables:

t1<-data.frame(Name=c("Waldo","Mark","Harold","Earl"),Number=c(1,4,3,9))

t2<-data.frame(Whatever=c("does","not","really","matter","at","all"),Waldo=c(0,1,1,0,0,1),Mark=c(1,0,1,1,0,0),Harold=c(0,1,0,0,0,0),Earl=c(1,1,1,1,0,0),Extra=c("another","column","appearing","in","this","table"))

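I want to replace the 1s in t2 with the lookup values from t1. The column names of t2 appear as records in t1 (in the Name column). All 0 values in t2 should remain unchanged.

In my real data, t2 has hundreds of columns and t1 has hundreds of rows.

t2 also has a few columns that are not affected by this recoding but should be kept in the final output.

Is there a best practice for coding this?

The desired output for the example is: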

Whatever   Waldo  Mark  Harold  Earl  Extra
does       0      4     0       9     another
not        1      0     3       9     column
really     1      4     0       9     appearing
matter     0      4     0       9     in
at         0      0     0       0     this
all        1      0     0       0     table

Thanks in advance!

2 answers:

Answer 0 (score: 1)

This should be flexible enough for your real dataset:

my_function <- function(df, lookup) {
  # names(df) is already a character vector, so no as.character() is needed
  for (i in names(df)) {
    # Replace every 1 in column i with the Number whose Name equals i
    df[[i]][df[[i]] == 1] <- lookup$Number[lookup$Name == i]
  }
  return(df)
}

my_function(t2, t1)
#   Whatever Waldo Mark Harold Earl     Extra
# 1     does     0    4      0    9   another
# 2      not     1    0      3    9    column
# 3   really     1    4      0    9 appearing
# 4   matter     0    4      0    9        in
# 5       at     0    0      0    0      this
# 6      all     1    0      0    0     table
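
The loop above visits every column of t2, including Whatever and Extra, and it would stop with an error if a column without a lookup entry happened to contain a 1. A slightly more defensive sketch is shown below; the name replace_ones is made up for illustration, and restricting the loop to columns listed in lookup$Name is my assumption about the intended behavior, not something stated in the question.

```r
t1 <- data.frame(Name = c("Waldo", "Mark", "Harold", "Earl"),
                 Number = c(1, 4, 3, 9))
t2 <- data.frame(Whatever = c("does", "not", "really", "matter", "at", "all"),
                 Waldo = c(0, 1, 1, 0, 0, 1),
                 Mark = c(1, 0, 1, 1, 0, 0),
                 Harold = c(0, 1, 0, 0, 0, 0),
                 Earl = c(1, 1, 1, 1, 0, 0),
                 Extra = c("another", "column", "appearing", "in", "this", "table"))

# Hypothetical helper: only touches columns that have an entry in lookup$Name
replace_ones <- function(df, lookup) {
  cols <- intersect(names(df), as.character(lookup$Name))
  for (col in cols) {
    # match() returns the position of the first lookup row with this Name
    number <- lookup$Number[match(col, lookup$Name)]
    df[[col]][df[[col]] == 1] <- number
  }
  df
}

out <- replace_ones(t2, t1)
```

Columns such as Whatever and Extra are never touched, so their type and contents pass through unchanged.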

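Answer 1 (score: 1)

Here is a tidyverse workflow. It may be a bit overkill for this example, but it should scale well to larger datasets. I have broken it into steps so that going from wide to long and back to wide data is no more complicated than it needs to be.

First, I reshape t2 into long format and filter to keep only the observations with a value of 1: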

library(tidyverse)

t2 %>%
  gather(key = Name, value = value, -Whatever, -Extra) %>%
  filter(value == 1)
#>    Whatever     Extra   Name value
#> 1       not    column  Waldo     1
#> 2    really appearing  Waldo     1
#> 3       all     table  Waldo     1
#> 4      does   another   Mark     1
#> 5    really appearing   Mark     1
#> 6    matter        in   Mark     1
#> 7       not    column Harold     1
#> 8      does   another   Earl     1
#> 9       not    column   Earl     1
#> 10   really appearing   Earl     1
#> 11   matter        in   Earl     1

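Then I use left_join with t1, in case any observations in t2 have no match in t1. This gives me the Number column from t1, so I can now drop the value column that came from gathering: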

t2 %>%
  gather(key = Name, value = value, -Whatever, -Extra) %>%
  filter(value == 1) %>%
  left_join(t1, by = "Name") %>%
  select(-value)
#>    Whatever     Extra   Name Number
#> 1       not    column  Waldo      1
#> 2    really appearing  Waldo      1
#> 3       all     table  Waldo      1
#> 4      does   another   Mark      4
#> 5    really appearing   Mark      4
#> 6    matter        in   Mark      4
#> 7       not    column Harold      3
#> 8      does   another   Earl      9
#> 9       not    column   Earl      9
#> 10   really appearing   Earl      9
#> 11   matter        in   Earl      9
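
To see what the left_join guards against, here is a small sketch with a lookup table that lacks an entry for Earl. The table t1_partial is invented for this illustration, and stringsAsFactors = FALSE is set so that the join keys are plain character vectors. With left_join the unmatched rows are kept and get NA in Number; anti_join lists the names that have no lookup entry.

```r
library(dplyr)
library(tidyr)

# Hypothetical lookup table with no row for "Earl"
t1_partial <- data.frame(Name = c("Waldo", "Mark", "Harold"),
                         Number = c(1, 4, 3),
                         stringsAsFactors = FALSE)
t2 <- data.frame(Whatever = c("does", "not", "really", "matter", "at", "all"),
                 Waldo = c(0, 1, 1, 0, 0, 1),
                 Mark = c(1, 0, 1, 1, 0, 0),
                 Harold = c(0, 1, 0, 0, 0, 0),
                 Earl = c(1, 1, 1, 1, 0, 0),
                 Extra = c("another", "column", "appearing", "in", "this", "table"),
                 stringsAsFactors = FALSE)

long <- t2 %>%
  gather(key = Name, value = value, -Whatever, -Extra) %>%
  filter(value == 1)

# Unmatched rows survive the left join, with NA in Number
joined <- left_join(long, t1_partial, by = "Name")

# Names that occur in the long data but not in the lookup table
unmatched <- long %>%
  anti_join(t1_partial, by = "Name") %>%
  distinct(Name)
```

Checking unmatched before spreading makes a missing lookup entry visible instead of leaving it as NA in the joined data.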

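Then I use spread to get back to wide format. Note that spread sorts by the key, so the spread columns end up in alphabetical order. You can change the column order with select if you need to.

The whole process from start to finish: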

t2 %>%
  gather(key = Name, value = value, -Whatever, -Extra) %>%
  filter(value == 1) %>%
  left_join(t1, by = "Name") %>%
  select(-value) %>%
  spread(key = Name, value = Number, fill = 0)
#>   Whatever     Extra Earl Harold Mark Waldo
#> 1      all     table    0      0    0     1
#> 2     does   another    9      0    4     0
#> 3   matter        in    9      0    4     0
#> 4      not    column    9      3    0     1
#> 5   really appearing    9      0    4     1

Created on 2018-08-14 by the reprex package (v0.2.0).
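
One thing to watch in the output above: the row for "at" contains no 1s, so filter(value == 1) removes it entirely and the result has five rows instead of six. A possible variant, sketched below, skips the filter and recodes the values after the join, then restores the original column and row order. It assumes every gathered column has an entry in t1 and that Whatever is unique per row; stringsAsFactors = FALSE is set so the join keys are plain character vectors.

```r
library(dplyr)
library(tidyr)

t1 <- data.frame(Name = c("Waldo", "Mark", "Harold", "Earl"),
                 Number = c(1, 4, 3, 9),
                 stringsAsFactors = FALSE)
t2 <- data.frame(Whatever = c("does", "not", "really", "matter", "at", "all"),
                 Waldo = c(0, 1, 1, 0, 0, 1),
                 Mark = c(1, 0, 1, 1, 0, 0),
                 Harold = c(0, 1, 0, 0, 0, 0),
                 Earl = c(1, 1, 1, 1, 0, 0),
                 Extra = c("another", "column", "appearing", "in", "this", "table"),
                 stringsAsFactors = FALSE)

result <- t2 %>%
  gather(key = Name, value = value, -Whatever, -Extra) %>%
  left_join(t1, by = "Name") %>%
  # Keep every row; swap in the lookup Number only where the value is 1
  mutate(value = ifelse(value == 1, Number, value)) %>%
  select(-Number) %>%
  spread(key = Name, value = value)

# Restore the column and row order of t2 (assumes Whatever is unique)
result <- result[match(t2$Whatever, result$Whatever), names(t2)]
rownames(result) <- NULL
```

Because no rows are filtered out, the fill argument of spread is not needed here: every combination of row and name is present in the long data.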