Check whether the strings in one table are all contained in another table

Time: 2019-01-06 17:15:19

Tags: r

I have two tables, T1 (3 columns) and T2 (2 columns).

T1:

Name  Age  Num
John  20   a, c, b
Lily  19   d, h, e

T2:

Item    Num
pen     a, c, q, b
pencil  d, z, h, e
apple   a, c, y

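The Num columns are strings of comma-separated codes. I want to check whether all of the codes in T1$Num are contained in T2$Num and, if so, add the corresponding T2$Item to T1. The code would be something like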

 T1 <- sqldf("SELECT *, T2.Item FROM T1 LEFT JOIN T2 WHERE T1.Num are all contained in T2.Num")

I should get:

Name  Age  Num         Item
John  20   a, c, b     pen
Lily  19   d, h, e     pencil
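To make the requirement concrete, here is a minimal base-R sketch of the containment test for a single pair of strings. `all_contained` is a hypothetical helper, not from the question, and it assumes the codes are separated by commas with optional surrounding spaces.

```r
# Hypothetical helper: TRUE when every code in `needles` also occurs in
# `haystack`, regardless of the order in which the codes appear.
all_contained <- function(needles, haystack) {
  # "a, c, b" -> c("a", "c", "b"): split on commas, then strip the spaces
  x <- trimws(strsplit(needles, ",")[[1]])
  y <- trimws(strsplit(haystack, ",")[[1]])
  all(x %in% y)
}

all_contained("a, c, b", "a, c, q, b")  # TRUE: every code is in pen's list
all_contained("a, c, b", "a, c, y")     # FALSE: "b" is missing from apple's list
```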

Thanks for your help!

1 answer:

Answer 0 (score: 0)

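1) Using the input shown reproducibly in the Note at the end, and assuming that the codes in each T1$Num appear in the matching T2$Num in the same relative order (which is the case for the data in the question; we relax this assumption later), we can use replace to turn T1.Num into a like pattern and then left join T1 to T2 on T2.Num matching that pattern. For example, "a, c, b" becomes the pattern '%a%c%b%', which matches "a, c, q, b".

library(sqldf)

# build the LIKE pattern '%a%c%b%' from "a, c, b" and match it against T2.Num
sqldf("select T1.*, T2.Item, T2.Num Num2
       from T1
       left join T2 on T2.Num like '%' || replace(T1.Num, ', ', '%') || '%'")

giving:

  Name Age     Num   Item       Num2
1 John  20 a, c, b    pen a, c, q, b
2 Lily  19 d, h, e pencil d, z, h, e
3 Jake  10    a, d   <NA>       <NA>

If the codes in T1$Num and T2$Num are not ordered in the same way, sort them first, like this:

library(dplyr)
library(tidyr)

# one row per code, sort the codes within each Name, then paste them back together
# (Age is kept in the grouping so it survives the summarize)
T1x <- T1 %>%
  separate_rows(Num) %>%
  arrange(Name, Num) %>%
  group_by(Name, Age) %>%
  summarize(Num = toString(Num)) %>%
  ungroup

# same for T2, grouped by Item
T2x <- T2 %>%
  separate_rows(Num) %>%
  arrange(Item, Num) %>%
  group_by(Item) %>%
  summarize(Num = toString(Num)) %>%
  ungroup

sqldf("select T1x.*, T2x.Item, T2x.Num Num2 from T1x
       left join T2x on T2x.Num like '%' || replace(T1x.Num, ', ', '%') || '%'")

2) This alternative uses dplyr and tidyr without sqldf. It splits T1 into one row per code, joins each code to the T2 items that contain it, and keeps the (Name, Item) pairs for which every code of that Name was matched. Unlike 1), rows with no complete match (Jake in the data below) are dropped rather than kept with NA.

library(dplyr)
library(tidyr)

T1Long <- T1 %>%
  separate_rows(Num)

T1Long %>%
  left_join(T1Long %>% count(Name), by = "Name") %>%   # n = number of codes per Name
  left_join(T2 %>% separate_rows(Num), by = "Num") %>%  # T2 items containing each code
  group_by(Name, Age, Item, n) %>%
  summarize(Num = toString(Num), Count = n()) %>%       # Count = codes matched per item
  ungroup %>%
  filter(Count == n) %>%                                # keep items matching all codes
  select(-Count, -n)

Note

The input in reproducible form is:

T1 <- data.frame(Name = c("John", "Lily", "Jake"), Age = c(20, 19, 10),
  Num = c("a, c, b", "d, h, e", "a, d"), stringsAsFactors = FALSE)
T2 <- data.frame(Item = c("pen", "pencil", "apple"),
  Num = c("a, c, q, b", "d, z, h, e", "a, c, y"), stringsAsFactors = FALSE)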
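For comparison, here is a base-R sketch of the same set-containment test. It is not part of the original answer, and it needs neither sqldf nor a shared code order. It assumes the same data as above, including the Jake row shown in the output of 1). `split_codes` is a hypothetical helper name. A T1 row that matches several items gets them joined with `toString`, and a row with no match gets NA. That is a design choice, since the question does not say what should happen in either case.

```r
# Data from the question, plus the Jake row from the answer's output
T1 <- data.frame(Name = c("John", "Lily", "Jake"),
                 Age  = c(20, 19, 10),
                 Num  = c("a, c, b", "d, h, e", "a, d"),
                 stringsAsFactors = FALSE)
T2 <- data.frame(Item = c("pen", "pencil", "apple"),
                 Num  = c("a, c, q, b", "d, z, h, e", "a, c, y"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: split each "a, c, b" string into a vector of codes
split_codes <- function(s) lapply(strsplit(s, ","), trimws)

t1_codes <- split_codes(T1$Num)
t2_codes <- split_codes(T2$Num)

# For each T1 row, collect the T2 items whose codes include all of the row's codes
T1$Item <- vapply(t1_codes, function(x) {
  hits <- T2$Item[vapply(t2_codes, function(y) all(x %in% y), logical(1))]
  if (length(hits) == 0) NA_character_ else toString(hits)
}, character(1))

T1
```

With this data, John gets pen, Lily gets pencil and Jake gets NA, which matches the result of 1).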