Delete rows whose strings contain matching numbers

Posted: 2017-12-24 19:31:23

Tags: r dplyr gsub

I have a data frame with 3 columns:

df

A             B               C
round1    test1        testing1
round1    test1        testing2
round1    test1        testing3
round1    test1        testing4
round1    test1        testing5
round2    test2        testing1
round2    test2        testing2
round2    test2        testing3
round2    test2        testing4
round2    test2        testing5
.
.
.
.
.
round100  test30       testing30
round100  test30       testing31

How can I delete the rows where the number in the column B string matches the number in the column C string?

2 answers:

Answer 0 (score: 2)

Just extract the numeric part of each string and compare them.

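# Keep only the first run of digits that follows the leading non-digits
NumB = sub("\\D+(\\d+).*", "\\1", DAT$B)
NumC = sub("\\D+(\\d+).*", "\\1", DAT$C)
# Keep the rows where the two numbers differ
DAT = DAT[NumB != NumC, ]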
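The same idea can be wrapped in a reusable function. This is a minimal sketch, not part of the original answer: the name `drop_matching_rows` is hypothetical, and converting the extracted digits with `as.integer()` is an added assumption so that, for example, `test01` and `testing1` would count as the same number. It also assumes every B and C value contains at least one digit; a value without digits would turn into `NA` and produce an all-`NA` row in the result.

```r
# Hypothetical helper (not from the original answer): drop rows whose
# B and C columns contain the same number.
# Assumes every value in B and C contains at least one digit.
drop_matching_rows <- function(df) {
  # Capture the first run of digits after the leading non-digits, as above
  num_b <- as.integer(sub("\\D+(\\d+).*", "\\1", df$B))
  num_c <- as.integer(sub("\\D+(\\d+).*", "\\1", df$C))
  # Compare as integers so that e.g. "01" and "1" are treated as equal
  df[num_b != num_c, , drop = FALSE]
}

# A few rows taken from the question's data
dat <- data.frame(
  A = c("round1", "round1", "round2", "round2"),
  B = c("test1", "test1", "test2", "test2"),
  C = c("testing1", "testing2", "testing1", "testing2"),
  stringsAsFactors = FALSE
)

result <- drop_matching_rows(dat)
print(result)
```

Only the `test1`/`testing2` and `test2`/`testing1` rows survive, with their original row names (2 and 3) kept.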

DATA

DAT = read.table(text="A       B     C
round1    test1        testing1
round1    test1        testing2
round1    test1        testing3
round1    test1        testing4
round1    test1        testing5
round2    test2        testing1
round2    test2        testing2
round2    test2        testing3
round2    test2        testing4
round2    test2        testing5",
header=TRUE, stringsAsFactors = FALSE)

Answer 1 (score: 2)

Replace the non-digits ("\\D") with the empty string and compare what remains:

subset(DF, gsub("\\D", "", B) != gsub("\\D", "", C))

Using the input DF, which is given in reproducible form in the Note below, this gives:
          A      B         C
2    round1  test1  testing2
3    round1  test1  testing3
4    round1  test1  testing4
5    round1  test1  testing5
6    round2  test2  testing1
8    round2  test2  testing3
9    round2  test2  testing4
10   round2  test2  testing5
12 round100 test30 testing31

Note

The input in reproducible form is:

Lines <- "
A             B               C
round1    test1        testing1
round1    test1        testing2
round1    test1        testing3
round1    test1        testing4
round1    test1        testing5
round2    test2        testing1
round2    test2        testing2
round2    test2        testing3
round2    test2        testing4
round2    test2        testing5
round100  test30       testing30
round100  test30       testing31"
DF <- read.table(text = Lines, header = TRUE)
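Since the question is tagged dplyr, the same gsub comparison can also be written with `dplyr::filter()`. This is a sketch that assumes the dplyr package is installed; it uses a few rows taken from the question's data. Unlike `subset()`, `filter()` does not keep the original row names.

```r
library(dplyr)

# A few rows taken from the question's data
df <- data.frame(
  A = c("round1", "round1", "round100", "round100"),
  B = c("test1", "test1", "test30", "test30"),
  C = c("testing1", "testing2", "testing30", "testing31"),
  stringsAsFactors = FALSE
)

# Strip every non-digit from B and C, then keep the rows where the digits differ
result <- df %>%
  filter(gsub("\\D", "", B) != gsub("\\D", "", C))

print(result)
```

Note that `gsub("\\D", "", x)` keeps all digits in a string, so a value such as `a1b2` would reduce to `12`. For the column formats shown in the question this makes no difference.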