How to match similar values in two columns of an R data frame

Time: 2018-07-17 09:14:51

Tags: r dataframe dplyr

I have the data frame shown below:

ID       Value1          Value2
AAA-01   Ert we          ert-We
AAA-02   ATT ER          ATT ER
AAA-03   Accept          accepted
AAA-04   Apple           Apple
AAA-05   VEETR           veetr
AAA-06   EERTT           RRFTF
AAA-07   ETYuU           RTTRR

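Using the data frame above, I want to match the text that looks similar in Value1 and Value2 and give each row a TRUE/FALSE value.

Desired output: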

ID       Value1          Value2     Status
AAA-01   Ert we          ert-We     TRUE
AAA-02   ATT ER          ATT ER     TRUE
AAA-03   Accept          accepted   TRUE
AAA-04   Apple           Apple      TRUE
AAA-05   VEETR           veetr      TRUE
AAA-06   EERTT           RRFTF      FALSE
AAA-07   ETYuU           RTTRR      FALSE

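2 answers:

Answer 0: (score: 1)

Here is one possible approach. I am not sure whether it meets your criteria for "similar-looking text" beyond this example, but it may get you started. Note that the spaces inside the sample values are replaced with underscores here so that read.table can parse the columns.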

df = read.table(text="ID       Value1          Value2
AAA-01   Ert_we          ert-We
AAA-02   ATT_ER          ATT_ER
AAA-03   Accept          accepted
AAA-04   Apple           Apple
AAA-05   VEETR           veetr
AAA-06   EERTT           RRFTF
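AAA-07   ETYuU           RTTRR", header = TRUE)

# Strip everything except letters and spaces, then convert to lower case
Value1_txt = tolower(gsub('[^[:alpha:] ]', '', df$Value1))
Value2_txt = tolower(gsub('[^[:alpha:] ]', '', df$Value2))
# TRUE when either cleaned string contains the other (literal match, not a regex)
df$similar = mapply(function(x, y) grepl(x, y, fixed = TRUE) | grepl(y, x, fixed = TRUE),
                    Value1_txt, Value2_txt, USE.NAMES = FALSE)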

Output:

      ID Value1   Value2 similar
1 AAA-01 Ert_we   ert-We    TRUE
2 AAA-02 ATT_ER   ATT_ER    TRUE
3 AAA-03 Accept accepted    TRUE
4 AAA-04  Apple    Apple    TRUE
5 AAA-05  VEETR    veetr    TRUE
6 AAA-06  EERTT    RRFTF   FALSE
7 AAA-07  ETYuU    RTTRR   FALSE
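
The containment check above only flags a pair when one cleaned string sits wholly inside the other. If the real data contains typos or small spelling differences, an edit-distance comparison may be more forgiving. The following is a minimal sketch of that alternative, not part of the original answer: it uses `adist()` from base R's utils package, and the helper `normalise` and the cutoff `max_rel_dist` are hypothetical names and values chosen for illustration.

```r
# Sample data from the question, built directly so the spaces are preserved
df <- data.frame(
  ID = sprintf("AAA-%02d", 1:7),
  Value1 = c("Ert we", "ATT ER", "Accept", "Apple", "VEETR", "EERTT", "ETYuU"),
  Value2 = c("ert-We", "ATT ER", "accepted", "Apple", "veetr", "RRFTF", "RTTRR"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case and drop everything that is not a letter
normalise <- function(x) tolower(gsub("[^[:alpha:]]", "", x))

a <- normalise(df$Value1)
b <- normalise(df$Value2)

# Row-wise Levenshtein distance between the cleaned strings
d <- mapply(function(x, y) adist(x, y)[1, 1], a, b, USE.NAMES = FALSE)

# Scale by the longer string so the cutoff does not depend on length
rel <- d / pmax(nchar(a), nchar(b))

# Assumed cutoff: at most 30% of the characters may differ; tune for your data
max_rel_dist <- 0.3
df$Status <- rel <= max_rel_dist
df
```

On the sample data this marks rows 1 to 5 as TRUE and rows 6 and 7 as FALSE. The cutoff is the main design choice: a lower value behaves more like exact matching, a higher one accepts looser matches.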

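Answer 1: (score: 0)

In this example, "similar-looking text" is assumed to mean that the first three characters are the same after conversion to lower case.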

df$Status <- with(
  df, 
  tolower(substr(Value1, 1, 3)) == tolower(substr(Value2, 1, 3))
)
df
      ID Value1   Value2 Status
1 AAA-01 Ert we   ert-We   TRUE
2 AAA-02 ATT ER   ATT ER   TRUE
3 AAA-03 Accept accepted   TRUE
4 AAA-04  Apple    Apple   TRUE
5 AAA-05  VEETR    veetr   TRUE
6 AAA-06  EERTT    RRFTF  FALSE
7 AAA-07  ETYuU    RTTRR  FALSE
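
The result of this rule depends on how many leading characters are compared. As a rough sketch (the helper name `same_prefix` and its parameter `n` are hypothetical, not from the answer), the comparison can be wrapped in a function so the prefix length is easy to vary:

```r
# Hypothetical helper: compare the first n characters, ignoring case
same_prefix <- function(x, y, n = 3) {
  tolower(substr(x, 1, n)) == tolower(substr(y, 1, n))
}

# Values from the question
v1 <- c("Ert we", "ATT ER", "Accept", "Apple", "VEETR", "EERTT", "ETYuU")
v2 <- c("ert-We", "ATT ER", "accepted", "Apple", "veetr", "RRFTF", "RTTRR")

status_3 <- same_prefix(v1, v2)         # first three characters, as in the answer
status_4 <- same_prefix(v1, v2, n = 4)  # stricter: "ert " and "ert-" now differ
```

With `n = 3` the helper gives the same Status column as above. With `n = 4` the first row becomes FALSE, because the fourth character is a space in one value and a hyphen in the other, so the prefix length should be chosen with the real data in mind.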