How do I build a regular expression from some search strings to subset a data frame?

Time: 2016-05-02 14:04:01

Tags: regex r grep

I am trying to search for strings in order to subset a data frame. My df looks like this:

dput(df)
structure(list(Cause = structure(c(2L, 1L), .Label = c("jasper not able to read the property table after the release", 
"More than 7000  messages loaded which stuck up"), class = "factor"), 
    Resolution = structure(1:2, .Label = c("jobs and reports are processed", 
    "Updated the property table which resolved the issue."), class = "factor")), .Names = c("Cause", 
"Resolution"), class = "data.frame", row.names = c(NA, -2L))

I am trying this:

df1<-subset(df, grepl("*MQ*|*queue*|*Queue*", df$Cause))

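The goal is to search the Cause column for MQ, queue, or Queue and subset the data frame df to the matching records. It does not seem to work: it also captures other records in which none of the strings MQ, queue, or Queue appears.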

Is this how you would do it, or is there another approach I could follow?

2 answers:

Answer 0: (score: 6)

The regular expression below seems to work. I added a row to the data.frame to make the example more interesting.

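I think the problem comes from the * in your regular expression. I also added parentheses to delimit the groups separated by |, though I don't think they are mandatory.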

df <- data.frame(Cause=c("jasper not able to read the property table after the release", 
                         "More than 7000  messages loaded which stuck up",
                         "blabla Queue blabla"),
                 Resolution = c("jobs and reports are processed", 
                                "Updated the property table which resolved the issue.",
                                "hop"))

> head(df)
                                                         Cause                                           Resolution
1 jasper not able to read the property table after the release                       jobs and reports are processed
2               More than 7000  messages loaded which stuck up Updated the property table which resolved the issue.
3                                          blabla Queue blabla                                                  hop

> subset(df, grepl("(MQ)|(queue)|(Queue)", df$Cause))
                Cause Resolution
3 blabla Queue blabla        hop
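A minimal sketch of why the * in the original pattern can produce false matches: in a regular expression (unlike a shell glob), * quantifies the preceding item, so MQ* means "M followed by zero or more Q". The variable names below are illustrative only, and the snippet checks just that one piece of the original pattern against the second Cause string from the question.

```r
# Second Cause string from the question; it contains neither "MQ" nor "queue".
cause <- "More than 7000  messages loaded which stuck up"

# "MQ*" means "M, then zero or more Q", so the "M" in "More" is enough to match.
star_match <- grepl("MQ*", cause)

# Without the quantifier, the literal text "MQ" is required.
plain_match <- grepl("MQ", cause)

print(c(star = star_match, plain = plain_match))
```

If glob-style wildcards are really what is wanted, base R's glob2rx() converts a wildcard pattern into a regular expression, but for plain substring matching the literals alone are sufficient.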

Is this what you want?

Answer 1: (score: 1)

Moved from the comments:

subset(df, grepl("MQ|Queue|queue", Cause))

Or, if matching in any letter case is acceptable:

subset(df, grepl("mq|queue", Cause, ignore.case = TRUE))
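As a self-contained sketch, the case-insensitive variant can be tried on the three-row example from the first answer. The names hits and df_hits are illustrative, stringsAsFactors = FALSE is an assumption made here to keep Cause as plain character data, and bracket indexing is shown only as an alternative to subset().

```r
# Same three-row example as in the first answer.
df <- data.frame(
  Cause = c("jasper not able to read the property table after the release",
            "More than 7000  messages loaded which stuck up",
            "blabla Queue blabla"),
  Resolution = c("jobs and reports are processed",
                 "Updated the property table which resolved the issue.",
                 "hop"),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE where Cause contains "mq" or "queue" in any letter case.
hits <- grepl("mq|queue", df$Cause, ignore.case = TRUE)

# Keep only the matching rows; drop = FALSE keeps the result a data frame.
df_hits <- df[hits, , drop = FALSE]
print(df_hits)
```

On this data only the third row ("blabla Queue blabla") is kept.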

For more information, try ?regex and ?grepl in R.