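R: extract the substring from the end of a pattern up to the first occurrence of a character

Time: 2014-03-15 19:56:12

Tags: regex r gsub

I have been struggling to get this match-and-replace to work with gsub in R, still without success. I am trying to match the pattern "Reason:" in a string and grab everything after it up to the first occurrence of a dot (.). For example: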

Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE.

should return "Not interested"

4 answers:

Answer 0: (score: 5)

Here is a solution:

s <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."

sub(".*Reason: (.*?)\\..*", "\\1", s)
# [1] "Not interested"

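Update (in response to a comment):

If you also have strings that do not match the pattern, I recommend using regexpr instead of sub, because sub returns a non-matching string unchanged: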

s2 <- c("no match example",
        "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE.")

match <- regexpr("(?<=Reason: ).*?(?=\\.)", s2, perl = TRUE)
# regmatches() returns only the matched elements, so fill them in by position
res <- rep(NA_character_, length(s2))
res[match != -1] <- regmatches(s2, match)
res
# [1] NA               "Not interested"
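
For comparison, here is a minimal sketch of what happens when sub is used alone on input that contains a non-matching string, and one way to mask the non-matches with grepl. The pattern variant with `[^.]*` and the variable names `pattern`, `unmasked` and `masked` are assumptions made for illustration, not part of the original answer.

```r
s2 <- c("no match example",
        "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE.")

# Capture everything after "Reason: " up to (not including) the next dot
pattern <- ".*Reason: ([^.]*)\\..*"

# sub() leaves a string untouched when the pattern does not match
unmasked <- sub(pattern, "\\1", s2)

# Keep the substitution only where the pattern actually matched
masked <- ifelse(grepl(pattern, s2), unmasked, NA_character_)
```

Here `unmasked` still contains "no match example" as its first element, while `masked` has NA in that position.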

For the second example, you can use either of the following regular expressions:

s3 <- "Delete Payment Arrangement of type Proof of Payment for BAN : 907295267 on date 02/01/2014, from reason PAERR."

# a)
sub(".*type (.*?) for.*", "\\1", s3)
# [1] "Proof of Payment"

# b)
match <- regexpr("(?<=type ).*?(?= for)", s3, perl = TRUE)
ifelse(match == -1, NA, regmatches(s3, match))
# [1] "Proof of Payment"
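
Both examples follow the same shape (text between a left and a right delimiter), so they could be wrapped in a small helper. The following is only a sketch: `extract_between` is a hypothetical name, and it assumes the delimiters are literal text that does not itself contain `\E`, since they are quoted with PCRE's `\Q...\E`.

```r
# Hypothetical helper: shortest text between two literal delimiters,
# NA where either delimiter is missing.
extract_between <- function(x, left, right) {
  # \Q...\E makes PCRE treat the delimiters literally (e.g. "." is a plain dot)
  pattern <- paste0("(?<=\\Q", left, "\\E).*?(?=\\Q", right, "\\E)")
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- as.integer(m) != -1L
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)  # regmatches() returns matched elements only
  out
}

s3 <- "Delete Payment Arrangement of type Proof of Payment for BAN : 907295267 on date 02/01/2014, from reason PAERR."
res_type   <- extract_between(s3, "type ", " for")
res_reason <- extract_between(c("no match example", "Reason: Not interested. ChannelID: CARE."),
                              "Reason: ", ".")
```

Filling a pre-allocated NA vector by position keeps the result aligned with the input even when matching and non-matching strings are interleaved.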

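Answer 1: (score: 2)

There are many different ways to do this (as you can see from the other answers). I personally like using the stringr functions.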

library(stringr)

rec <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."
# str_match() returns a matrix: column 1 is the full match, column 2 the first capture group
str_match(rec, "Reason: ([a-zA-Z0-9 ]+)\\.")[, 2]
## [1] "Not interested"
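
The character class above only accepts letters, digits and spaces, so a reason containing punctuation would not be matched. A possible variant, sketched here with made-up input (`recs` and its second and third elements are assumptions for illustration), uses `[^.]+` instead and relies on str_match returning NA for non-matching strings:

```r
library(stringr)

# Illustrative input: only the first string comes from the question's data
recs <- c("Reason: Not interested. ChannelID: CARE.",
          "Reason: Can't afford it. ChannelID: CARE.",
          "no match example")

# Column 2 holds the first capture group; non-matching rows are NA
reasons <- str_match(recs, "Reason: ([^.]+)\\.")[, 2]
reasons
```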

Answer 2: (score: 0)

This will work:

x <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."

library(qdap)
genXtract(x, "Reason:", ".")

##     Reason:  :  . 
## " Not interested" 

Answer 3: (score: 0)

Using regexpr and regmatches:

str <- "Offer Disposition. MSISDN: 7183067962. Offer: . Disposition: DECLINED. Reason: Not interested. ChannelID: CARE."
# Lookbehind for "Reason: ", then take every character that is not a dot
m <- regexpr("(?<=Reason: )[^.]+", str, perl = TRUE)
regmatches(str, m)