How to extract strings based on specific conditions

Time: 2018-05-27 19:37:51

Tags: r dplyr

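I would like to know how to replace strings according to different conditions, and then use dplyr to group the resulting strings in my dataset.
For example,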

Description of how I want to extract from the given dataset

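The reason I treat FRAUD and NARC differently is that I think there is a difference between NARC-SELL and NARC-POSSESS (which kind of drug is involved does not matter). Thanks for your help!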

2 answers:

Answer 0: (score: 3)

You need a regular expression along the lines of NARC-[A-Z]+|FRAUD: NARC followed by a dash and a run of uppercase letters, or FRAUD.

library(dplyr)
d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN"))
d %>%
  mutate(y = gsub("^(NARC-[A-Z]+|FRAUD).*", "\\1", x))
#                          x                 y
# 1        FRAUD-CREDIT CARD             FRAUD
# 2        HOMICIDE-JUST-GUN HOMICIDE-JUST-GUN
# 3 NARC-POSSESS-PILL/TABLET      NARC-POSSESS
# 4         NARC-SELL-HEROIN         NARC-SELL
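
To see why a row such as HOMICIDE-JUST-GUN passes through unchanged, here is a small base R sketch of the same substitution outside of dplyr. The names codes, matched and shortened are only illustrative, and sub() is used instead of gsub() on the assumption that an anchored pattern can match at most once per string.

```r
# Same example values as above, as a plain character vector
codes <- c("FRAUD-CREDIT CARD",
           "HOMICIDE-JUST-GUN",
           "NARC-POSSESS-PILL/TABLET",
           "NARC-SELL-HEROIN")

# ^ anchors at the start; the group captures the part to keep;
# the trailing .* swallows the rest so it is dropped by the replacement
pattern <- "^(NARC-[A-Z]+|FRAUD).*"

# Which strings does the pattern match at all?
matched <- grepl(pattern, codes)

# Strings that do not match are returned untouched by sub()
shortened <- sub(pattern, "\\1", codes)
```

Here matched is FALSE only for the HOMICIDE entry, which is why that value comes back unmodified.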

Answer 1: (score: 0)

You can also use str_extract() from stringr.

# using Weihuang Wong's nice example data

library(dplyr)
library(stringr)

d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN"))

pattern <- "^(NARC-\\w+|FRAUD|HOMICIDE-\\w+-\\w+)"

d %>% mutate(y = str_extract(x, pattern))

#                          x                 y
# 1        FRAUD-CREDIT CARD             FRAUD
# 2        HOMICIDE-JUST-GUN HOMICIDE-JUST-GUN
# 3 NARC-POSSESS-PILL/TABLET      NARC-POSSESS
# 4         NARC-SELL-HEROIN         NARC-SELL
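
Note that str_extract() returns NA when the pattern does not match, which is why HOMICIDE had to be spelled out in the pattern above. A possible variation, sketched here as an assumption rather than as part of the original answer, is to list only the categories that should be shortened and fall back to the original value with dplyr's coalesce(). The name result is only illustrative, and stringsAsFactors = FALSE is set so that x is a character column on older R versions too.

```r
library(dplyr)
library(stringr)

d <- data.frame(x = c("FRAUD-CREDIT CARD",
                      "HOMICIDE-JUST-GUN",
                      "NARC-POSSESS-PILL/TABLET",
                      "NARC-SELL-HEROIN"),
                stringsAsFactors = FALSE)

# Only the categories that should be shortened are listed here
pattern <- "^(NARC-\\w+|FRAUD)"

# str_extract() gives NA for non-matching rows; coalesce() then
# falls back to the original string in x
result <- d %>%
  mutate(y = coalesce(str_extract(x, pattern), x))
```

This way new categories that should stay as they are do not need to be added to the pattern.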