Extract a specific word, then the word after it, in R

Posted: 2019-07-16 06:30:15

Tags: r regex dataframe

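I have descriptions of vacant positions. I want to extract the grade of each position and put it in an adjacent column. This can be done by extracting the word that follows "Grade:" in the description text.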

Sample data:

structure(list(
  description = structure(2:1, .Label = c("Grade: L3 Position title bla bla bla",
                                          "Head of xxxxxxxx Grade: L5 Last Date to Apply: 22nd July 2019"),
                          class = "factor"),
  division = structure(2:1, .Label = c("ABC", "XYZ"), class = "factor")),
  class = "data.frame", row.names = c(NA, -2L))

Desired result:

Description     Division     Grade
sdsdsdsd         XYZ          L5
asdasdsadas      ABC          L3

I found the solution below, which extracts the word, but I can't get the result into a column.

Extract text that follows a specific word/s in R

2 Answers:

Answer 0 (score: 3)

You can use sub to extract the word that comes after "Grade", allowing zero or more spaces before and after the ":".

sub(".*Grade\\s*:\\s*(\\w+).*", "\\1", df$description)
#[1] "L5" "L3"
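Since the question asks for the grade in its own column, here is a minimal sketch that assigns the sub result to df$Grade. One caveat: sub returns the input string unchanged when the pattern does not match, so a row without "Grade:" would get its whole description copied into the column. The sketch guards against that with grepl. It builds the data with data.frame() as character columns, and the third row (division "DEF") is hypothetical, added only to show the no-match case.

```r
# Sample data from the question, plus one HYPOTHETICAL row ("DEF")
# that has no "Grade:" in its description.
df <- data.frame(
  description = c("Head of xxxxxxxx Grade: L5 Last Date to Apply: 22nd July 2019",
                  "Grade: L3 Position title bla bla bla",
                  "Intern position, no grade listed"),
  division = c("XYZ", "ABC", "DEF"),
  stringsAsFactors = FALSE
)

# Anything, "Grade", optional spaces, ":", optional spaces,
# then capture one word (letters, digits, underscore), then anything.
pattern <- ".*Grade\\s*:\\s*(\\w+).*"

# sub() leaves non-matching strings untouched, so only keep its
# result where grepl() confirms the pattern matched; otherwise NA.
has_grade <- grepl(pattern, df$description)
df$Grade <- ifelse(has_grade, sub(pattern, "\\1", df$description), NA_character_)

df
```

Without the grepl check, the "DEF" row's Grade would be the full text "Intern position, no grade listed".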

Answer 1 (score: 2)

You can use the stringr package like this:

library(stringr)
df[,"Grade"] <- sub("Grade: ", "", str_extract(df$description, "Grade: [^ ]+"))

Data:

df <- structure(list(
  description = structure(2:1, .Label = c("Grade: L3 Position title bla bla bla",
                                          "Head of xxxxxxxx Grade: L5 Last Date to Apply: 22nd July 2019"),
                          class = "factor"),
  division = structure(2:1, .Label = c("ABC", "XYZ"), class = "factor")),
  class = "data.frame", row.names = c(NA, -2L))
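A variation on the stringr approach: str_match can do the extraction in one call, because the capture group ends up in the second column of the matrix it returns, so the outer sub is not needed. Rows with no match get NA rather than a leftover string. This is a sketch that assumes stringr is installed and that description is a character column (the data is rebuilt here with data.frame() instead of the factor-based dput output).

```r
library(stringr)

# Sample data from the question, as character columns.
df <- data.frame(
  description = c("Head of xxxxxxxx Grade: L5 Last Date to Apply: 22nd July 2019",
                  "Grade: L3 Position title bla bla bla"),
  division = c("XYZ", "ABC"),
  stringsAsFactors = FALSE
)

# str_match() returns a character matrix: column 1 is the full match,
# column 2 is the first capture group (here, the non-space run after "Grade:").
m <- str_match(df$description, "Grade:\\s*(\\S+)")
df$Grade <- m[, 2]

df
```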

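Edit: I just noticed that there are better answers in the comments. It is better to use one of those, since they don't depend on an extra package.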