Add a column listing the keywords (strings) found in a text column

Time: 2018-01-29 14:18:46

Tags: r string dataframe

If I have a data frame with the following column:

df$text <- c("This string is not that long", "This string is a bit longer but still not that long", "This one just helps with the example")

and a vector of strings like this:

keywords <- c("not that long", "This string", "example", "helps")

I am trying to add a column to my data frame that contains the list of keywords present in each row's text:

df$keywords:

1 c("This string","not that long")    
2 c("This string","not that long")    
3 c("helps","example")

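However, I am not sure how to 1) extract the matching words from the text column, and 2) list the matched words in each row of the new column.

2 answers:

Answer 0: (score: 3)

Maybe something like this: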

df = data.frame(text=c("This string is not that long", "This string is a bit longer but still not that long", "This one just helps with the example"))
keywords <- c("not that long", "This string", "example", "helps")

df$keywords = lapply(df$text, function(x) {keywords[sapply(keywords,grepl,x)]})

Output:

                                                 text                   keywords
1                        This string is not that long not that long, This string
2 This string is a bit longer but still not that long not that long, This string
3                This one just helps with the example             example, helps

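The outer lapply loops over df$text, and the inner sapply checks, for each element of keywords, whether it occurs in the current element of df$text. The resulting logical vector is then used to subset keywords. A slightly longer but perhaps easier-to-read equivalent is: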

df$keywords = lapply(df$text, function(x) {keywords[sapply(keywords, function(y){grepl(y,x)})]})
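As a further illustration of the same idea, here is a minimal sketch that matches the keywords as literal substrings instead of regular expressions. It assumes the keywords are meant to be plain text; `find_keywords` is a hypothetical helper name, not part of the original answer, and `stringsAsFactors = FALSE` is added only so the sketch behaves the same on older R versions.

```r
# Sketch only: same lapply-over-rows idea, with literal (non-regex) matching.
df <- data.frame(
  text = c("This string is not that long",
           "This string is a bit longer but still not that long",
           "This one just helps with the example"),
  stringsAsFactors = FALSE
)
keywords <- c("not that long", "This string", "example", "helps")

# Hypothetical helper: returns the keywords that occur in a single string x.
find_keywords <- function(x, keywords) {
  # fixed = TRUE treats each keyword as a plain substring, not a regex
  hits <- vapply(keywords, function(k) grepl(k, x, fixed = TRUE), logical(1))
  keywords[hits]
}

# One character vector of matched keywords per row, stored as a list column
df$keywords <- lapply(df$text, find_keywords, keywords = keywords)
```

With this approach the matched keywords come back in the order they appear in `keywords`, and a row with no matches gets an empty character vector.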

Hope this helps!

Answer 1: (score: 2)

We can use str_extract_all from stringr to extract the keywords:
library(stringr)
df$keywords <- str_extract_all(df$text, paste(keywords, collapse = "|"))
df
#                                                text                   keywords
#1                        This string is not that long This string, not that long
#2 This string is a bit longer but still not that long This string, not that long
#3                This one just helps with the example             helps, example

Or in a pipe chain:

library(dplyr)
df %>%
   mutate(keywords = str_extract_all(text, paste(keywords, collapse = "|")))
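One behaviour of this approach that may be worth knowing: `str_extract_all` returns every match in the order it occurs in the text, so a keyword that appears twice is listed twice, and a row with no match yields an empty character vector. The sketch below illustrates this on two made-up strings (they are assumptions for illustration, not part of the question's data) and removes the repeats with `unique`.

```r
library(stringr)

keywords <- c("not that long", "This string", "example", "helps")
pattern <- paste(keywords, collapse = "|")

# Hypothetical extra rows: one repeats a keyword, one contains no keyword
text <- c("helps and helps again, for example", "nothing to see here")

# One character vector per input string, matches in order of appearance
matches <- str_extract_all(text, pattern)

# Drop repeated keywords within each row
unique_matches <- lapply(matches, unique)
```

Note that joining the keywords with "|" treats them as a regular expression, so keywords containing characters such as "." or "(" would need escaping first.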