Frequency count of multiple words in a string

Time: 2017-10-11 16:59:07

Tags: r grep text-mining

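I have a text file and want frequency counts for two sets of words. For example, given the inputs below, I need a count of how many times the words from each set occur in the text: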

setone <- c("mumbai", "delhi", "chennai")

settwo <- c("nike", "zara", "puma")

textfile <- "brands in cites like nike zara and puma in mumbai, delhi and chennai. while many exotic brands in mumbai... disel, durby, Calvin Kline"

Please help.

1 answer:

Answer 0 (score: 1)

Here is one approach:

library(tidyverse)  # attaches stringr, tibble and dplyr

setone <- c("mumbai", "delhi", "chennai")

settwo <- c("nike", "zara", "puma")

textfile <- (
  "brands in cites like nike zara and puma in mumbai, delhi and chennai. 
  while many exotic brands in mumbai... disel, durby, Calvin Kline")

# Collapse each set into one alternation pattern, e.g. "mumbai|delhi|chennai",
# and count every match of that pattern in the text.
out <- tibble(
  textfile = textfile,
  setone = str_count(textfile, str_c(setone, collapse = '|')),
  settwo = str_count(textfile, str_c(settwo, collapse = '|'))
)
# Add the combined count across both sets.
out <- mutate(out, total = setone + settwo)
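
Note that an alternation pattern like the one above matches substrings, so "puma" would also be counted inside a longer word such as "pumas", and the match is case-sensitive. If you need whole-word counts broken down per word, here is a rough base R sketch. It is a different technique from the stringr call above, `count_words` is a hypothetical helper name, and it assumes the set words contain no regex metacharacters:

```r
# Hypothetical helper: counts whole-word, case-insensitive matches per word.
# Assumes each word is plain text with no regex metacharacters.
count_words <- function(text, words) {
  vapply(words, function(w) {
    pattern <- paste0("\\b", w, "\\b")   # \b = word boundary
    m <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)[[1]]
    sum(m > 0)                           # gregexpr returns -1 when nothing matches
  }, integer(1))
}

setone <- c("mumbai", "delhi", "chennai")
settwo <- c("nike", "zara", "puma")
textfile <- "brands in cites like nike zara and puma in mumbai, delhi and chennai. while many exotic brands in mumbai... disel, durby, Calvin Kline"

per_word_one <- count_words(textfile, setone)  # mumbai 2, delhi 1, chennai 1
per_word_two <- count_words(textfile, settwo)  # nike 1, zara 1, puma 1

# Totals per set and overall
totals <- c(setone = sum(per_word_one), settwo = sum(per_word_two))
totals["total"] <- sum(totals)
```

For this sample text the per-set totals agree with the `str_count` result; they would only differ when a set word appears inside a longer word or with different capitalisation.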