Question

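I am currently enrolled in an R course, and one of the practice exercises is to build an R program that counts the words in a string. We are not allowed to use the function table; we have to return the most common word in the string using conventional means. For example, given "The fox jumped over the cone...", the program must return "the", because it is the most frequent word.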

So far I have the following:

string_read<- function(phrase) {

  phrase <- strsplit(phrase, " ")[[1]]
  for (i in 1:length(phrase)){
    phrase.freq <- ....
#if Word already exists then increase counter by 1

      }

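I have hit a roadblock: I don't know how to increment the counter for a specific word. Can anyone point me in the right direction? My pseudocode is something like: "For each word in the loop, increase wordIndex by 1. If the word has already occurred, increase the wordIndex counter."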

Answer 1

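You started correctly by splitting the string into words. From there, we use sapply to loop over each word and sum the number of matching words in the vector. I have used tolower on the assumption that the comparison should not be case-sensitive.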

string_read<- function(phrase) {
   temp = tolower(unlist(strsplit(phrase, " ")))
   which.max(sapply(temp, function(x) sum(x == temp)))
}

phrase <- "The fox jumped over the cone and the"

string_read(phrase)
#the 
#  1

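This returns the word together with its index position in the vector, which is 1 in this case. If you only want the word with the maximum count, you can change the last line to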

temp[which.max(sapply(temp, function(x) sum(x == temp)))]
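
The explicit counter loop described in the question can also be written directly. The following is only a minimal sketch, not the course's expected solution: the function name most_common_word is hypothetical, and it assumes words are separated by single spaces and compared case-insensitively. It keeps the counts in a named vector, where each name is a word and each value is that word's counter.

```r
# Hypothetical helper: finds the most common word with an explicit loop,
# without calling table(). Assumes words are separated by single spaces.
most_common_word <- function(phrase) {
  words <- tolower(strsplit(phrase, " ")[[1]])
  counts <- integer(0)  # named vector: names are words, values are counts
  for (w in words) {
    if (w %in% names(counts)) {
      counts[w] <- counts[w] + 1L  # word seen before: increment its counter
    } else {
      counts[w] <- 1L              # first occurrence: start its counter at 1
    }
  }
  # which.max returns the first maximum, so ties go to the earliest word
  names(counts)[which.max(counts)]
}

most_common_word("The fox jumped over the cone and the")
# [1] "the"
```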

Answer 2

We can do this with str_extract_all from the stringr package:

library(stringr)
string_read<- function(str1) {
  temp <- tolower(unlist(str_extract_all(str1, "\\w+")))
  which.max(sapply(temp, function(x) sum(x == temp)))
}

phrase <- "The fox jumped over the cone and the"
string_read(phrase)
#the 
#  1 
phrase2 <- "The fox jumped over the cone and the fox, fox, fox, fox, fox"
string_read(phrase2)
#fox 
# 2
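
Note that which.max reports only the first maximum, and that sapply over every word recounts repeated words. As a rough sketch of one possible variation (the name top_words is hypothetical, and base R's regmatches is swapped in for stringr so that nothing needs to be installed), the words can be counted once per unique word and every word tied for the highest count returned:

```r
# Hypothetical helper: returns all words tied for the highest count.
# Uses base R only; [[:alnum:]_]+ approximates the \\w+ pattern used above.
top_words <- function(phrase) {
  words <- tolower(regmatches(phrase, gregexpr("[[:alnum:]_]+", phrase))[[1]])
  if (length(words) == 0) return(character(0))  # no words found
  uniq <- unique(words)
  # Count each distinct word once; vapply keeps the words as names
  counts <- vapply(uniq, function(w) sum(words == w), integer(1))
  names(counts)[counts == max(counts)]
}

top_words("The fox jumped over the cone and the fox, fox, fox, fox, fox")
# [1] "fox"
top_words("a b a b")
# [1] "a" "b"
```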
