I have many keywords that need to be compared against a much larger set of documents, counting the occurrences.
Since the computation takes hours, I decided to try parallel processing. On this forum I found the mclapply function from the parallel package, which seems helpful.
As an R novice I cannot get the code to work (see the short version below). More specifically, I get this error:
"Error in get(as.character(FUN), mode = "function", envir = envir) : object 'FUN' of mode 'function' was not found"
rm(list = ls())
library(stringr)  # str_count() is used below, so load stringr first

df <- c("honda civic 1988 with new lights", "toyota auris 4x4 140000 km", "nissan skyline 2.0 159000 km")
keywords <- c("honda", "civic", "toyota", "auris", "nissan", "skyline", "1988", "1400", "159")
countstrings <- function(x){str_count(x, paste(sprintf("\\b%s\\b", keywords), collapse = '|'))}
# Normal way with one processor
number_of_keywords <- countstrings(df)
# Result: [1] 3 2 2
# Attempt at parallel processing
library(stringr)
library(parallel)
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)
number_of_keywords <- mclapply(cl, countstrings(df))
stopCluster(cl)
#Error in get(as.character(FUN), mode = "function", envir = envir) :
#object 'FUN' of mode 'function' was not found
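The error arises because mclapply's first argument must be the data and its second a function; a cluster object is never passed to mclapply (it forks the current R session itself, which is why mc.cores > 1 is unsupported on Windows). A minimal sketch of two working variants, assuming the df, keywords, and countstrings objects defined above:

```r
library(stringr)
library(parallel)

# Variant 1: mclapply forks worker processes itself; no cluster object.
# (Use mc.cores = 1 on Windows, where forking is unavailable.)
result_list <- mclapply(df, countstrings, mc.cores = detectCores() - 1)

# Variant 2: a PSOCK cluster works on all platforms, but its workers
# start empty, so export the globals and packages the function needs.
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, "keywords")
clusterEvalQ(cl, library(stringr))
result_vec <- parSapply(cl, df, countstrings, USE.NAMES = FALSE)
stopCluster(cl)
```

Note that the function is passed as an object (countstrings), not called (countstrings(df)); passing the call's result is what produced the "object 'FUN' of mode 'function' was not found" error.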
Any help is appreciated!
Answer (score: 1)
Here is another approach to parallel processing, using parSapply (it returns a vector instead of a list). This function should also be faster:
# function to count whole-word keyword matches
count_strings <- function(x, words)
{
  sum(unlist(strsplit(x, ' ')) %in% words)
}

library(parallel)

mcluster <- makeCluster(detectCores())  # using all cores
number_of_keywords <- parSapply(mcluster, df, count_strings, keywords, USE.NAMES = FALSE)
stopCluster(mcluster)

number_of_keywords
# [1] 3 2 2
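For reference, the two counting strategies agree on this data: count_strings does exact token matching after splitting on spaces, while the question's regex uses \b word boundaries, so a keyword like "159" matches neither way inside "159000". A quick sanity check, assuming the df, keywords, countstrings, and count_strings objects above:

```r
library(stringr)

# Both approaches should yield the same counts for this input.
identical(
  countstrings(df),
  sapply(df, count_strings, keywords, USE.NAMES = FALSE)
)
```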