Question

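I recently developed a fuzzy string matching routine in R on a Windows machine, and I am very happy with its speed. Now I have tried to run the same program on a virtual Red Hat server, and it is much slower, by a factor of roughly 100. On the Windows machine (6 cores, Intel, 3.4 GHz) the whole process takes 1 hour.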

What I basically do is:

location <- (if (RB$ORT[x] == "n/a"){rep(NA, length(TAC$ORT))} else {stringdist(RB$ORT[x], TAC$ORT, useBytes = TRUE)})
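
As a hedged sketch only, the one-liner above can be wrapped in a function and run against small made-up stand-ins for `RB` and `TAC`. The toy data and the helper name `location_dist` are hypothetical, not part of the original routine; the call uses stringdist's default method (`"osa"`) with `useBytes = TRUE`, as in the question.

```r
library(stringdist)

# Hypothetical toy stand-ins for the question's RB and TAC data frames
RB  <- data.frame(ORT = c("Berlin", "n/a", "Hamburg"), stringsAsFactors = FALSE)
TAC <- data.frame(ORT = c("Berlin", "Bremen", "Hamburg", "Hamm"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: distances from RB$ORT[x] to every TAC$ORT,
# or a vector of NAs when the RB value is the placeholder "n/a"
location_dist <- function(x) {
  if (RB$ORT[x] == "n/a") {
    rep(NA_real_, length(TAC$ORT))
  } else {
    stringdist(RB$ORT[x], TAC$ORT, useBytes = TRUE)
  }
}

location <- location_dist(1)
print(location)
```

Using `NA_real_` instead of a plain `NA` keeps the result numeric in both branches, which avoids type changes if the vectors are later combined.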

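On the Red Hat machine (14 cores, AMD, 2.6 GHz) I run R with OpenBLAS enabled. The R package stringdist is at version 0.9.4.1 on both machines. The command above is run several million times. Strangely, it even seems to slow down over time. When the process starts, my log says: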

get location right: 0.04 secs
engine used: tclget location right: 0.05 secs
engine used: tclget location right: 0.02 secs
engine used: tclget location right: 0.01 secs
engine used: tclget location right: 0.02 secs
engine used: tclget location right: 0.03 secs
engine used: tclget location right: 0.02 secs

A few hours later it says:

get location right: 0.27 secs
get location right: 0.27 secs
get location right: 0.26 secs
engine used: tclget location right: 0.14 secs
get location right: 0.27 secs
engine used: tclget location right: 0.26 secs
engine used: tclget location right: 0.23 secs
engine used: tclget location right: 0.14 secs
get location right: 0.28 secs
get location right: 0.29 secs
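
To check whether the per-call time really drifts upward, independently of the rest of the routine, one option is to time fixed-size batches of identical calls on both machines and compare. This is a sketch under assumptions: the reference vector of random strings, its size, and the batch counts are made up and do not reproduce the real data.

```r
library(stringdist)

set.seed(1)
# Made-up reference vector: 10,000 random 8-letter strings
ref <- vapply(seq_len(10000),
              function(i) paste(sample(letters, 8, replace = TRUE), collapse = ""),
              character(1))

# Elapsed seconds for each of 5 batches of 100 identical stringdist() calls;
# if the function itself does not degrade, these should stay roughly constant
batch_times <- vapply(seq_len(5), function(i) {
  system.time(
    for (j in seq_len(100)) stringdist("abcdefgh", ref, useBytes = TRUE)
  )[["elapsed"]]
}, numeric(1))

print(batch_times)
```

If the batch times stay flat in isolation while the real job still slows down, that would point towards the surrounding code rather than stringdist itself, though this needs to be confirmed on the actual data.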

On the Windows machine it looks like this (6 processes are writing to the log):

get location right: 0 secs
get location right: 0 secs
engine used: tclget location right: 0 secs
get location right: 0 secs
engine used: tclget location right: 0 secs
engine used: tclengine used: tclget location right: 0 secs
get location right: 0 secs
get location right: 0 secs

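On the Windows machine we do not use Revolution R (or its Microsoft R Open variant). I don't know whether it uses MKL, but that should not matter anyway when working with character data in R. Could some kind of encoding issue be the cause? When profiling with Rprof, the absolute times reported on Windows and Linux differ. As for the relative times, only enc2utf8 seems to be more prominent on Linux.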
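
A minimal base-R sketch for following up on the encoding suspicion: `Encoding()` shows how each string is marked, `Sys.getlocale()` shows the native locale (which typically differs between Windows and Linux setups), and `enc2utf8()` can be applied once to a whole lookup vector before the loop. Whether this removes the `enc2utf8` time seen in Rprof is an assumption that would have to be measured; the vector `places` is made up.

```r
# Made-up example: one pure-ASCII string and one with a non-ASCII character
places <- c("Berlin", "M\u00fcnchen")

# Native character locale; worth comparing between the two machines
print(Sys.getlocale("LC_CTYPE"))

# Pure-ASCII strings are always reported as "unknown";
# strings built from \u escapes are marked "UTF-8"
print(Encoding(places))

# Convert once, up front, rather than inside the hot loop
places_utf8 <- enc2utf8(places)
print(table(Encoding(places_utf8)))
```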

Any other ideas? Thanks, Martin

stringdist performance on Windows vs. Linux (Red Hat)
