Question

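Earlier today I asked whether there is an idiomatic way in Mathematica to count the elements of a list that match a predicate function, because I am concerned about performance.

My initial approach for a given predicate pred was as follows: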

PredCount1[lst_, pred_] := Length@Select[lst, pred];

It was suggested that I use this instead:

PredCount2[lst_, pred_] := Count[lst, x_/;pred@x];

I started profiling these functions with different lst sizes and pred functions, and added two more definitions:

PredCount3[lst_, pred_] := Count[Thread@pred@lst, True];
PredCount4[lst_, pred_] := Total[If[pred@#, 1, 0] & /@ lst];

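My data samples range from 1 to 10 million elements, and my test functions are EvenQ, # < 5 &, and PrimeQ. The plots below show the time taken.

EvenQ

[Figure: EvenQ predicate]

PredCount2 is the slowest, while 3 and 4 are neck and neck.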

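Comparison predicate: # < 5 &

I chose this function because it is close to what I need in my actual problem. Never mind that it is a silly test function; it actually shows that the fourth function has some merit, and I did end up using it in my solution.

[Figure: less-than-five predicate]

Same as EvenQ, but 3 is clearly slower than 4.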

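PrimeQ

[Figure: PrimeQ predicate]

This is weird. Everything is flipped. I do not suspect caching to be the culprit here, since the worst values are for the function that was computed last.

So, what is the right (fastest) way to count the number of elements in a list that match a given predicate function?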

Answer 1

You are seeing the results of auto-compilation.

First, for Listable functions such as EvenQ and PrimeQ, there is no need to use Thread:

EvenQ[{1, 2, 3}]

{False, True, False}

This also explains why PredCount3 performs well on these functions. (They are internally optimized for threading over lists.)

Now let us look at the timings.
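As a small illustration of the point about Listable functions, the sketch below checks the attribute and counts matches without Thread. The sample list is made up for the example.

```mathematica
(* Listable functions thread over lists on their own *)
MemberQ[Attributes[EvenQ], Listable]   (* True *)
MemberQ[Attributes[PrimeQ], Listable]  (* True *)

(* Hypothetical sample data *)
lst = Range[10];

(* Apply the predicate to the whole list, then count the True results *)
Count[EvenQ[lst], True]    (* 5 *)
Count[PrimeQ[lst], True]   (* 4, namely 2, 3, 5, 7 *)
```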

dat = RandomInteger[1*^6, 1*^6];

test = # < 5 &;

First@Timing[#[dat, test]] & /@ {PredCount1, PredCount2, PredCount3, PredCount4}

{0.343, 0.437, 0.25, 0.047}

If we change a system option to prevent auto-compilation within Map and run the test again:

SetSystemOptions["CompileOptions" -> {"MapCompileLength" -> Infinity}]

First@Timing[#[dat, test]] & /@ {PredCount1, PredCount2, PredCount3, PredCount4}

{0.343, 0.452, 0.234, 0.765}

You can clearly see that without compilation PredCount4 is much slower. In short, if your test function can be compiled by Mathematica, this is a good option.
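After an experiment like this it is worth putting the option back. The sketch below restores the threshold (assuming 100 is the default on your version, which you can confirm with SystemOptions["CompileOptions"]) and then shows what an explicitly compiled counter for the # < 5 & test might look like. The name countLess5C is hypothetical, and the function assumes a flat list of machine integers.

```mathematica
(* Restore auto-compilation inside Map; 100 is assumed to be the default *)
SetSystemOptions["CompileOptions" -> {"MapCompileLength" -> 100}];

(* Explicitly compiled counter: adds 1 for every element below 5 *)
countLess5C = Compile[{{lst, _Integer, 1}},
   Module[{c = 0},
    Do[c += Boole[lst[[i]] < 5], {i, Length[lst]}];
    c]];

countLess5C[{1, 7, 3, 9, 4}]   (* 3 *)
```

Compiling by hand does not depend on the MapCompileLength threshold, at the cost of fixing the predicate and the argument type in advance.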

Here are some other examples of fast counting using numeric functions.
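One way to sketch this idea, assuming integer data like dat above: express the test arithmetically so that it runs on the whole array at once. The variable names below are hypothetical, and the UnitStep trick relies on the elements being integers.

```mathematica
(* Sample data of the same shape as in the timings above *)
dat = RandomInteger[1*^6, 1*^6];

(* Elements below 5: UnitStep[4 - x] is 1 exactly when x <= 4 *)
nLessThan5 = Total[UnitStep[4 - dat]];

(* Even elements: Mod[x, 2] is 1 for odd x, so subtract the odd count *)
nEven = Length[dat] - Total[Mod[dat, 2]];

(* Cross-check against the pattern-based counts *)
nLessThan5 === Count[dat, x_ /; x < 5]
nEven === Count[dat, _?EvenQ]
```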

Answer 2

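The nature of the integers in the list can have a significant effect on the achievable timings. If the range of the integers is constrained, using Tally can improve performance.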

(* Count the items in lst that match the predicate pred. *)
(* Tally collapses duplicates, so pred runs once per distinct value. *)
(* Summing the count column directly also returns 0 when nothing matches. *)

PredCountID[lst_, pred_] := 
 Total[Select[Tally@lst, pred@First@# &][[All, 2]]]

(* Define the values over which to check timings *)
ranges = {100, 1000, 10000, 100000, 1000000};
sizes = {100, 1000, 10000, 100000, 1000000, 10000000, 100000000};

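For PrimeQ, this function gives the following timings:

[Figure: timings of PredCountID with PrimeQ]

Even in a list of size 10^8, primes can be counted in less than a tenth of a second if the elements come from the set of integers {0, ..., 100000}, and in less than the resolution of Timing if they lie in a small range such as 1 to 100.

Because the predicate only needs to be applied to the set of Tally values, this approach is relatively insensitive to the exact predicate function.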
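To make the mechanism concrete, here is a small worked example on a made-up list, showing each intermediate step of the Tally approach.

```mathematica
(* Hypothetical sample with repeated values *)
lst = {2, 3, 3, 4, 5, 5, 5, 6};

(* Distinct values paired with their multiplicities *)
tally = Tally[lst]
(* {{2, 1}, {3, 2}, {4, 1}, {5, 3}, {6, 1}} *)

(* The predicate is evaluated 5 times here, not 8 *)
primes = Select[tally, PrimeQ[First[#]] &]
(* {{2, 1}, {3, 2}, {5, 3}} *)

(* Sum the multiplicities of the matching values *)
Total[primes[[All, 2]]]   (* 6 *)

(* Same answer as counting element by element *)
Count[lst, _?PrimeQ]      (* 6 *)
```

The saving grows with the amount of repetition: a long list drawn from a narrow range has few distinct values, so the predicate is called only a handful of times.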

Fastest way to count elements matching a predicate
