Why is random.choices faster than NumPy's random choice?

Time: 2020-03-11 05:43:54

Tags: numpy random

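I am trying to do random sampling in Python as efficiently as possible, but I am confused because NumPy's np.random.choice() turns out to be much slower than random.choices() from the standard library.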

import numpy as np
import random

np.random.seed(12345)

# use gamma distribution
shape, scale = 2.0, 2.0 
s = np.random.gamma(shape, scale, 1000000)
meansample = []

samplesize = 500

%timeit meansample = [ np.mean( np.random.choice( s, samplesize, replace=False)) for _ in range(500)]
23.3 s ± 229 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit meansample = [np.mean(random.choices(s, k=samplesize)) for x in range(0,500)]
152 ms ± 324 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

23 seconds versus 152 milliseconds is a huge difference.

What am I doing wrong?

1 answer:

Answer 0: (score: 2)

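There are two issues here. First, with the pure-Python random library you probably meant to use sample rather than choices, because sample draws without replacement while choices draws with replacement. That changes the benchmark somewhat. Second, np.random.choice has better-performing alternatives for sampling without replacement. This is a known issue related to the random generator API. You can use np.random.Generator to get better performance. My timings: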

%timeit meansample = [ np.mean( np.random.choice( s, samplesize, replace=False)) for _ in range(500)]
# 1 loop, best of 3: 12.4 s per loop

%timeit meansample = [np.mean(random.choices(s, k=samplesize)) for x in range(0,500)]
# 10 loops, best of 3: 118 ms per loop

sl = s.tolist()
%timeit meansample = [np.mean(random.sample(sl, k=samplesize)) for x in range(0,500)]
# 1 loop, best of 3: 219 ms per loop

g = np.random.Generator(np.random.PCG64())
%timeit meansample = [ np.mean( g.choice( s, samplesize, replace=False)) for _ in range(500)]
# 10 loops, best of 3: 25 ms per loop

So, for sampling without replacement, random.sample outperforms np.random.choice but is slower than np.random.Generator.choice.
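For anyone who wants to reproduce the comparison outside IPython (where %timeit is not available), here is a minimal sketch using the standard timeit module. The helper names (sample_with_replacement, sample_stdlib, sample_generator, sample_legacy, benchmark) are made up for this illustration, and the population is 100,000 values rather than the 1,000,000 used in the question so that the script finishes quickly. Absolute timings will depend on your machine and NumPy version, so none are quoted here.

```python
import random
import timeit

import numpy as np

# Assumption: a smaller population than in the question, to keep the run short.
rng = np.random.default_rng(12345)
population = rng.gamma(2.0, 2.0, 100_000)
population_list = population.tolist()
SAMPLE_SIZE = 500


def sample_with_replacement(pop, k):
    # random.choices draws WITH replacement, so an element can appear twice.
    return random.choices(pop, k=k)


def sample_stdlib(pop, k):
    # random.sample draws WITHOUT replacement.
    return random.sample(pop, k=k)


def sample_generator(pop, k):
    # New-style Generator API, without replacement.
    return rng.choice(pop, k, replace=False)


def sample_legacy(pop, k):
    # Legacy np.random.choice, without replacement (the slow case in the question).
    return np.random.choice(pop, k, replace=False)


def benchmark(number=20):
    """Return total seconds for `number` sample-and-mean calls of each approach."""
    cases = {
        "random.choices (with replacement)": lambda: np.mean(
            sample_with_replacement(population_list, SAMPLE_SIZE)),
        "random.sample": lambda: np.mean(
            sample_stdlib(population_list, SAMPLE_SIZE)),
        "Generator.choice": lambda: np.mean(
            sample_generator(population, SAMPLE_SIZE)),
        "np.random.choice (legacy)": lambda: np.mean(
            sample_legacy(population, SAMPLE_SIZE)),
    }
    return {name: timeit.timeit(fn, number=number) for name, fn in cases.items()}


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.4f} s")
```

Note that random.choices is not measuring the same thing as the other three, since it samples with replacement. It is included only because the question used it.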
