Question

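I have a power-law distribution of energies and I want to pick n random energies according to that distribution. I tried doing this manually with random numbers, but it is too inefficient for what I want to do. Is there a method in numpy (or elsewhere) that works like numpy.random.normal, except that instead of a normal distribution, the distribution can be specified? In my mind, an example might look like this (similar to numpy.random.normal):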

import numpy as np

# Energies from within which I want values drawn
eMin = 50.
eMax = 2500.

# Amount of energies to be drawn
n = 10000

photons = []

for i in range(n):

    # Method that I just made up which would work like random.normal,
    # i.e. return an energy on the distribution based on its probability,
    # but take a distribution other than a normal distribution
    photons.append(np.random.distro(eMin, eMax, lambda e: e**(-1.)))

print(photons)

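Printing photons should give me a list of length 10000 populated with energies drawn from this distribution. If I were to histogram it, the bins at lower energies would have much larger values.

I am not sure whether such a method exists, but it seems like it should. I hope it is clear what I want to do.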

EDIT:

I have seen numpy.random.power, but my exponent is -1, so I don't think it will work.

Answer 1

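Sampling from an arbitrary PDF is actually quite hard. There are large and dense books devoted just to sampling efficiently and accurately from the standard families of distributions.

For the example you gave, it looks like you could probably get away with a custom inversion method.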
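As a minimal sketch of such an inversion for the 1/e density in the question, assuming the energies are restricted to [eMin, eMax]: the normalized CDF is F(e) = ln(e/eMin) / ln(eMax/eMin), so its inverse is e = eMin * (eMax/eMin)**u for u uniform on [0, 1). The function name sample_inverse_energy and the fixed seed are illustrative choices, not part of numpy.

```python
import numpy as np

def sample_inverse_energy(e_min, e_max, n, rng=None):
    """Draw n samples from p(e) proportional to 1/e on [e_min, e_max].

    Hypothetical helper: it inverts the closed-form CDF
    F(e) = ln(e / e_min) / ln(e_max / e_min).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Uniform probabilities in [0, 1)
    u = rng.uniform(0.0, 1.0, size=n)
    # Inverse CDF evaluated at u
    return e_min * (e_max / e_min) ** u

# Seeded only so the example is reproducible
photons = sample_inverse_energy(50.0, 2500.0, 10000, rng=np.random.default_rng(0))
```

Because the whole draw is one vectorized expression, no Python loop over n is needed. This only works because the CDF of 1/e can be inverted by hand.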

Answer 2

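If you want to sample from an arbitrary distribution, you need the inverse of the cumulative distribution function (not the pdf).

You then sample a probability uniformly from the range [0,1] and feed it to the inverse of the cdf to get the corresponding value.

It is often not possible to obtain the cdf from the pdf analytically. However, if you are happy to approximate the distribution, you can evaluate f(x) at regular intervals over its domain, do a cumsum over the resulting vector (each value weighted by the interval width) to get an approximation of the cdf, and then approximate the inverse from that.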

Rough code snippet:

import matplotlib.pyplot as plt
import numpy as np
import scipy.interpolate

def f(x):
    """
    Substitute this function with your arbitrary distribution.
    It must be positive over the domain.
    """
    return 1.0 / x


# You should vary inputVals to cover the domain of f (for better accuracy you can
# be clever about the spacing of values as well). Here I space them logarithmically
# up to 1 (the lower bound of 1e-7 is an arbitrary choice), then at regular
# intervals, but you could definitely do better.
inputVals = np.hstack([np.logspace(-7, 0, 70, endpoint=False), np.arange(1, 10000)])

# Everything else should just work.
funcVals = f(inputVals)
# Left Riemann sum: each step adds f(x[i-1]) * (x[i] - x[i-1]).
cdf = np.zeros(len(funcVals))
cdf[1:] = np.cumsum(funcVals[:-1] * np.diff(inputVals))
cdf /= cdf[-1]

# You could also improve the approximation by choosing an appropriate interpolator.
inverseCdf = scipy.interpolate.interp1d(cdf, inputVals)

# Grab 10k samples from the distribution.
samples = inverseCdf(np.random.uniform(0, 1, size=10000))

plt.hist(samples, bins=500)
plt.show()
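The same idea can be wrapped in a vectorized helper that is closer to the call the question asks for. This is only a sketch under stated assumptions: sample_from_pdf is a hypothetical name (numpy has no such function), the default grid of 2000 evenly spaced points is an arbitrary choice, and the pdf must accept arrays and be strictly positive on [lo, hi].

```python
import numpy as np

def sample_from_pdf(pdf, lo, hi, n, grid_size=2000, rng=None):
    """Approximate inverse-CDF sampling from an unnormalized pdf on [lo, hi].

    Hypothetical helper, not a numpy API.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(lo, hi, grid_size)
    y = pdf(x)
    # Cumulative integral of the pdf by the trapezoid rule, starting at 0
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]  # normalize so the CDF ends at 1
    # Invert the CDF by linear interpolation: probability -> value
    u = rng.uniform(0.0, 1.0, size=n)
    return np.interp(u, cdf, x)

# Seeded only so the example is reproducible
photons = sample_from_pdf(lambda e: e ** -1.0, 50.0, 2500.0, 10000,
                          rng=np.random.default_rng(0))
```

np.interp needs the cdf values to be increasing, which is why the pdf has to be positive everywhere on the grid. A finer or non-uniform grid would improve accuracy where the pdf changes quickly.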

Answer 3

Why not use eval and put the distribution in a string?

>>> import numpy
>>> cmd = "numpy.random.normal(500)"
>>> eval(cmd)

You can manipulate the string however you like to set the distribution.
