Question

I have a histogram of my input data (in black), shown in the following graph:

I'm trying to fit a Gamma distribution, not to the whole data set but only to the first curve (the first mode) of the histogram. The green plot in the graph above is what I get when I fit the Gamma distribution to all the samples with the following Python code, which uses scipy.stats.gamma:

    img = IO.read(input_file)
    data = img.flatten() + abs(np.min(img)) + 1

    # calculate dB positive image
    img_db = 10 * np.log10(img)
    img_db_pos = img_db + abs(np.min(img_db))
    data = img_db_pos.flatten() + 1

    # data histogram
    n, bins, patches = plt.hist(data, 1000, density=True)

    # slice histogram here

    # estimation of the parameters of the gamma distribution
    fit_alpha, fit_loc, fit_beta = gamma.fit(data, floc=0)
    x = np.linspace(0, 100)
    y = gamma.pdf(x, fit_alpha, fit_loc, fit_beta)
    print('(alpha, beta): (%f, %f)' % (fit_alpha, fit_beta))

    # plot estimated model
    plt.plot(x, y, linewidth=2, color='g')
    plt.show()

How can I restrict the fit to only the interesting subset of this data?

Update1 (slicing):

I sliced the input data by keeping only the values below the maximum of the previous histogram, but the results are not really convincing:

This was achieved by inserting the following code below the # slice histogram here comment in the previous code:

    max_data = bins[np.argmax(n)]
    data = data[data < max_data]

Update2 (scipy.optimize.minimize):

The code below shows how scipy.optimize.minimize() is used to minimize an energy function to find (alpha, beta):

    import matplotlib.pyplot as plt
    import numpy as np
    from geotiff.io import IO
    from scipy.stats import gamma
    from scipy.optimize import minimize

    def truncated_gamma(x, max_data, alpha, beta):
        gammapdf = gamma.pdf(x, alpha, loc=0, scale=beta)
        norm = gamma.cdf(max_data, alpha, loc=0, scale=beta)
        return np.where(x < max_data, gammapdf / norm, 0)

    # read image
    img = IO.read(input_file)

    # calculate dB positive image
    img_db = 10 * np.log10(img)
    img_db_pos = img_db + abs(np.min(img_db))
    data = img_db_pos.flatten() + 1

    # data histogram
    n, bins = np.histogram(data, 100, density=True)

    # using minimize on a slice data below max of histogram
    max_data = bins[np.argmax(n)]
    data = data[data < max_data]
    data = np.random.choice(data, 1000)
    energy = lambda p: -np.sum(np.log(truncated_gamma(data, max_data, *p)))
    initial_guess = [np.mean(data), 2.]
    o = minimize(energy, initial_guess, method='SLSQP')
    fit_alpha, fit_beta = o.x

    # plot data histogram and model
    x = np.linspace(0, 100)
    y = gamma.pdf(x, fit_alpha, 0, fit_beta)
    plt.hist(data, 30, density=True)
    plt.plot(x, y, linewidth=2, color='g')
    plt.show()

The algorithm above converged for a subset of data, and the output in o was:

But as the screenshot below shows, the gamma plot does not fit the histogram:

Answer 1

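You can use a general-purpose optimization tool such as scipy.optimize.minimize to fit a truncated version of the desired function, which gives a good fit:

(Figure: Truncated fit)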

First, the modified function:

def truncated_gamma(x, alpha, beta):
    gammapdf = gamma.pdf(x, alpha, loc=0, scale=beta)
    norm = gamma.cdf(max_data, alpha, loc=0, scale=beta)
    return np.where(x<max_data, gammapdf/norm, 0)

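This takes its values from the gamma distribution where x < max_data and is zero elsewhere. The np.where part is not actually important here, because the data lies exclusively to the left of max_data anyway. The key is the normalization: varying alpha and beta changes the area of the original gamma to the left of the truncation point, so the density has to be divided by that area to remain a valid probability density.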
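To illustrate why the normalization matters, here is a minimal sketch that integrates the truncated density numerically for a few parameter pairs. The truncation point and the (alpha, beta) values are made up for illustration, and max_data is passed as an explicit argument instead of being read from a global.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma


def truncated_gamma(x, alpha, beta, max_data):
    """Gamma pdf cut off at max_data and rescaled to integrate to 1."""
    gammapdf = gamma.pdf(x, alpha, loc=0, scale=beta)
    # Probability mass of the untruncated gamma left of the cut-off.
    norm = gamma.cdf(max_data, alpha, loc=0, scale=beta)
    return np.where(x < max_data, gammapdf / norm, 0.0)


max_data = 5.0  # hypothetical truncation point
areas = []
for alpha, beta in [(2.0, 1.0), (2.0, 3.0), (6.0, 1.5)]:
    raw_mass = gamma.cdf(max_data, alpha, loc=0, scale=beta)
    area, _ = quad(lambda t: float(truncated_gamma(t, alpha, beta, max_data)),
                   0.0, max_data)
    areas.append(area)
    print(f"alpha={alpha}, beta={beta}: raw mass={raw_mass:.3f}, "
          f"truncated area={area:.6f}")
```

The raw mass left of the cut-off differs from one parameter pair to the next, while the truncated density should integrate to approximately 1 in every case. Without the division by norm, the optimizer would be comparing densities that are not properly normalized.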

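The rest is just optimization technicalities.

It is common practice to work with logarithms, so I minimize what is sometimes called the "energy": the sum of the logarithms of the inverse probability density, i.e. the negative log-likelihood.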

energy = lambda p: -np.sum(np.log(truncated_gamma(data, *p)))

Minimization:

initial_guess = [np.mean(data), 2.]
o = minimize(energy, initial_guess, method='SLSQP')
fit_alpha, fit_beta = o.x

My output is (alpha, beta): (11.595208, 824.712481). Like the original fit, it is a maximum likelihood estimate.
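The whole procedure can be tried on synthetic data where the true parameters are known. The sketch below is not the code that produced the numbers above: the "true" parameters, the cut-off, and the choice of Nelder-Mead are assumptions made for illustration, and the energy is written with logpdf and logcdf, which is equivalent to taking the log of the truncated density for data left of the cut-off.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

# Synthetic data with known parameters (made up for this sketch).
rng = np.random.default_rng(0)
true_alpha, true_beta = 4.0, 2.0
samples = rng.gamma(true_alpha, true_beta, size=20000)

# Keep only the part of the sample left of a hypothetical cut-off.
max_data = 12.0
data = samples[samples < max_data]


def energy(p):
    """Negative log-likelihood of the truncated gamma for the kept data."""
    alpha, beta = p
    log_pdf = gamma.logpdf(data, alpha, loc=0, scale=beta)
    log_norm = gamma.logcdf(max_data, alpha, loc=0, scale=beta)
    return -np.sum(log_pdf - log_norm)


initial_guess = [2.0, 1.0]
result = minimize(energy, initial_guess, method="Nelder-Mead")
fit_alpha, fit_beta = result.x
print(f"(alpha, beta): ({fit_alpha:.3f}, {fit_beta:.3f})")
```

If the fit behaves as expected, the estimates should land near the values used to generate the sample, even though the right-hand tail was discarded before fitting.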

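If you are not happy with the convergence rate, you may want to

Select a sample from your rather big dataset: data = np.random.choice(data, 10000)
Try different algorithms using the method keyword argument.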

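Some optimization routines output a representation of the inverse Hessian, which is useful for uncertainty estimation. Enforcing non-negativity of the parameters may also be a good idea.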
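As a rough sketch of these suggestions combined, the example below subsamples the data, switches to L-BFGS-B so that lower bounds can keep both parameters positive, and reads the approximate inverse Hessian from the result. The data, cut-off, and bounds are made up for illustration. The L-BFGS-B inverse Hessian is only a low-rank approximation, so the derived standard errors are a coarse indication at best.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

# Synthetic stand-in for the real data (parameters are made up).
rng = np.random.default_rng(1)
samples = rng.gamma(4.0, 2.0, size=20000)
max_data = 12.0  # hypothetical cut-off
data = samples[samples < max_data]

# Work on a random subsample to speed up each energy evaluation.
data = rng.choice(data, 5000)


def energy(p):
    """Negative log-likelihood of the truncated gamma."""
    alpha, beta = p
    log_pdf = gamma.logpdf(data, alpha, loc=0, scale=beta)
    log_norm = gamma.logcdf(max_data, alpha, loc=0, scale=beta)
    return -np.sum(log_pdf - log_norm)


initial_guess = [2.0, 1.0]
bounds = [(1e-6, None), (1e-6, None)]  # keep alpha and beta positive
result = minimize(energy, initial_guess, method="L-BFGS-B", bounds=bounds)
fit_alpha, fit_beta = result.x

# Approximate inverse Hessian of the energy, a rough covariance estimate.
cov_approx = result.hess_inv.todense()
print("(alpha, beta):", fit_alpha, fit_beta)
print("rough standard errors:", np.sqrt(np.diag(cov_approx)))
```

With method='BFGS' the result carries hess_inv as a plain array instead, but that method does not accept bounds, which is why L-BFGS-B is used in this sketch.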

A log-scaled plot without truncation shows the entire distribution:

Answer 2

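Here is another possible approach, using a data set created by hand in Excel that more or less matches the given plot.

Raw data

Outline

Import the data into a Pandas dataframe.
Mask the indices after the index of the maximum response.
Create a mirror image of the remaining data.
Append the mirror image, leaving a buffer of empty space in between.
Fit the desired distribution to the modified data. Below I fit a normal curve by the method of moments and adjust the amplitude and width.

Working script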

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    # Import data to dataframe.
    df = pd.read_csv('sample.csv', header=0, index_col=0)
    # Mask indices after index at max Y (idxmax returns the index label).
    mask = df.index.values <= df.Y.idxmax()
    df = df.loc[mask, :]
    scaled_y = 100*df.Y.values

    # Create new df with mirror image of Y appended.
    sep = 6
    app_zeroes = np.append(scaled_y, np.zeros(sep, dtype=float))
    mir_y = np.flipud(scaled_y)
    new_y = np.append(app_zeroes, mir_y)

    # Using Scipy-cookbook to fit a normal by method of moments.
    idxs = np.arange(new_y.size)  # idxs=[0, 1, 2,...,len(data)-1]
    mid_idxs = idxs.mean() # (len(data)-1)/2
    # idxs-mid_idxs is [-53.5, -52.5, ..., 52.5, 53.5]
    scaling_param = np.sqrt(np.abs(np.sum((idxs-mid_idxs)**2*new_y)/np.sum(new_y)))

    # adjust amplitude
    fmax = new_y.max()*1.2 # adjusted function max to 120% max y.
    # adjust width
    scaling_param = scaling_param*.7 # adjusted by 70%.
    # Fit normal.
    fit = lambda t: fmax*np.exp(-(t-mid_idxs)**2/(2*scaling_param**2))

    # Plot results.
    plt.plot(new_y, '.')
    plt.plot(fit(idxs), '--')
    plt.show()

Results

See the scipy-cookbook fitting data page for more detail on the fitting method used here.
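As a minimal sketch of the method-of-moments step on its own, the example below treats a sampled bell curve as weights and recovers its center and width from the first and second moments. The curve here is synthetic, with made-up center and width, so the recovered values can be compared with the known ones.

```python
import numpy as np

# Synthetic bell curve sampled on integer indices (values are made up).
idxs = np.arange(101)
true_mid, true_sigma = 50.0, 8.0
y = np.exp(-(idxs - true_mid) ** 2 / (2 * true_sigma ** 2))

# First moment: weighted mean of the indices gives the center.
mid = np.sum(idxs * y) / np.sum(y)
# Second central moment: weighted spread around the center gives the width.
sigma = np.sqrt(np.sum((idxs - mid) ** 2 * y) / np.sum(y))

print(f"center = {mid:.3f}, width = {sigma:.3f}")
```

The script above uses the plain midpoint of the index range as the center instead of the weighted mean. For a curve that was made symmetric by mirroring, the two coincide.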

Fit a gamma distribution to only a subset of the samples
