python - Fitting an arbitrary number of Gaussians: heavy memory consumption in Python

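I am trying (in Python) to fit a series of an arbitrary number of Gaussian functions (determined by a simple algorithm that is still being improved) to a data set. For my current sample data set, I have 174 Gaussians. I have a fitting procedure, but it is basically sophisticated guess-and-check, and it consumes all 4 GB of available memory.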

Is there a way to accomplish this with something in scipy or numpy?

Here is what I am trying to use, where wavelength[] is the list of x-coordinates and cflux[] is the list of y-coordinates:

#Pick a gaussian
for repeat in range(0,2):
    for f in range(0,len(centroid)):
        #Iterate over every other gaussian
        for i in range(0,len(centroid)):
            if i!= f:
                #For every wavelength,
                for w in wavelength:
                    #Append the value of each to a list, called others
                    others.append(height[i]*math.exp(-(w-centroid[i])**2/(2*width[i]**2)))

    #Optimize the centroid of the current gaussian
        prev = centroid[f]
        best = centroid[f]
        #Pick an order of magnitude
        for p in range (int(round(math.log10(centroid[i]))-3-repeat),int(round(math.log10(centroid[i])))-6-repeat,-1):
            #Pick a value of that order of magnitude
            for m in range (-5,9):
                #Change the value of the current item
                centroid[f] = prev + m * 10 **(p)
                #Increment over all wavelengths, make a list of the new values
                variancy = 0
                residual = 0
                test = []
                #Increment across every wavelength and evaluate if this change gets R^2 any larger
                for k in range(0,len(wavelength)):
                    test.append(height[i]*math.exp(-(wavelength[k]-centroid[f])**2/(2*width[i]**2)))
                    residual += (test[k]+others[k]-cflux[k])**2
                    variancy += (test[k]+others[k]-avgcflux)**2
                rsquare = 1-(residual/variancy)
                #Check the R^2 value for this new fit
                if rsquare > bestr:
                    bestr = rsquare
                    best = centroid[f]
        centroid[f] = best

    #Optimize the height of the current gaussian
        prev = height[f]
        best = height[f]
        #Pick an order of magnitude
        for p in range (int(round(math.log10(height[i]))-repeat),int(round(math.log10(height[i])))-3-repeat,-1):
            #Pick a value of that order of magnitude
            for m in range (-5,9):
                #Change the value of the current item
                height[f] = prev + m * 10 **(p)
                #Increment over all wavelengths, make a list of the new values
                variancy = 0
                residual = 0
                test = []
                #Increment across every wavelength and evaluate if this change gets R^2 any larger
                for k in range(0,len(wavelength)):
                    test.append(height[f]*math.exp(-(wavelength[k]-centroid[i])**2/(2*width[i]**2)))
                    residual += (test[k]+others[k]-cflux[k])**2
                    variancy += (test[k]+others[k]-avgcflux)**2
                rsquare = 1-(residual/variancy)
                #Check the R^2 value for this new fit
                if rsquare > bestr:
                    bestr = rsquare
                    best = height[f]
        height[f] = best

    #Optimize the width of the current gaussian
        prev = width[f]
        best = width[f]
        #Pick an order of magnitude
        for p in range (int(round(math.log10(width[i]))-repeat),int(round(math.log10(width[i])))-3-repeat,-1):
            #Pick a value of that order of magnitude
            for m in range (-5,9):
                if prev + m * 10**(p) == 0:
                    m+=1
                #Change the value of the current item
                width[f] = prev + m * 10 **(p)
                #Increment over all wavelengths, make a list of the new values
                variancy = 0
                residual = 0
                test = []
                #Increment across every wavelength and evaluate if this change gets R^2 any larger
                for k in range(0,len(wavelength)):
                    test.append(height[i]*math.exp(-(wavelength[k]-centroid[i])**2/(2*width[f]**2)))
                    residual += (test[k]+others[k]-cflux[k])**2
                    variancy += (test[k]+others[k]-avgcflux)**2
                rsquare = 1-(residual/variancy)
                #Check the R^2 value for this new fit
                if rsquare > bestr:
                    bestr = rsquare
                    best = width[f]
        width[f] = best
        count += 1
        #print '{} of {} peaks optimized, iteration {} of {}'.format(f+1,len(centroid),repeat+1,2)
        complete = round(100*(count/(float(len(centroid))*2)),2)
        print '{}% completed'.format(complete)
    print 'New R^2 = {}'.format(bestr)

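Yes, this would probably be better (and easier) with scipy. But first, refactor your code into smaller functions; that alone makes it much easier to read and to understand what is going on.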
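As a rough illustration of what a scipy-based version split into small functions could look like, here is a minimal sketch using `scipy.optimize.curve_fit`. The function names `gaussian_sum` and `fit_gaussians` and the synthetic two-peak data are assumptions made up for this example, not part of the question's code. It is written for Python 3 and fits all parameters at once, which is not how the original loop works.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian_sum(x, *params):
    """Sum of Gaussians; params is flat: (height, centroid, width) per peak."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for height, centroid, width in zip(params[0::3], params[1::3], params[2::3]):
        total += height * np.exp(-(x - centroid) ** 2 / (2.0 * width ** 2))
    return total


def fit_gaussians(x, y, heights, centroids, widths):
    """Least-squares fit of all peaks at once, starting from the given guesses."""
    # Interleave the guesses as h0, c0, w0, h1, c1, w1, ...
    p0 = np.column_stack([heights, centroids, widths]).ravel()
    popt, _ = curve_fit(gaussian_sum, x, y, p0=p0)
    fitted = popt.reshape(-1, 3)
    return fitted[:, 0], fitted[:, 1], fitted[:, 2]


# Synthetic, noise-free demo data with two peaks (hypothetical values).
x_demo = np.linspace(0.0, 10.0, 200)
y_demo = gaussian_sum(x_demo, 1.0, 3.0, 0.5, 2.0, 7.0, 0.8)

# Deliberately imperfect starting guesses, as a peak finder might produce.
fit_h, fit_c, fit_w = fit_gaussians(
    x_demo, y_demo,
    heights=[0.8, 1.8], centroids=[3.2, 6.8], widths=[0.6, 0.7],
)
```

With 174 peaks this means 522 free parameters in a single fit, so good starting guesses matter a great deal, and the fit may still be slow or fail to converge.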

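As for the memory consumption: you are probably over-extending a list somewhere (`others` is a candidate: I never see it cleared (or initialized!), while it gets filled inside a quadruple loop). Either that, or your data really is that big (in which case you should really use numpy arrays, if only to speed things up). I can't tell, because you introduce various variables without giving any idea of their sizes (how big is `wavelength`? How big does `others` get? With what, and where, are the data arrays initialized?).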
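To make the point about `others` concrete, here is a minimal numpy sketch in which the contribution of all other Gaussians is a fixed-size array instead of a list that keeps growing. The helper names `gaussian_components`, `others_for` and `r_squared` are hypothetical, and `r_squared` uses the textbook definition (variance of the data about its mean), which differs from the `variancy` term in the question.

```python
import numpy as np


def gaussian_components(x, heights, centroids, widths):
    """Return an array of shape (n_peaks, n_points): one row per Gaussian."""
    x = np.asarray(x, dtype=float)[None, :]
    h = np.asarray(heights, dtype=float)[:, None]
    c = np.asarray(centroids, dtype=float)[:, None]
    w = np.asarray(widths, dtype=float)[:, None]
    return h * np.exp(-(x - c) ** 2 / (2.0 * w ** 2))


def others_for(components, f):
    """Sum of every Gaussian except peak f; always has length n_points."""
    return components.sum(axis=0) - components[f]


def r_squared(model, data):
    """Standard coefficient of determination (an assumption, see above)."""
    data = np.asarray(data, dtype=float)
    residual = np.sum((model - data) ** 2)
    variance = np.sum((data - data.mean()) ** 2)
    return 1.0 - residual / variance


# Small made-up example: 3 peaks sampled at 50 wavelengths.
wl = np.linspace(4000.0, 4010.0, 50)
comps = gaussian_components(wl, [1.0, 0.5, 2.0], [4002.0, 4005.0, 4008.0], [0.3, 0.4, 0.2])
others = others_for(comps, 1)
```

For 174 peaks the component array holds 174 times `len(wavelength)` floats, and it stays that size no matter how many times the optimization loop runs.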

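Also, fitting 174 Gaussians is just a bit crazy; either look into another way of determining what you want to get from your data, or split the problem up. Judging from the `wavelength` variable, you appear to be fitting lines in a high-resolution spectrum; it may be better to isolate most of the lines and fit those isolated groups separately. If they all overlap, I doubt any normal fitting technique will help you.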
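One possible way to do the splitting is sketched below, assuming initial guesses for the centroids and widths already exist. Peaks whose windows of plus or minus `n_sigma` widths overlap are placed in the same group, and each group could then be fitted on its own. The function name `group_peaks` and the `n_sigma=3.0` default are assumptions chosen for illustration.

```python
import numpy as np


def group_peaks(centroids, widths, n_sigma=3.0):
    """Group peak indices whose [c - n_sigma*w, c + n_sigma*w] windows overlap."""
    centroids = np.asarray(centroids, dtype=float)
    widths = np.abs(np.asarray(widths, dtype=float))
    if centroids.size == 0:
        return []
    lefts = centroids - n_sigma * widths
    rights = centroids + n_sigma * widths
    order = np.argsort(lefts)  # classic interval merging: sweep by left edge

    groups = []
    current = [int(order[0])]
    right_edge = rights[order[0]]
    for idx in order[1:]:
        if lefts[idx] <= right_edge:
            # Window overlaps the current group: extend the group.
            current.append(int(idx))
            right_edge = max(right_edge, rights[idx])
        else:
            # Gap found: close the current group and start a new one.
            groups.append(current)
            current = [int(idx)]
            right_edge = rights[idx]
    groups.append(current)
    return groups


# Made-up guesses: two blends and one isolated line.
demo_groups = group_peaks([1.0, 1.5, 10.0, 20.0, 20.2], [0.2] * 5)
```

Each group can then be fitted against only the slice of the spectrum that its windows cover, which keeps every individual fit small.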

Finally, perhaps a package like pandas can help (for example, its computation subpackage).

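Perhaps one last remark, since I see a lot of things in the code that could be improved: at some point, codereview might also be useful. For now, though, I guess your memory usage is the most problematic part.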
