Fitting an arbitrary number of Gaussian functions: massive memory consumption in Python

Time: 2013-04-19 05:45:06

Tags: python numpy scipy

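I am trying (in Python) to fit a series of an arbitrary number of Gaussian functions (determined by a simple algorithm that is still being improved) to a data set. For my current sample data set, I have 174 Gaussian functions. I have a fitting procedure, but it is basically sophisticated guess-and-check, and it consumes all 4 GB of available memory.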

Is there any way to accomplish this using something in scipy or numpy?

Here is what I am trying to use, where wavelength[] is the list of x-coordinates and cflux[] is the list of y-coordinates:

#Pick a gaussian
for repeat in range(0,2):
    for f in range(0,len(centroid)):
        #Iterate over every other gaussian
        for i in range(0,len(centroid)):
            if i!= f:
                #For every wavelength,
                for w in wavelength:
                    #Append the value of each to a list, called others
                    others.append(height[i]*math.exp(-(w-centroid[i])**2/(2*width[i]**2)))

        #Optimize the centroid of the current gaussian
        prev = centroid[f]
        best = centroid[f]
        #Pick an order of magnitude
        for p in range (int(round(math.log10(centroid[i]))-3-repeat),int(round(math.log10(centroid[i])))-6-repeat,-1):
            #Pick a value of that order of magnitude
            for m in range (-5,9):
                #Change the value of the current item
                centroid[f] = prev + m * 10 **(p)
                #Increment over all wavelengths, make a list of the new values
                variancy = 0
                residual = 0
                test = []
                #Increment across every wavelength and evaluate if this change gets R^2 any larger
                for k in range(0,len(wavelength)):
                    test.append(height[i]*math.exp(-(wavelength[k]-centroid[f])**2/(2*width[i]**2)))
                    residual += (test[k]+others[k]-cflux[k])**2
                    variancy += (test[k]+others[k]-avgcflux)**2
                rsquare = 1-(residual/variancy)
                #Check the R^2 value for this new fit
                if rsquare > bestr:
                    bestr = rsquare
                    best = centroid[f]
        centroid[f] = best

        #Optimize the height of the current gaussian
        prev = height[f]
        best = height[f]
        #Pick an order of magnitude
        for p in range (int(round(math.log10(height[i]))-repeat),int(round(math.log10(height[i])))-3-repeat,-1):
            #Pick a value of that order of magnitude
            for m in range (-5,9):
                #Change the value of the current item
                height[f] = prev + m * 10 **(p)
                #Increment over all wavelengths, make a list of the new values
                variancy = 0
                residual = 0
                test = []
                #Increment across every wavelength and evaluate if this change gets R^2 any larger
                for k in range(0,len(wavelength)):
                    test.append(height[f]*math.exp(-(wavelength[k]-centroid[i])**2/(2*width[i]**2)))
                    residual += (test[k]+others[k]-cflux[k])**2
                    variancy += (test[k]+others[k]-avgcflux)**2
                rsquare = 1-(residual/variancy)
                #Check the R^2 value for this new fit
                if rsquare > bestr:
                    bestr = rsquare
                    best = height[f]
        height[f] = best

        #Optimize the width of the current gaussian
        prev = width[f]
        best = width[f]
        #Pick an order of magnitude
        for p in range (int(round(math.log10(width[i]))-repeat),int(round(math.log10(width[i])))-3-repeat,-1):
            #Pick a value of that order of magnitude
            for m in range (-5,9):
                if prev + m * 10**(p) == 0:
                    m+=1
                #Change the value of the current item
                width[f] = prev + m * 10 **(p)
                #Increment over all wavelengths, make a list of the new values
                variancy = 0
                residual = 0
                test = []
                #Increment across every wavelength and evaluate if this change gets R^2 any larger
                for k in range(0,len(wavelength)):
                    test.append(height[i]*math.exp(-(wavelength[k]-centroid[i])**2/(2*width[f]**2)))
                    residual += (test[k]+others[k]-cflux[k])**2
                    variancy += (test[k]+others[k]-avgcflux)**2
                rsquare = 1-(residual/variancy)
                #Check the R^2 value for this new fit
                if rsquare > bestr:
                    bestr = rsquare
                    best = width[f]
        width[f] = best
        count += 1
        #print '{} of {} peaks optimized, iteration {} of {}'.format(f+1,len(centroid),repeat+1,2)
        complete = round(100*(count/(float(len(centroid))*2)),2)
        print '{}% completed'.format(complete)
    print 'New R^2 = {}'.format(bestr)

1 answer:

Answer 0 (score: 2)

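Yes, this can likely be done better (and more easily) with scipy. But first, refactor your code into smaller functions; that alone makes it much easier to read and understand what is going on.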
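As a rough illustration of both points, here is a minimal sketch that splits the model into small functions and hands the whole fit to scipy.optimize.curve_fit. The helper names (gaussian, multi_gaussian, fit_gaussians) and the synthetic two-peak data are assumptions made for this example; they are not taken from the question, and with 174 peaks a single fit like this may still be slow or poorly conditioned.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, height, centroid, width):
    """Single Gaussian evaluated on the array x."""
    return height * np.exp(-(x - centroid) ** 2 / (2.0 * width ** 2))


def multi_gaussian(x, *params):
    """Sum of Gaussians; params is flat: (h0, c0, w0, h1, c1, w1, ...)."""
    model = np.zeros_like(x, dtype=float)
    for height, centroid, width in np.reshape(params, (-1, 3)):
        model += gaussian(x, height, centroid, width)
    return model


def fit_gaussians(x, y, height, centroid, width):
    """Refine initial guesses for all peaks in one least-squares fit."""
    # Interleave the guesses so each peak occupies three consecutive slots.
    p0 = np.column_stack([height, centroid, width]).ravel()
    popt, _ = curve_fit(multi_gaussian, x, y, p0=p0)
    return popt.reshape(-1, 3)  # columns: height, centroid, width


# Synthetic demonstration data (hypothetical, not the asker's spectrum).
x = np.linspace(0.0, 10.0, 500)
true_params = [(1.0, 3.0, 0.4), (0.6, 6.5, 0.7)]
y = sum(gaussian(x, *p) for p in true_params)

fitted = fit_gaussians(
    x, y, height=[0.8, 0.5], centroid=[2.8, 6.8], width=[0.5, 0.5]
)
```

Because the width only enters squared, the optimizer may return it with either sign; taking the absolute value afterwards is a reasonable precaution.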

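As for the memory consumption: you are probably over-extending a list somewhere. others is a candidate: I never see it cleared (or initialized!), while it gets filled inside a quadruple loop. Either that, or your data really is that big, in which case you should be using numpy arrays anyway, if only to speed things up. I can't tell, because you introduce various variables without giving any idea of their size. How large is wavelength? How large does others get? What are the initializations of your data arrays, and where do they happen?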
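To make the suspicion about others concrete: the posted loops append one value per wavelength, per other peak, per current peak, over both passes, while the later code only ever reads others[k] for k below len(wavelength). The sketch below counts those appends and shows one possible numpy replacement that holds a single (n_peaks, n_wavelengths) array instead. The array sizes and random peak parameters are hypothetical, chosen only for illustration, and the function names are not from the question.

```python
import numpy as np


def gaussian_components(wavelength, height, centroid, width):
    """Return an array of shape (n_peaks, n_wavelengths), one row per Gaussian."""
    x = np.asarray(wavelength, dtype=float)[np.newaxis, :]
    h = np.asarray(height, dtype=float)[:, np.newaxis]
    c = np.asarray(centroid, dtype=float)[:, np.newaxis]
    w = np.asarray(width, dtype=float)[:, np.newaxis]
    return h * np.exp(-(x - c) ** 2 / (2.0 * w ** 2))


def others_for_peak(components, f):
    """Summed contribution of every Gaussian except peak f, per wavelength."""
    return components.sum(axis=0) - components[f]


# Hypothetical sizes and parameters, for illustration only.
n_peaks, n_wave = 174, 1000
rng = np.random.default_rng(0)
wavelength = np.linspace(4000.0, 7000.0, n_wave)
centroid = np.sort(rng.uniform(4000.0, 7000.0, n_peaks))
height = rng.uniform(0.1, 1.0, n_peaks)
width = rng.uniform(0.5, 2.0, n_peaks)

components = gaussian_components(wavelength, height, centroid, width)
others = others_for_peak(components, 0)

# Values the posted loops would append to `others` over both passes
# if the list is never cleared: 2 * n_peaks * (n_peaks - 1) * n_wave.
appended = 2 * n_peaks * (n_peaks - 1) * n_wave
```

With these assumed sizes the list would grow to about 60 million Python floats, whereas the array version stores n_peaks * n_wave 8-byte values. The real numbers depend on the actual length of wavelength.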

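Also, fitting 174 Gaussians at once is a bit crazy; either look into another way of determining whatever you want to get from your data, or split the problem up. Judging from the wavelength variable, it appears you are trying to fit lines in a high-resolution spectrum; it may be better to isolate most of the lines and fit those isolated groups separately. If they all overlap, I doubt any normal fitting technique will help you.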
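One way to split things up, sketched under assumptions: treat each peak as occupying the interval centroid ± n_sigma * width and merge peaks whose intervals overlap, so each resulting group can be fitted on its own with far fewer parameters. The function name group_peaks, the n_sigma=4.0 default, and the example values are all hypothetical choices for this illustration, not something the question specifies.

```python
import numpy as np


def group_peaks(centroid, width, n_sigma=4.0):
    """Split peaks into groups whose members overlap within n_sigma widths.

    Returns a list of index arrays into the original (unsorted) peak lists.
    """
    centroid = np.asarray(centroid, dtype=float)
    width = np.abs(np.asarray(width, dtype=float))
    if centroid.size == 0:
        return []

    order = np.argsort(centroid)
    lo = centroid[order] - n_sigma * width[order]
    hi = centroid[order] + n_sigma * width[order]

    groups, current, reach = [], [order[0]], hi[0]
    for k in range(1, len(order)):
        if lo[k] <= reach:
            # Interval overlaps the current group: extend it.
            current.append(order[k])
            reach = max(reach, hi[k])
        else:
            # Gap found: close the current group and start a new one.
            groups.append(np.array(current))
            current, reach = [order[k]], hi[k]
    groups.append(np.array(current))
    return groups


# Hypothetical initial guesses for six lines.
centroid = [5000.0, 5001.0, 5050.0, 5200.0, 5200.5, 5201.5]
width = [0.5, 0.5, 1.0, 0.3, 0.3, 0.3]
groups = group_peaks(centroid, width)
```

Each group can then be fitted using only the wavelength window it spans, which keeps every individual fit small.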

Lastly, perhaps a package like pandas can help (for example, its computation subpackage).

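Perhaps this really is the last point, since I see lots of things in the code that could be improved. At some point codereview may be useful as well. For now, though, I would guess that memory usage is the most problematic part.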