How to calculate a moving average in Python 3?

Time: 2013-02-14 21:05:19

Tags: python python-3.x

Let's say I have a list:

y = ['1', '2', '3', '4','5','6','7','8','9','10']

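I want to create a function that calculates the moving n-day average. So if n is 5, I want my code to take the first items 1-5, add them up and find the average, which would be 3.0, then go on to 2-6 and calculate the average, which would be 4.0, then 3-7, 4-8, 5-9, 6-10.

I don't want to calculate anything for the first n-1 days, so starting from the nth day, it should count the previous days.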

def moving_average(x:'list of prices', n):
    for num in range(len(x)+1):
        print(x[num-n:num])

This seems to print out what I want:

[]
[]
[]
[]
[]

['1', '2', '3', '4', '5']

['2', '3', '4', '5', '6']

['3', '4', '5', '6', '7']

['4', '5', '6', '7', '8']

['5', '6', '7', '8', '9']

['6', '7', '8', '9', '10']

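However, I don't know how to do the calculation on the numbers inside those lists. Any ideas?

5 answers:

Answer 0: (score: 21)

There is a great sliding window generator in an old version of the Python documentation, among the itertools examples: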

from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + (elem,)
        yield result

Using that, computing your moving averages is trivial:

from __future__ import division  # For Python 2

def moving_averages(values, size):
    for selection in window(values, size):
        yield sum(selection) / size

Running this against your input (mapping the strings to integers) gives:

>>> y= ['1', '2', '3', '4','5','6','7','8','9','10']
>>> for avg in moving_averages(map(int, y), 5):
...     print(avg)
... 
3.0
4.0
5.0
6.0
7.0
8.0

To return None for the first n - 1 iterations, where the window is still "incomplete", just expand the moving_averages function a little:

def moving_averages(values, size):
    for _ in range(size - 1):
        yield None
    for selection in window(values, size):
        yield sum(selection) / size
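
As a side note, the window can also be kept in a collections.deque with maxlen set, so the oldest item is dropped automatically on each append. This is only a minimal sketch of that variant; the names window_deque and moving_averages_deque are made up for illustration and are not part of the itertools examples.

```python
from collections import deque
from itertools import islice

def window_deque(seq, n=2):
    """Yield successive windows of width n as tuples (hypothetical helper)."""
    it = iter(seq)
    win = deque(islice(it, n), maxlen=n)
    if len(win) == n:
        yield tuple(win)
    for elem in it:
        win.append(elem)  # maxlen=n discards the oldest element automatically
        yield tuple(win)

def moving_averages_deque(values, size):
    for selection in window_deque(values, size):
        yield sum(selection) / size

y = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
averages = list(moving_averages_deque(map(int, y), 5))
print(averages)  # [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

Like window(), this yields nothing at all when the input has fewer than n items.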

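Answer 1: (score: 6)

While I like Martijn's answer, like George, I was wondering whether this wouldn't be faster using a running sum instead of applying sum() over and over again to mostly the same numbers.

Also, the idea of having None values as the default during the ramp-up phase is interesting. In fact, there may be plenty of different scenarios one could conceive of for moving averages. Let's split the calculation of averages into three phases: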

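  1. Ramp Up: the starting iterations, where the current iteration count < window size
  2. Steady Progress: we have exactly window size elements available to calculate a normal average := sum(x[iteration_counter-window_size:iteration_counter])/window_size
  3. Ramp Down: at the end of the input data, we could return another window_size - 1 "average" numbers.

Here is a function that accepts

    • arbitrary iterables (generators are fine) as input for data
    • arbitrary window sizes >= 1
    • parameters to switch the production of values on/off during the Ramp Up / Down phases
    • callback functions for those phases to control how values are produced. This can be used to constantly provide a default (e.g. None) or to provide partial averages

Here is the code: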

    from collections import deque 
    
    def moving_averages(data, size, rampUp=True, rampDown=True):
        """Slide a window of <size> elements over <data> to calc an average
    
        First and last <size-1> iterations when window is not yet completely
        filled with data, or the window empties due to exhausted <data>, the
        average is computed with just the available data (but still divided
        by <size>).
        Set rampUp/rampDown to False in order to not provide any values during
        those start and end <size-1> iterations.
        Set rampUp/rampDown to functions to provide arbitrary partial average
        numbers during those phases. The callback will get the currently
        available input data in a deque. Do not modify that data.
        """
        d = deque()
        running_sum = 0.0
    
        data = iter(data)
        # rampUp
        for count in range(1, size):
            try:
                val = next(data)
            except StopIteration:
                break
            running_sum += val
            d.append(val)
            #print("up: running sum:" + str(running_sum) + "  count: " + str(count) + "  deque: " + str(d))
            if rampUp:
                if callable(rampUp):
                    yield rampUp(d)
                else:
                    yield running_sum / size
    
        # steady
        exhausted_early = True
        for val in data:
            exhausted_early = False
            running_sum += val
            #print("st: running sum:" + str(running_sum) + "  deque: " + str(d))
            yield running_sum / size
            d.append(val)
            running_sum -= d.popleft()
    
        # rampDown
        if rampDown:
            if exhausted_early:
                running_sum -= d.popleft()
            for (count) in range(min(len(d), size-1), 0, -1):
                #print("dn: running sum:" + str(running_sum) + "  deque: " + str(d))
                if callable(rampDown):
                    yield rampDown(d)
                else:
                    yield running_sum / size
                running_sum -= d.popleft()
    

It seems to be a bit faster than Martijn's version, which is more elegant, though. Here is the test code:

    print("")
    print("Timeit")
    print("-" * 80)
    
    from itertools import islice
    def window(seq, n=2):
        "Returns a sliding window (of width n) over data from the iterable"
        "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
        it = iter(seq)
        result = tuple(islice(it, n))
        if len(result) == n:
            yield result    
        for elem in it:
            result = result[1:] + (elem,)
            yield result
    
    # Martijn's version:
    def moving_averages_SO(values, size):
        for selection in window(values, size):
            yield sum(selection) / size
    
    
    import timeit
    problems = [int(i) for i in (10, 100, 1000, 10000, 1e5, 1e6, 1e7)]
    for problem_size in problems:
        print("{:12s}".format(str(problem_size)), end="")
    
        so = timeit.repeat("list(moving_averages_SO(range("+str(problem_size)+"), 5))", number=1*max(problems)//problem_size,
                           setup="from __main__ import moving_averages_SO")
        print("{:12.3f} ".format(min(so)), end="")
    
        my = timeit.repeat("list(moving_averages(range("+str(problem_size)+"), 5, False, False))", number=1*max(problems)//problem_size,
                           setup="from __main__ import moving_averages")
        print("{:12.3f} ".format(min(my)), end="")
    
        print("")
    

Output (first column: input size; second: Martijn's version; third: this version):

    Timeit
    --------------------------------------------------------------------------------
    10                 7.242        7.656 
    100                5.816        5.500 
    1000               5.787        5.244 
    10000              5.782        5.180 
    100000             5.746        5.137 
    1000000            5.745        5.198 
    10000000           5.764        5.186 
    

The original problem can now be solved with this function call:

    print(list(moving_averages(range(1,11), 5,
                               rampUp=lambda _: None,
                               rampDown=False)))
    

Output:

    [None, None, None, None, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
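
For readers who only need the behaviour of that last call, the running-sum idea can be reduced to a few lines. The following is a stripped-down sketch, not the function above: it has no rampUp/rampDown options, it always yields None while the window is filling, and the name running_sum_averages is invented here.

```python
from collections import deque

def running_sum_averages(data, size):
    """Yield None until the window is full, then one average per step."""
    window = deque()
    total = 0.0
    for value in data:
        window.append(value)
        total += value
        if len(window) > size:
            # Drop the value that just left the window instead of re-summing.
            total -= window.popleft()
        yield total / size if len(window) == size else None

result = list(running_sum_averages(range(1, 11), 5))
print(result)  # [None, None, None, None, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

One caveat with any running sum: with floating-point input, the repeated add/subtract can accumulate small rounding errors over very long inputs, which re-summing each window does not.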
    

Answer 2: (score: 1)

Use the sum and map functions.

print(sum(map(int, x[num-n:num])))

The map function in Python 3 is basically a lazy version of this:

[int(i) for i in x[num-n:num]]

I'm sure you can guess what the sum function does.
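
Put together with the loop from the question, this could look roughly like the sketch below. It assumes the loop should start at n so the empty slices are skipped, and that the averages should be returned as a list rather than printed; neither is stated in the answer.

```python
def moving_average(x, n):
    """Return the average of every n-item slice of x, converting strings to int."""
    averages = []
    # Starting at n skips the first n-1 positions, where the slice would be empty.
    for num in range(n, len(x) + 1):
        window = x[num - n:num]
        averages.append(sum(map(int, window)) / n)
    return averages

y = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
print(moving_average(y, 5))  # [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```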

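Answer 3: (score: 1)

An approach that avoids recomputing intermediate sums:

values = range(0, 12)
def runs(v):
    # Add v to the global running total and return the new total.
    global runningsum
    runningsum += v
    return runningsum
runningsum = 0
# The leading 0 makes runsumlist[k] the sum of the first k values.
runsumlist = [0] + [runs(v) for v in values]
result = [(runsumlist[k] - runsumlist[k-5])/5 for k in range(5, len(values)+1)]

print(result)

[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]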

Make that runs(int(v)), and then repr((runsumlist[k] - runsumlist[k-5]) / 5), if you want to carry the numbers around as strings.


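Alternative without the global:

values = [float(x) for x in range(0, 12)]
nave = 5
# Start from the average of the first window, then update it incrementally.
movingave = [sum(values[:nave]) / nave]
for i in range(len(values) - nave):
    movingave.append(movingave[-1] + (values[i+nave] - values[i]) / nave)
print(movingave)

Be sure to do floating-point math even if your input values are integers (in Python 2, / on two integers truncates).

[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]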
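
The same prefix-sum idea can also be written with itertools.accumulate from the standard library (Python 3.2+), which removes the need for a hand-written running total. This is a sketch only; the function name moving_average_prefix is made up here.

```python
from itertools import accumulate

def moving_average_prefix(values, n):
    """Moving averages via prefix sums: each window sum is a difference of two totals."""
    values = list(values)  # a length is needed, so generators are materialized
    # prefix[k] is the sum of the first k values; prefix[0] is 0.
    prefix = [0] + list(accumulate(values))
    return [(prefix[k] - prefix[k - n]) / n for k in range(n, len(values) + 1)]

print(moving_average_prefix(range(0, 12), 5))
# [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```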

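Answer 4: (score: 0)

Another solution is to extend the itertools recipe pairwise(). You can extend it to nwise(), which gives you the sliding window (and works if the iterable is a generator):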

import itertools as it

def nwise(iterable, n):
    ts = it.tee(iterable, n)
    for c, t in enumerate(ts):
        next(it.islice(t, c, c), None)
    return zip(*ts)

def moving_averages_nw(iterable, n):
    yield from (sum(x)/n for x in nwise(iterable, n))

>>> list(moving_averages_nw(range(1, 11), 5))
[3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
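
To see what the staggering step does, here is the same nwise() again with comments, run on a generator of squares. The sample input is an arbitrary choice for illustration.

```python
import itertools as it

def nwise(iterable, n):
    ts = it.tee(iterable, n)            # n independent iterators over the same data
    for c, t in enumerate(ts):
        # islice(t, c, c) yields nothing but consumes the first c items of t,
        # so copy number c starts c positions later than copy 0.
        next(it.islice(t, c, c), None)
    return zip(*ts)                     # stops when the most advanced copy runs out

windows = list(nwise((i * i for i in range(1, 6)), 3))
print(windows)  # [(1, 4, 9), (4, 9, 16), (9, 16, 25)]
```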

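While the setup cost is relatively high for short iterables, its impact diminishes the longer the data set gets. This uses sum(), but the code is fairly elegant: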

Timeit              MP           cfi         *****
--------------------------------------------------------------------------------
10                 4.658        4.959        7.351 
100                5.144        4.070        4.234 
1000               5.312        4.020        3.977 
10000              5.317        4.031        3.966 
100000             5.508        4.115        4.087 
1000000            5.526        4.263        4.202 
10000000           5.632        4.326        4.242