Speeding up multi-loop data calculations in Pandas

Time: 2016-05-31 06:45:32

Tags: python loops pandas dataframe

Here is my problem. Take the following DataFrame as an example:


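  • The DataFrame df has 8 columns, each holding a finite set of values.
  • What I want to do:
    • a. Loop through the rows of the DataFrame.
    • b. In each row, replace each of the columns B1 B2 B3 B4 B5 B6 with B * A, the value of column A in that row.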

The code looks like this:

 for i in range(0,len(df),1):
     col_B = ["B1","B2","B3","B4","B5","B6",]
     for j in range(len(col_B)):
         df[col_B[j]].iloc[i] = df[col_B[j]].iloc[i]*df.A.iloc[i]

On my real data, which has 224 rows and 9 columns, looping over all these cells takes 0:01:03 (1 minute 3 seconds).

How can I speed up this loop in pandas?

Any suggestions would be appreciated.

1 Answer:

Answer 0: (score: 2)

You can first filter the DataFrame to select the B columns, then multiply them by column A with mul:

print(df.filter(like='B').mul(df.A, axis=0))

Sample:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A':[1,2,3],
                   'B1':[4,5,6],
                   'B2':[7,8,9],
                   'B3':[1,3,5],
                   'B4':[5,3,6],
                   'B5':[7,4,3],
                   'B6':[1,3,7]})

print (df)
   A  B1  B2  B3  B4  B5  B6
0  1   4   7   1   5   7   1
1  2   5   8   3   3   4   3
2  3   6   9   5   6   3   7

print(df.filter(like='B').mul(df.A, axis=0))
   B1  B2  B3  B4  B5  B6
0   4   7   1   5   7   1
1  10  16   6   6   8   6
2  18  27  15  18   9  21
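
Note that `filter(like='B')` keeps every column whose label contains the substring "B", so any other column with a "B" in its name would be multiplied as well. A minimal sketch, assuming the target columns are always named "B" followed by digits, that uses the `regex` parameter of `DataFrame.filter` instead. The extra `Label_B` column is made up for illustration and does not appear in the question:

```python
import pandas as pd

# Small frame with one extra, hypothetical column whose name also contains "B".
df = pd.DataFrame({'A': [1, 2, 3],
                   'B1': [4, 5, 6],
                   'B2': [7, 8, 9],
                   'Label_B': [10, 20, 30]})

# like='B' matches any label containing "B", including Label_B.
loose = df.filter(like='B').columns.tolist()

# regex=r'^B\d+$' matches only labels made of "B" followed by digits.
strict = df.filter(regex=r'^B\d+$').columns.tolist()

# Multiply only the strictly matched columns by A, row by row.
scaled = df.filter(regex=r'^B\d+$').mul(df['A'], axis=0)

print(loose)
print(strict)
print(scaled)
```

If the column names are known up front, passing an explicit list such as `df[['B1', 'B2']]` is probably the simplest option.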

If you also need column A, use concat:

print (pd.concat([df.A, df.filter(like='B').mul(df.A, axis=0)], axis=1))
   A  B1  B2  B3  B4  B5  B6
0  1   4   7   1   5   7   1
1  2  10  16   6   6   8   6
2  3  18  27  15  18   9  21
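
The question asks for the B columns to be changed in the DataFrame itself rather than returned as a new object. A minimal sketch of writing the result back, assuming the same sample frame as above; `cols` is just an illustrative variable name:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B1': [4, 5, 6],
                   'B2': [7, 8, 9],
                   'B3': [1, 3, 5],
                   'B4': [5, 3, 6],
                   'B5': [7, 4, 3],
                   'B6': [1, 3, 7]})

# Labels of the columns to scale (B1..B6 in this sample).
cols = df.filter(like='B').columns

# Overwrite the B columns with B * A; column A is left unchanged.
df[cols] = df[cols].mul(df['A'], axis=0)

print(df)
```

This keeps the original column order and avoids the extra concat step.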

Timings

len(df)=3

In [416]: %timeit (pd.concat([df.A, df.filter(like='B').mul(df.A, axis=0)], axis=1))
1000 loops, best of 3: 1.01 ms per loop

In [417]: %timeit loop(df)
100 loops, best of 3: 3.28 ms per loop

len(df)=30k

In [420]: %timeit (pd.concat([df.A, df.filter(like='B').mul(df.A, axis=0)], axis=1))
The slowest run took 4.00 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3 ms per loop

In [421]: %timeit loop(df)
1 loop, best of 3: 35.6 s per loop
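
`%timeit` is an IPython magic, so the measurements above cannot be reproduced in a plain Python script as written. A rough sketch of timing the vectorized version with the standard-library `timeit` module instead; the function name `vectorized` and the choice of 10 repetitions are assumptions made here, and the measured time will differ from machine to machine:

```python
import timeit

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B1': [4, 5, 6],
                   'B2': [7, 8, 9],
                   'B3': [1, 3, 5],
                   'B4': [5, 3, 6],
                   'B5': [7, 4, 3],
                   'B6': [1, 3, 7]})

# Repeat the 3-row sample 10,000 times to get 30,000 rows.
big = pd.concat([df] * 10000).reset_index(drop=True)


def vectorized(frame):
    """Return A plus the B columns multiplied by A, without modifying frame."""
    return pd.concat([frame['A'],
                      frame.filter(like='B').mul(frame['A'], axis=0)], axis=1)


# Total wall-clock seconds for 10 calls.
elapsed = timeit.timeit(lambda: vectorized(big), number=10)
print(f"{elapsed / 10 * 1000:.2f} ms per call")
```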

Code for the timings:

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B1':[4,5,6],
                   'B2':[7,8,9],
                   'B3':[1,3,5],
                   'B4':[5,3,6],
                   'B5':[7,4,3],
                   'B6':[1,3,7]})

print (df)

df = pd.concat([df]*10000).reset_index(drop=True)

print (pd.concat([df.A, df.filter(like='B').mul(df.A, axis=0)], axis=1))

def loop(df):
    for i in range(0,len(df),1):
         col_B = ["B1","B2","B3","B4","B5","B6",]
         for j in range(len(col_B)):
             df[col_B[j]].iloc[i] = df[col_B[j]].iloc[i]*df.A.iloc[i]  
    return df

print (loop(df))
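
Two caveats about `loop` as written: it modifies the DataFrame passed to it, so repeated `%timeit` runs may multiply values that were already multiplied, and the chained form `df[col].iloc[i] = ...` may raise a `SettingWithCopyWarning` or, with copy-on-write enabled, may not write to `df` at all. A sketch of an equivalent row-by-row loop under those assumptions, using `DataFrame.at` on a copy, with a check that it agrees with the vectorized result. The name `loop_at` is chosen here and is not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B1': [4, 5, 6],
                   'B2': [7, 8, 9],
                   'B3': [1, 3, 5],
                   'B4': [5, 3, 6],
                   'B5': [7, 4, 3],
                   'B6': [1, 3, 7]})


def loop_at(frame):
    """Row-by-row version of B * A; works on a copy so the input is untouched."""
    out = frame.copy()
    col_B = ["B1", "B2", "B3", "B4", "B5", "B6"]
    for i in out.index:
        for col in col_B:
            # .at reads and writes a single cell by row label and column label.
            out.at[i, col] = out.at[i, col] * out.at[i, "A"]
    return out


vectorized = pd.concat([df['A'],
                        df.filter(like='B').mul(df['A'], axis=0)], axis=1)

# Compare the loop result with the vectorized result.
print(loop_at(df).equals(vectorized))
```

The loop is still slow on large frames; it is only meant as a reference implementation for checking the vectorized result.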