Multiplying DataFrames of different lengths

Asked: 2016-04-22 05:56:03

Tags: python numpy pandas dataframe arithmetic-expressions

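I have two DataFrames: both have 5 columns, but the first has 100 rows and the second has only one row. I need to multiply each row of the first DataFrame by the single row of the second, then sum the resulting values across the columns of each row and store that sum in a new sixth column, "sum of multiplications". I have looked at the "np.dot" operation, but I am not sure whether it can be applied to DataFrames. I am also looking for a Pythonic / pandas operation or method that could replace somewhat heavy NumPy code written from scratch. Thanks in advance for your suggestions.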

2 answers:

Answer 0 (score: 1)

I think you can convert the DataFrames to numpy arrays with values, multiply them, and finally sum:

import pandas as pd
import numpy as np

np.random.seed(1)
df1 = pd.DataFrame(np.random.randint(10, size=(1,5)))
df1.columns = list('ABCDE')
print(df1)
   A  B  C  D  E
0  5  8  9  5  0

np.random.seed(0)
df2 = pd.DataFrame(np.random.randint(10,size=(10,5)))
df2.columns = list('ABCDE')
print(df2)
   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1
3  6  7  7  8  1
4  5  9  8  9  4
5  3  0  3  5  0
6  2  3  8  1  3
7  3  3  7  0  1
8  9  9  0  4  7
9  3  2  7  2  0
print(df2.values * df1.values)
[[25  0 27 15  0]
 [45 24 45 10  0]
 [35 48 72 40  0]
 [30 56 63 40  0]
 [25 72 72 45  0]
 [15  0 27 25  0]
 [10 24 72  5  0]
 [15 24 63  0  0]
 [45 72  0 20  0]
 [15 16 63 10  0]]

df = pd.DataFrame(df2.values * df1.values)
df['sum'] = df.sum(axis=1)
print(df)
    0   1   2   3  4  sum
0  25   0  27  15  0   67
1  45  24  45  10  0  124
2  35  48  72  40  0  195
3  30  56  63  40  0  189
4  25  72  72  45  0  214
5  15   0  27  25  0   67
6  10  24  72   5  0  111
7  15  24  63   0  0  102
8  45  72   0  20  0  137
9  15  16  63  10  0  104
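The per-row sum of products is a matrix-vector product, so the "np.dot" idea from the question does apply to DataFrames. The following is a minimal sketch, not part of the original answer: it hard-codes the single row of df1 and the first three rows of df2 shown above instead of regenerating them from the seeds, and the names row, sums, sums_np and result are illustrative.

```python
import numpy as np
import pandas as pd

# Values copied from the printed frames above (first three rows of df2 only).
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list('ABCDE'))
df2 = pd.DataFrame([[5, 0, 3, 3, 7],
                    [9, 3, 5, 2, 4],
                    [7, 6, 8, 8, 1]], columns=list('ABCDE'))

# The single row of df1 as a Series indexed by column label.
row = df1.iloc[0]

# DataFrame.dot matches df2's columns against the Series index and
# returns one sum of products per row of df2.
sums = df2.dot(row)

# The same computation on the raw arrays with np.dot.
sums_np = np.dot(df2.values, df1.values.ravel())

# Keep the original columns and append the sums as a sixth column.
result = df2.copy()
result['sum'] = sums
print(result)
```

Unlike the element-wise product followed by sum, this skips the intermediate table of products, so it only fits when the individual products are not needed.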

Timings

In [1185]: %timeit df2.mul(df1.ix[0], axis=1)
The slowest run took 5.07 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 287 µs per loop

In [1186]: %timeit pd.DataFrame(df2.values * df1.values)
The slowest run took 6.31 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 98 µs per loop
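The two timed expressions are not interchangeable in every case: mul aligns on column labels, while multiplying the values arrays pairs columns purely by position. The sketch below illustrates the difference under the assumption that the one-row frame arrives with its columns in a different order. The names df1_shuffled, aligned and positional are made up for the example, and .iloc[0] is used because .ix was removed in pandas 1.0.

```python
import pandas as pd

# Values copied from the frames printed in the answer (two rows of df2 only).
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list('ABCDE'))
df2 = pd.DataFrame([[5, 0, 3, 3, 7],
                    [9, 3, 5, 2, 4]], columns=list('ABCDE'))

# Same data as df1, but with the columns in reverse order.
df1_shuffled = df1[list('EDCBA')]

# Label-aligned: column A is multiplied by A, B by B, and so on.
aligned = df2.mul(df1_shuffled.iloc[0], axis=1).sum(axis=1)

# Positional: the first column of df2 is multiplied by the first column
# of df1_shuffled (which is E), so the result differs.
positional = (df2.values * df1_shuffled.values).sum(axis=1)

print(aligned.tolist())
print(positional.tolist())
```

When both frames are known to share the same column order, the faster values route gives the same numbers; otherwise the label-aligned form is the safer choice.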

Answer 1 (score: 0)

You may be looking for something like this:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({ 'A' : [1.1,2.7, 3.4], 
                     'B' : [-1.,-2.5, -3.9]})

df1['sum of multipliations']=df1.sum(axis = 1)


df2 = pd.DataFrame({ 'A' : [2.], 
                     'B' : [3.], 
                     'sum of multipliations' : [1.]})

print(df1)
print(df2)

row = df2.iloc[0]  # .ix was removed in pandas 1.0; .iloc selects by position
df5 = df1.mul(row, axis=1)
df5.loc['Total'] = df5.sum()
print(df5)
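Note that the code above fills the extra column with the row sums of the unmultiplied values and then scales it by 1. A possible variant, shown here only as a sketch on the same A and B data, takes the row sum after the multiplication so that the new column holds the sum of products the question asks for. The column name 'sum of products' and the variable products are illustrative, not from the answer.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1.1, 2.7, 3.4],
                    'B': [-1.0, -2.5, -3.9]})
df2 = pd.DataFrame({'A': [2.0],
                    'B': [3.0]})

# Multiply every row of df1 by the single row of df2, aligned on column labels.
products = df1.mul(df2.iloc[0], axis=1)

# Sum the products across the columns of each row (hypothetical column name).
products['sum of products'] = products.sum(axis=1)

# Append a row of column totals, as the answer does.
products.loc['Total'] = products.sum()
print(products)
```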