Question

I have two dataframes like this:

df1

posting_period      name        sales       profit
    1               client1     50.00       10.00
    1               client2     100.00      20.00
    2               client1     150.00      30.00

df2 (this df does not have the 'profit' column as in df1) 

posting_period      name        sales       
    1               client1     10.00       
    2               client1     20.00

I want to update client1's sales in df1 to the sum of client1's sales in df2 and client1's sales in df1, for the rows where the posting_periods match. In other words:

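desired result

posting_period      name        sales       profit
    1               client1     60.00       10.00
    1               client2     100.00      20.00
    2               client1     170.00      30.00

The actual dataframes I'm working with are much larger, but these examples capture what I'm trying to accomplish. I came up with a very convoluted approach that not only didn't work, but also isn't very pythonic. Another challenge is that df2 does not have the additional columns that df1 has. I hope someone can suggest an alternative. Thanks!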
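To make the answers below easy to try, here is a small sketch that rebuilds the example frames and the desired result from the tables above. The values are copied from the question; storing sales and profit as floats is an assumption.

```python
import pandas as pd

# df1: every client/period row, with both sales and profit
df1 = pd.DataFrame({
    'posting_period': [1, 1, 2],
    'name': ['client1', 'client2', 'client1'],
    'sales': [50.0, 100.0, 150.0],
    'profit': [10.0, 20.0, 30.0],
})

# df2: extra sales only, no 'profit' column
df2 = pd.DataFrame({
    'posting_period': [1, 2],
    'name': ['client1', 'client1'],
    'sales': [10.0, 20.0],
})

# desired: df1 with df2's sales added wherever (posting_period, name) matches
desired = pd.DataFrame({
    'posting_period': [1, 1, 2],
    'name': ['client1', 'client2', 'client1'],
    'sales': [60.0, 100.0, 170.0],
    'profit': [10.0, 20.0, 30.0],
})
print(desired)
```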

Answer 1

First, create a Series from df2 that maps the index columns to sales:

import pandas as pd
# Series indexed by (posting_period, name) that holds df2's sales
idx_cols = ['posting_period', 'name']
s = df2.set_index(idx_cols)['sales']

Then use this Series to update df1['sales']:

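# map each (posting_period, name) pair to df2's sales (None if absent); index=df1.index keeps the rows aligned
df1['sales'] += pd.Series(df1.set_index(idx_cols).index.map(s.get), index=df1.index).fillna(0)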

Result:

print(df1)

   posting_period     name  sales  profit
0               1  client1   60.0    10.0
1               1  client2  100.0    20.0
2               2  client1  170.0    30.0
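A variant of the same lookup, sketched here as a swapped-in technique rather than part of the answer: instead of calling s.get once per row, reindex s with a MultiIndex built from df1's key columns. This assumes each (posting_period, name) pair appears at most once in df2; reindex raises on duplicate labels.

```python
import pandas as pd

# Example frames from the question
df1 = pd.DataFrame({
    'posting_period': [1, 1, 2],
    'name': ['client1', 'client2', 'client1'],
    'sales': [50.0, 100.0, 150.0],
    'profit': [10.0, 20.0, 30.0],
})
df2 = pd.DataFrame({
    'posting_period': [1, 2],
    'name': ['client1', 'client1'],
    'sales': [10.0, 20.0],
})

idx_cols = ['posting_period', 'name']
s = df2.set_index(idx_cols)['sales']

# Look up df2's sales for every (posting_period, name) pair in df1's row order;
# pairs missing from df2 come back as NaN and are treated as 0
extra = s.reindex(pd.MultiIndex.from_frame(df1[idx_cols])).fillna(0)

# .to_numpy() drops the MultiIndex, so the addition is positional
# and does not depend on df1 having a default RangeIndex
df1['sales'] = df1['sales'] + extra.to_numpy()
print(df1)
```

Because the lookup is vectorized, this should scale better than a per-row Python call on the larger frames mentioned in the question, and the sales column stays float64.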

Answer 2

Use merge with a left join to get an aligned Series, then add it:

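# Left merge keeps df1's rows in order; df2's sales arrive as 'sales_y' (NaN where there is no match).
# The merge result is renumbered 0..n-1, so this assumes df1 also has a default RangeIndex.
s = df1.merge(df2, on=['posting_period','name'], how='left')['sales_y']

# fill_value=0 makes unmatched rows add 0 instead of producing NaN
df1['sales'] = df1['sales'].add(s, fill_value=0)
print(df1)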
   posting_period     name  sales  profit
0               1  client1   60.0    10.0
1               1  client2  100.0    20.0
2               2  client1  170.0    30.0
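If df2 could contain several rows for the same client and period (not the case in the example), a left merge would duplicate df1's rows. A hedged sketch of a safer version of the merge approach, under that assumption: aggregate df2 first, name the incoming column explicitly with `suffixes`, and let `validate='many_to_one'` raise if the keys on the right are not unique. The `_df2` suffix is an arbitrary choice.

```python
import pandas as pd

# Example frames from the question
df1 = pd.DataFrame({
    'posting_period': [1, 1, 2],
    'name': ['client1', 'client2', 'client1'],
    'sales': [50.0, 100.0, 150.0],
    'profit': [10.0, 20.0, 30.0],
})
df2 = pd.DataFrame({
    'posting_period': [1, 2],
    'name': ['client1', 'client1'],
    'sales': [10.0, 20.0],
})

keys = ['posting_period', 'name']

# Collapse any repeated keys in df2 so each pair contributes a single total
df2_agg = df2.groupby(keys, as_index=False)['sales'].sum()

merged = df1.merge(
    df2_agg,
    on=keys,
    how='left',
    suffixes=('', '_df2'),    # df1's columns keep their names; df2's sales become 'sales_df2'
    validate='many_to_one',   # raise if a key pair occurs more than once on the right
)

# The left merge preserves df1's row order; unmatched rows are NaN, so treat them as 0
df1['sales'] = df1['sales'] + merged['sales_df2'].fillna(0).to_numpy()
print(df1)
```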

Answer 3

You can use pd.concat together with sum:

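# sum(level=...) and a positional axis in pd.concat were removed in pandas 2.0; transpose and group the duplicate 'sales' columns by label instead
pd.concat([df1.set_index(['posting_period', 'name']), df2.set_index(['posting_period', 'name'])], axis=1).T.groupby(level=0, sort=False).sum().T.reset_index()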
Out[728]: 
   posting_period     name  sales  profit
0               1  client1   60.0    10.0
1               1  client2  100.0    20.0
2               2  client1  170.0    30.0
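Along the same set_index lines, but as a different technique from the concat-and-sum answer, `DataFrame.add` with `fill_value=0` aligns both frames on the key columns and treats a cell missing on one side as 0. That covers both the absent 'profit' column in df2 and client2's missing row. A minimal sketch, assuming only df1's rows should be kept (rows found only in df2 would otherwise be appended):

```python
import pandas as pd

# Example frames from the question
df1 = pd.DataFrame({
    'posting_period': [1, 1, 2],
    'name': ['client1', 'client2', 'client1'],
    'sales': [50.0, 100.0, 150.0],
    'profit': [10.0, 20.0, 30.0],
})
df2 = pd.DataFrame({
    'posting_period': [1, 2],
    'name': ['client1', 'client1'],
    'sales': [10.0, 20.0],
})

keys = ['posting_period', 'name']
left = df1.set_index(keys)
right = df2.set_index(keys)

# Outer-align on rows and columns; where only one side has a value, the other is taken as 0,
# so 'profit' (absent from df2) is unchanged and unmatched sales are kept as-is
summed = left.add(right, fill_value=0)

# Restrict to df1's rows and restore its column order before flattening the keys again
result = summed.reindex(index=left.index, columns=left.columns).reset_index()
print(result)
```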

Pandas DataFrame: update values to the sum of the same DataFrame and another DataFrame
