My DataFrame looks like this:
df
Out[327]:
         date  store property_name  property_value
0  2013-06-20      1         price             101
1  2013-06-20      2         price             201
2  2013-06-21      1         price             301
3  2013-06-21      2         price             401
4  2013-06-20      1      quantity            1000
5  2013-06-20      2      quantity            2000
6  2013-06-21      1      quantity            3000
7  2013-06-21      2      quantity            4000
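I want to calculate the revenue for each store on each date and append it to the bottom of the DataFrame. For example, for 2013-06-20 and store 2: revenue = 201 * 2000 = 402000.
Below is my code, but I know it is inefficient for larger DataFrames: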
import pandas as pd
dates = df['date'].unique()
stores = df['store'].unique()
df_len = len(df)
for date in dates:
    for store in stores:
        # Select the single price row and quantity row for this (date, store) pair.
        mask_price = (df['date']==date) & (df['store']==store) & (df['property_name']=='price')
        mask_quantity = (df['date']==date) & (df['store']==store) & (df['property_name']=='quantity')
        price = df.loc[mask_price,'property_value'].iloc[0]
        quantity = df.loc[mask_quantity,'property_value'].iloc[0]
        # Append a new revenue row at the next free integer label.
        df.loc[df_len,'date'] = date
        df.loc[df_len,'store'] = store
        df.loc[df_len,'property_name'] = 'revenue'
        df.loc[df_len,'property_value'] = price*quantity
        df_len = df_len + 1
Thanks in advance for your help :)
Answer 0: (score: 1)
Here is one way.
price = df[df['property_name'] == 'price'].set_index(['date', 'store'])['property_value']
quantity = df[df['property_name'] == 'quantity'].set_index(['date', 'store'])['property_value']
rev = (price * quantity).reset_index().assign(property_name='revenue')
df = pd.concat([df, rev], ignore_index=True)
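Explanation
- Filter the rows for price and quantity, then index each by date and store to get two Series with matching indexes.
- Compute rev as price * quantity; add the property_name column.
- Concatenate along axis=0 (the index).
Result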
          date property_name  property_value  store
0   2013-06-20         price             101      1
1   2013-06-20         price             201      2
2   2013-06-21         price             301      1
3   2013-06-21         price             401      2
4   2013-06-20      quantity            1000      1
5   2013-06-20      quantity            2000      2
6   2013-06-21      quantity            3000      1
7   2013-06-21      quantity            4000      2
8   2013-06-20       revenue          101000      1
9   2013-06-20       revenue          402000      2
10  2013-06-21       revenue          903000      1
11  2013-06-21       revenue         1604000      2
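For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same index-alignment idea. It rebuilds the sample data from the question by hand; the names keys and out are only illustrative, and the dates are kept as plain strings, which is an assumption about the original frame.

```python
import pandas as pd

# Sample data copied from the question (dates kept as strings: an assumption).
df = pd.DataFrame({
    "date": ["2013-06-20", "2013-06-20", "2013-06-21", "2013-06-21"] * 2,
    "store": [1, 2, 1, 2] * 2,
    "property_name": ["price"] * 4 + ["quantity"] * 4,
    "property_value": [101, 201, 301, 401, 1000, 2000, 3000, 4000],
})

# Index each subset by (date, store) so the multiplication aligns on those keys.
keys = ["date", "store"]
price = df[df["property_name"] == "price"].set_index(keys)["property_value"]
quantity = df[df["property_name"] == "quantity"].set_index(keys)["property_value"]

# Multiply, turn the keys back into columns, and label the new rows.
rev = (price * quantity).reset_index().assign(property_name="revenue")

# Stack the revenue rows below the original rows.
out = pd.concat([df, rev], ignore_index=True)
print(out[out["property_name"] == "revenue"])
```

Because the two Series share the same (date, store) index, pandas pairs each price with the right quantity without any explicit loop. On recent pandas versions the column order of the concatenated frame follows the first frame rather than being sorted alphabetically, so it may differ from the output shown above.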
Answer 1: (score: 0)
Another approach:
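# Split the long-format frame into price rows and quantity rows.
prices = df[df['property_name'] == 'price']
quantities = df[df['property_name'] == 'quantity']
# Pair each price with its quantity; overlapping columns get _x / _y suffixes.
res = prices.merge(quantities, on=['date','store'], how='left')
res['property_value'] = res['property_value_x'] * res['property_value_y']
res['property_name'] = 'revenue'
res = res[['date','store','property_name','property_value']]
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
res = pd.concat([prices, quantities, res], ignore_index=True)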
The logic is the same as in the first answer.
Hope that helps.
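A variation on the merge approach, offered only as a sketch: naming the suffixes explicitly makes the intermediate columns easier to read, and validate="one_to_one" makes pandas raise an error if a (date, store) pair appears more than once on either side. The sample data is rebuilt from the question, the variable names are illustrative, and how="inner" is used here instead of the answer's how="left", so pairs without a quantity are dropped rather than producing NaN revenue.

```python
import pandas as pd

# Sample data copied from the question (dates kept as strings: an assumption).
df = pd.DataFrame({
    "date": ["2013-06-20", "2013-06-20", "2013-06-21", "2013-06-21"] * 2,
    "store": [1, 2, 1, 2] * 2,
    "property_name": ["price"] * 4 + ["quantity"] * 4,
    "property_value": [101, 201, 301, 401, 1000, 2000, 3000, 4000],
})

keys = ["date", "store"]
prices = df[df["property_name"] == "price"]
quantities = df[df["property_name"] == "quantity"]

# Join price and quantity rows on (date, store); fail loudly on duplicate keys.
merged = prices.merge(
    quantities,
    on=keys,
    how="inner",
    suffixes=("_price", "_quantity"),
    validate="one_to_one",
)

# Build the revenue rows in the same long format as the input.
revenue = merged[keys].assign(
    property_name="revenue",
    property_value=merged["property_value_price"] * merged["property_value_quantity"],
)

# Append the revenue rows below the original rows.
result = pd.concat([df, revenue], ignore_index=True)
print(result.tail(4))
```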