How do I merge two pandas DataFrames so that every date from both is included and missing values are set to zero? I've looked through pd.merge_asof, but I didn't see an example that fits my use case and couldn't get it to work.
# imports
import pandas as pd
import numpy as np
# shared data
column_names = ['date', 'gross_profit', 'costs', 'factory_id']
# df1 construction
range_1 = pd.date_range('2019-01-01', periods=3, freq='2D')
gross_profit_1 = [100, 200, 300]
costs_1 = [-20, -30, -40]
factory_id_1 = ['A', 'A', 'A']
values_1 = np.array([range_1, gross_profit_1, costs_1, factory_id_1]).T
df1 = pd.DataFrame(values_1, index=range_1, columns=column_names)
# df2 construction
range_2 = pd.date_range('2019-01-02', periods=3, freq='2D')
gross_profit_2 = [400, -300, 900]
costs_2 = [-90, -80, -70]
factory_id_2 = ['B', 'B', 'B']
values_2 = np.array([range_2, gross_profit_2, costs_2, factory_id_2]).T
df2 = pd.DataFrame(values_2, index=range_2, columns=column_names)
>>> print(df1)
date gross_profit costs factory_id
2019-01-01 2019-01-01 00:00:00 100 -20 A
2019-01-03 2019-01-03 00:00:00 200 -30 A
2019-01-05 2019-01-05 00:00:00 300 -40 A
>>> print(df2)
date gross_profit costs factory_id
2019-01-02 2019-01-02 00:00:00 400 -90 B
2019-01-04 2019-01-04 00:00:00 -300 -80 B
2019-01-06 2019-01-06 00:00:00 900 -70 B
The merged result I'm after, merged_df, would look like this:
>>> print(merged_df)
date gross_profit_A gross_profit_B
2019-01-01 2019-01-01 00:00:00 100 0
2019-01-02 2019-01-02 00:00:00 0 400
2019-01-03 2019-01-03 00:00:00 200 0
2019-01-04 2019-01-04 00:00:00 0 -300
2019-01-05 2019-01-05 00:00:00 300 0
2019-01-06 2019-01-06 00:00:00 0 900
total_gross_profit = merged_df.gross_profit_A + merged_df.gross_profit_B
cumulative_gross_profit = np.cumsum(total_gross_profit)
>>> print(cumulative_gross_profit)
2019-01-01 100
2019-01-02 500
2019-01-03 700
2019-01-04 400
2019-01-05 700
2019-01-06 1500
Freq: 1D, Name: cumulative_gross_profit, dtype: object
I included costs in each DataFrame because I eventually want to do this for multiple columns, not just one.
Answer 0 (score: 2)
Here is one way, using concat:
pd.concat(
    [df1[['gross_profit']].add_suffix(df1.factory_id.iloc[0]),
     df2[['gross_profit']].add_suffix(df2.factory_id.iloc[0])],
    axis=0, sort=True,
).sort_index().fillna(0)
Out[163]:
gross_profitA gross_profitB
2019-01-01 100 0
2019-01-02 0 400
2019-01-03 200 0
2019-01-04 0 -300
2019-01-05 300 0
2019-01-06 0 900