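Conditionally create a DataFrame from the elements of other DataFrames

Time: 2020-01-02 17:21:58

Tags: python pandas

Happy 2020! I want to create a DataFrame based on two others. I have the following two DataFrames: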

df1 = pd.DataFrame({'date':['03.05.1982','04.05.1982','05.05.1982','06.05.1982','07.05.1982','10.05.1982','11.05.1982'],'A': [63.63,64.08,64.19,65.11,65.36,65.25,65.36], 'B': [63.83, 64.10, 64.19, 65.08, 65.33, 65.28, 65.36], 'C':[63.99, 64.22, 64.30, 65.16, 65.41, 65.36, 65.44]})

df2 = pd.DataFrame({'Name':['A','B','C'],'Notice': ['05.05.1982','07.05.1982','12.05.1982']})

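The idea is to build df3 so that it takes the values of A until A's notice date (found in df2) is reached, then switches to the values of B until B's notice date, and so on. On a notice date itself, it should take the average of the current column and the next one.

In the example above, df3 should look like this (with the formulas written out for illustration):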

df3 = pd.DataFrame({'date':['03.05.1982','04.05.1982','05.05.1982','06.05.1982','07.05.1982','10.05.1982','11.05.1982'], 'Result':[63.63,64.08,(64.19+64.19)/2,65.08,(65.33+65.41)/2,65.36,65.44]})

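My idea was to first create a temporary DataFrame with the same dimensions as df1, filled with 1 where the index date is before the notice date and with 0 after it. Taking a rolling mean with a window of 1 would then give each column a series of 1s until I reach 0.5 (which indicates a switch). Is there a better way to get df3?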
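The mask idea described above can be sketched as follows. This is only a sketch under assumptions that are not in the original post: the notice dates in df2 are sorted in ascending order, every date in df1 falls on or before the last notice date, and the names `alive`, `weights` and `result` are made up for illustration. Instead of a rolling mean, it writes 0.5 directly on each notice date and takes the difference between neighbouring columns.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'date': ['03.05.1982', '04.05.1982', '05.05.1982', '06.05.1982',
             '07.05.1982', '10.05.1982', '11.05.1982'],
    'A': [63.63, 64.08, 64.19, 65.11, 65.36, 65.25, 65.36],
    'B': [63.83, 64.10, 64.19, 65.08, 65.33, 65.28, 65.36],
    'C': [63.99, 64.22, 64.30, 65.16, 65.41, 65.36, 65.44],
})
df2 = pd.DataFrame({'Name': ['A', 'B', 'C'],
                    'Notice': ['05.05.1982', '07.05.1982', '12.05.1982']})

# The dates are day-first, so the format is spelled out explicitly.
dates = pd.to_datetime(df1['date'], format='%d.%m.%Y')
notice = pd.to_datetime(df2.set_index('Name')['Notice'], format='%d.%m.%Y')

# Per column: 1 before its notice date, 0.5 on the notice date, 0 afterwards.
alive = pd.DataFrame(
    {name: np.select([dates < n, dates == n], [1.0, 0.5], default=0.0)
     for name, n in notice.items()},
    index=df1.index,
)

# Weight of a column = its mask minus the mask of the previous column
# (assumption: columns are ordered by ascending notice date).
weights = pd.DataFrame(
    np.diff(alive.to_numpy(), axis=1, prepend=0.0),
    index=alive.index, columns=alive.columns,
)

# Weighted sum across the columns gives the switched series.
result = (df1[list(notice.index)] * weights).sum(axis=1)
df3 = pd.DataFrame({'date': df1['date'], 'Result': result})
```

Each row of `weights` sums to 1 as long as the date is not past the last notice date; rows beyond that would get a weight of 0 everywhere and need separate handling.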

I tried the following:

def fill_rule(df_p,df_t):
     return np.where(df_p.index > df_t[df_t.Name==df_p.name]['Notice'][0], 0, 1)

df1['date'] = pd.to_datetime(df1['date'], format='%d.%m.%Y')
df2['Notice'] = pd.to_datetime(df2['Notice'], format='%d.%m.%Y')
df1.set_index("date", inplace = True)

temp = df1.apply(lambda x: fill_rule(x, df2), axis = 0)

I get the following error: KeyError: (0, 'occurred at index B')
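A likely cause of this KeyError is the `[0]` in fill_rule: after filtering df2 by name, the remaining row keeps its original index label (1 for B, 2 for C), so a lookup of label 0 only works for A. The snippet below is a minimal sketch of that behaviour and of two possible workarounds. The variable names are illustrative and not part of the original post.

```python
import pandas as pd

df2 = pd.DataFrame({'Name': ['A', 'B', 'C'],
                    'Notice': ['05.05.1982', '07.05.1982', '12.05.1982']})
df2['Notice'] = pd.to_datetime(df2['Notice'], format='%d.%m.%Y')

# Filtering keeps the original row labels: the row for 'B' is still labelled 1.
notice_b = df2[df2.Name == 'B']['Notice']

# notice_b[0] would look up the label 0 and raise KeyError;
# .iloc[0] takes the first element by position instead.
first_by_position = notice_b.iloc[0]

# Alternative: a Name -> Notice lookup Series avoids the filtering step.
notice_by_name = df2.set_index('Name')['Notice']
notice_c = notice_by_name['C']
```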

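2 Answers:

Answer 0: (score: 1)

You can use the between method to select the relevant date range for each column, and then use iloc and loc to pick and update the matching values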

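# Initializing the output
# (assumes df1['date'] and df2['Notice'] are already datetimes and 'date' is still a regular column of df1)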
df3 = df1.copy()
df3.drop(['B','C'], axis = 1, inplace = True)
df3.columns = ['date','Result']
df3['Result'] = 0.0
df3['count'] = 0


#Modifying df2 to add a dummy sample at the beginning
temp = df2.copy()
temp = temp.iloc[0]
temp = pd.DataFrame(temp).T
temp.Name ='Z'
temp.Notice = pd.to_datetime("05-05-1980")
df2 = pd.concat([temp,df2])


for i in range(len(df2)-1):
    startDate = df2.iloc[i]['Notice']
    endDate = df2.iloc[i+1]['Notice']

    name = df2.iloc[i+1]['Name']


    # inclusive="both" replaces the boolean inclusive=True, which recent pandas versions no longer accept
    indices = df1.date.between(startDate, endDate, inclusive="both")


    df3.loc[indices,'Result'] += df1[indices][name]
    df3.loc[indices,'count'] += 1


# Rows on a notice date were added twice, so dividing by the count yields the average
df3['Result'] = df3['Result'] / df3['count']

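Answer 1: (score: 1)

# Label each notice date with its column name (requires 'date' and 'Notice' to have the same dtype)
df1['t'] = df1['date'].map(df2.set_index(["Notice"])['Name'])
# Back-fill the name onto the earlier dates; rows after the last matched notice date fall back to "C"
df1['t'] = df1['t'].bfill().fillna("C")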

df3 = pd.DataFrame()
df3['Result'] = df1.apply(lambda row: row[row['t']],axis =1)
df3['date'] = df1['date']
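As written, the map/bfill approach above picks the active column for each date but does not average on the notice dates themselves. One possible way to add that step is sketched below. The helper names (`exact`, `active`, `next_by_name`) are illustrative, and the sketch assumes df2 is sorted by ascending notice date and that the date strings are day-first.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'date': ['03.05.1982', '04.05.1982', '05.05.1982', '06.05.1982',
             '07.05.1982', '10.05.1982', '11.05.1982'],
    'A': [63.63, 64.08, 64.19, 65.11, 65.36, 65.25, 65.36],
    'B': [63.83, 64.10, 64.19, 65.08, 65.33, 65.28, 65.36],
    'C': [63.99, 64.22, 64.30, 65.16, 65.41, 65.36, 65.44],
})
df2 = pd.DataFrame({'Name': ['A', 'B', 'C'],
                    'Notice': ['05.05.1982', '07.05.1982', '12.05.1982']})
df1['date'] = pd.to_datetime(df1['date'], format='%d.%m.%Y')
df2['Notice'] = pd.to_datetime(df2['Notice'], format='%d.%m.%Y')

last_name = df2['Name'].iloc[-1]
# Successor of each column; the last column is its own successor.
next_by_name = dict(zip(df2['Name'], df2['Name'].shift(-1).fillna(last_name)))

# Column name on notice dates, NaN elsewhere.
exact = df1['date'].map(df2.set_index('Notice')['Name'])
# Active column per row: back-fill, then fall back to the last column.
active = exact.bfill().fillna(last_name)

values = df1[list(df2['Name'])]
rows = np.arange(len(df1))
arr = values.to_numpy()
current = arr[rows, values.columns.get_indexer(active)]
following = arr[rows, values.columns.get_indexer(active.map(next_by_name))]

# Average current and next column on notice dates, otherwise keep the current one.
df3 = pd.DataFrame({
    'date': df1['date'],
    'Result': np.where(exact.notna(), (current + following) / 2, current),
})
```

The row-wise apply is replaced by positional NumPy indexing here, which avoids calling a Python function once per row; for a frame this small either form should be fine.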