Adding columns to a groupby DataFrame

Time: 2017-08-02 02:42:36

Tags: python pandas pandas-groupby

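How can I add a 'Sum' column to a pandas groupby DataFrame? I want a 'Sum' column computed over the inner 'Bearish' and 'Bullish' columns of the groupby DataFrame below.

Then I want to add two more columns:

%Bearish = Bearish / Sum * 100

%Bullish = Bullish / Sum * 100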

group_df = df[['sentiment','message']].groupby([pd.TimeGrouper(freq='H'),'sentiment']).count()
group_df = group_df.unstack()

                    message        
sentiment           Bearish Bullish
created                            
2017-08-01 23:00:00     2.0     2.0
2017-08-02 00:00:00     1.0     3.0
2017-08-02 01:00:00     NaN     4.0

1 answer:

Answer 0 (score: 1)

You can use concat with a new DataFrame:
import pandas as pd

idx = pd.date_range('2017-08-01 23:13:00', periods=12, freq='12min')
df = pd.DataFrame({'message':[1,1,2,2,2,2,2,2,3,3,3,3],
                   'sentiment':['Bearish'] * 5 + ['Bullish'] * 7 }, index=idx)
print (df)
                     message sentiment
2017-08-01 23:13:00        1   Bearish
2017-08-01 23:25:00        1   Bearish
2017-08-01 23:37:00        2   Bearish
2017-08-01 23:49:00        2   Bearish
2017-08-02 00:01:00        2   Bearish
2017-08-02 00:13:00        2   Bullish
2017-08-02 00:25:00        2   Bullish
2017-08-02 00:37:00        2   Bullish
2017-08-02 00:49:00        3   Bullish
2017-08-02 01:01:00        3   Bullish
2017-08-02 01:13:00        3   Bullish
2017-08-02 01:25:00        3   Bullish
# pd.Grouper(freq=...) replaces pd.TimeGrouper, which was removed in pandas 1.0
group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='h'),'sentiment']).count()
# select ['message'] to avoid a MultiIndex in the columns
group_df = group_df['message'].unstack()

# divide each column by its column total, then scale to percent
# add prefix - https://stackoverflow.com/q/45453508/2901002
df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%')
print (df1)
                     %Bearish   %Bullish
2017-08-01 23:00:00      80.0        NaN
2017-08-02 00:00:00      20.0  57.142857
2017-08-02 01:00:00       NaN  42.857143

# store the result under a new name so the original df is not overwritten
result = pd.concat([group_df, df1], axis=1)
print (result)
                     Bearish  Bullish  %Bearish   %Bullish
2017-08-01 23:00:00      4.0      NaN      80.0        NaN
2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
2017-08-02 01:00:00      NaN      3.0       NaN  42.857143
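
Note that the percentages above are relative to each column's total across all hours. The formulas in the question instead divide by a per-hour Sum of Bearish and Bullish. A minimal sketch of that row-wise variant, reusing the sample data from this answer; the names row_sum, pct and out are illustrative only, and pd.Grouper(freq='h') is used because newer pandas versions no longer provide pd.TimeGrouper:

```python
import pandas as pd

# Same sample data as in the answer
idx = pd.date_range('2017-08-01 23:13:00', periods=12, freq='12min')
df = pd.DataFrame({'message': [1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3],
                   'sentiment': ['Bearish'] * 5 + ['Bullish'] * 7}, index=idx)

# Hourly counts per sentiment, one column per sentiment
group_df = (df.groupby([pd.Grouper(freq='h'), 'sentiment'])['message']
              .count()
              .unstack())

# Row-wise total per hour (NaN cells are skipped by sum)
row_sum = group_df.sum(axis=1)

# Divide every row by its own total; axis=0 aligns row_sum with the index
pct = group_df.div(row_sum, axis=0).mul(100).add_prefix('%')

out = pd.concat([group_df, row_sum.rename('Sum'), pct], axis=1)
print(out)
```

With this data the 00:00 hour has Sum = 5, %Bearish = 20 and %Bullish = 80. Hours where one sentiment has no rows keep NaN in that sentiment's columns.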

If you need GroupBy.size:

group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='h'),'sentiment']).size()
group_df = group_df.unstack()

df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%')
print (df1)
                     %Bearish   %Bullish
2017-08-01 23:00:00      80.0        NaN
2017-08-02 00:00:00      20.0  57.142857
2017-08-02 01:00:00       NaN  42.857143

# store the result under a new name so the original df is not overwritten
result = pd.concat([group_df, df1], axis=1)
print (result)
                     Bearish  Bullish  %Bearish   %Bullish
2017-08-01 23:00:00      4.0      NaN      80.0        NaN
2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
2017-08-02 01:00:00      NaN      3.0       NaN  42.857143
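
If the NaN cells are unwanted, one option (an assumption about the desired output, not part of the original answer) is to pass fill_value=0 to unstack, so that hour/sentiment combinations with no rows show up as 0. A sketch on the same sample data, with counts, pct and filled as illustrative names:

```python
import pandas as pd

# Same sample data as in the answer
idx = pd.date_range('2017-08-01 23:13:00', periods=12, freq='12min')
df = pd.DataFrame({'message': [1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3],
                   'sentiment': ['Bearish'] * 5 + ['Bullish'] * 7}, index=idx)

# size() counts rows per (hour, sentiment); missing combinations become 0
counts = (df.groupby([pd.Grouper(freq='h'), 'sentiment'])
            .size()
            .unstack(fill_value=0))

# Column-wise percentages, as in the answer
pct = counts.div(counts.sum()).mul(100).add_prefix('%')

filled = pd.concat([counts, pct], axis=1)
print(filled)
```

Whether 0 or NaN is the better representation depends on whether "no messages in that hour" should count as a real zero in later calculations.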

See also: What is the difference between size and count in pandas?
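
In short, size returns the number of rows in each group, NaN included, while count returns the number of non-NaN values. A small sketch with a hypothetical toy frame (not the data from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one Bearish row has a missing message
toy = pd.DataFrame({'sentiment': ['Bearish', 'Bearish', 'Bullish'],
                    'message': ['a', np.nan, 'b']})

grouped = toy.groupby('sentiment')['message']

sizes = grouped.size()    # rows per group, NaN included: Bearish 2, Bullish 1
counts = grouped.count()  # non-NaN values per group:     Bearish 1, Bullish 1

print(sizes)
print(counts)
```

On the sample data in the answer the two give the same numbers because the message column has no missing values.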