Assigning scores to a dataframe based on id

Date: 2018-04-25 21:20:42

Tags: python pandas

I have a dataframe indexed by date, and I am trying to assign each accountid a score per category: if a row with that category value exists for the indexed date, the matching category column should get a 1. The desired result looks like this.

           accountid category  Smooth  Hard  Sharp  Narrow
timestamp
2018-03-29       101   Smooth       1   NaN    NaN     NaN
2018-03-29       102     Hard     NaN     1    NaN     NaN
2018-03-30       103   Narrow     NaN   NaN    NaN       1
2018-04-30       104    Sharp     NaN   NaN      1     NaN
2018-04-21       105   Narrow     NaN   NaN    NaN       1

What is the best way to loop over the dataframe for each accountid and assign a score to each of the unstacked category columns?

这是数据框创建脚本。

import numpy as np
import pandas as pd

idx = pd.date_range('02-28-2018', '04-29-2018')

df = pd.DataFrame(
    [['101', '2018-03-29', 'Smooth', np.nan, np.nan, np.nan, np.nan],
     ['102', '2018-03-29', 'Hard',   np.nan, np.nan, np.nan, np.nan],
     ['103', '2018-03-30', 'Narrow', np.nan, np.nan, np.nan, np.nan],
     ['104', '2018-04-30', 'Sharp',  np.nan, np.nan, np.nan, np.nan],
     ['105', '2018-04-21', 'Narrow', np.nan, np.nan, np.nan, np.nan]],
    columns=['accountid', 'timestamp', 'category',
             'Smooth', 'Hard', 'Sharp', 'Narrow'])

df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')
print(df)

1 answer:

Answer 0 (score: 0)

You can use the str accessor together with get_dummies:

df[['accountid','category']].assign(**df['category'].str.get_dummies())

Output:

           accountid category  Hard  Narrow  Sharp  Smooth
timestamp                                                 
2018-03-29       101   Smooth     0       0      0       1
2018-03-29       102     Hard     1       0      0       0
2018-03-30       103   Narrow     0       1      0       0
2018-04-30       104    Sharp     0       0      1       0
2018-04-21       105   Narrow     0       1      0       0

To replace the 0s with NaN (note that numpy must be imported for np.nan):

import numpy as np

df[['accountid','category']].assign(**df['category'].str.get_dummies())\
                            .replace(0, np.nan)

Output:

           accountid category  Hard  Narrow  Sharp  Smooth
timestamp                                                 
2018-03-29       101   Smooth   NaN     NaN    NaN     1.0
2018-03-29       102     Hard   1.0     NaN    NaN     NaN
2018-03-30       103   Narrow   NaN     1.0    NaN     NaN
2018-04-30       104    Sharp   NaN     NaN    1.0     NaN
2018-04-21       105   Narrow   NaN     1.0    NaN     NaN
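Not part of the original answer, but a minimal alternative sketch using the top-level pd.get_dummies instead of the str accessor (assuming pandas and numpy are available). It one-hot encodes the category column and writes the result back positionally, which sidesteps index alignment on the duplicated dates:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (same data as in the post).
df = pd.DataFrame(
    {'accountid': ['101', '102', '103', '104', '105'],
     'timestamp': pd.to_datetime(['2018-03-29', '2018-03-29', '2018-03-30',
                                  '2018-04-30', '2018-04-21']),
     'category': ['Smooth', 'Hard', 'Narrow', 'Sharp', 'Narrow']}
).set_index('timestamp')

# One-hot encode the category column, cast to float, and blank out the zeros.
dummies = pd.get_dummies(df['category']).astype(float).replace(0.0, np.nan)

scores = df[['accountid', 'category']].copy()
# Assign by position (NumPy array) rather than by index, since the
# timestamp index contains duplicate dates.
scores[list(dummies.columns)] = dummies.to_numpy()
print(scores)
```

Either form gives one score column per category; pd.get_dummies sorts the new columns alphabetically (Hard, Narrow, Sharp, Smooth), just like the accepted answer's output.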