How to add a column to a pandas DataFrame based on the time in another column

Time: 2017-02-02 03:12:20

Tags: python pandas

I am trying to add a column to a pandas DataFrame based on time periods I choose, inserting Morning, Afternoon, or Evening.

The code I am trying is below:

df_agg['timeOfDay'] = df_agg.apply(lambda _: '', axis=1)
for i in range (len(df_agg)):
        if df_agg['time_stamp'].iloc[i][0].hour < 12:
            df_agg['timeOfDay'].iloc[i] = 'Morning'
        elif df_agg['time_stamp'].iloc[i][0].hour < 17 & df_agg['time_stamp'].iloc[i][0].hour > 12:
            df_agg['timeOfDay'].iloc[i] = 'Afternoon'
        else:
             df_agg['timeOfDay'].iloc[i] = 'Evening'

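When I return df_agg, it comes back with an empty timeOfDay column. Does anyone know what I am doing wrong when trying to insert these values into the rows based on the time of day?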

2 Answers:

Answer 0 (score: 2)

pandas
Use pd.cut to split the hours into bins and label each bin. This approach makes it trivial to create more fine-grained time slots.

df_agg.assign(
    timeOfDay=pd.cut(
        df_agg.time_stamp.dt.hour,
        [-1, 12, 17, 24],
        labels=['Morning', 'Afternoon', 'Evening']))
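To illustrate the point about finer-grained slots, here is a minimal sketch. The bin edges, the six label names, and the 24-row sample frame are all hypothetical, chosen only for illustration; they are not from the original answer.

```python
import pandas as pd

# Hypothetical sample data: one timestamp per hour for a full day.
rng = pd.date_range('2015-02-24 00:00:00', periods=24, freq='1h')
df_agg = pd.DataFrame({'time_stamp': rng})

# Hypothetical edges and labels. With the default right=True each bin is
# (left, right], so hour 5 is 'Night' and hour 6 is 'Morning'.
edges = [-1, 5, 11, 13, 17, 21, 24]
labels = ['Night', 'Morning', 'Midday', 'Afternoon', 'Evening', 'Late night']

df_fine = df_agg.assign(
    timeOfDay=pd.cut(df_agg.time_stamp.dt.hour, edges, labels=labels))
print(df_fine)
```

Adding a slot only means adding one edge and one label; the number of labels must always be one less than the number of edges.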

numpy
Use searchsorted to find which slot each hour falls into.

hours = df_agg.time_stamp.dt.hour.values
times = np.array(['Morning', 'Afternoon', 'Evening'])

df_agg.assign(timeOfDay=times[np.array([12, 17]).searchsorted(hours)])
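How an hour that sits exactly on an edge (12 or 17) is labelled depends on the side argument of searchsorted. The sketch below uses a small hand-picked array of hours (an assumption for illustration) to compare the default side='left' with side='right'.

```python
import numpy as np

# Hand-picked hours, including the two edge values 12 and 17.
hours = np.array([0, 11, 12, 13, 16, 17, 18, 23])
times = np.array(['Morning', 'Afternoon', 'Evening'])
edges = np.array([12, 17])

# side='left' (the default): an hour equal to an edge goes to the lower slot,
# so 12 is 'Morning' and 17 is 'Afternoon'.
left = times[edges.searchsorted(hours, side='left')]

# side='right': an hour equal to an edge goes to the upper slot,
# so 12 is 'Afternoon' and 17 is 'Evening'.
right = times[edges.searchsorted(hours, side='right')]

for h, l, r in zip(hours, left, right):
    print(h, l, r)
```

With the default, the result matches the pd.cut bins above, which are closed on the right.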

Both yield

[image: resulting DataFrame]

Timing
Small dataset

[image: timing results]

Large dataset

start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=10000, freq='1h')

df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(len(rng))})  

[image: timing results]
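For anyone who wants to rerun the comparison on the large dataset, here is a rough sketch using the standard timeit module. The helper names with_cut and with_searchsorted and the repetition count are assumptions for illustration, and the numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=10000, freq='1h')
df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(len(rng))})


def with_cut(df):
    """Label hours with pd.cut (hypothetical helper name)."""
    return df.assign(
        timeOfDay=pd.cut(
            df.time_stamp.dt.hour,
            [-1, 12, 17, 24],
            labels=['Morning', 'Afternoon', 'Evening']))


def with_searchsorted(df):
    """Label hours with numpy searchsorted (hypothetical helper name)."""
    hours = df.time_stamp.dt.hour.values
    times = np.array(['Morning', 'Afternoon', 'Evening'])
    return df.assign(timeOfDay=times[np.array([12, 17]).searchsorted(hours)])


# Total seconds for 20 calls of each approach.
timings = {}
for fn in (with_cut, with_searchsorted):
    timings[fn.__name__] = timeit.timeit(lambda: fn(df_agg), number=20)
    print(fn.__name__, timings[fn.__name__])
```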

Setup
Borrowing @jezrael's setup for df_agg

start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=12, freq='1h')

df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(len(rng))})  
print (df_agg)

Answer 1 (score: 1)

I think you can use a double numpy.where. Please check whether you need to change < to <= and > to >= at the boundaries.

start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=12, freq='1h')

df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(12)})  
print (df_agg)
     a          time_stamp
0    0 2015-02-24 10:00:00
1    1 2015-02-24 11:00:00
2    2 2015-02-24 12:00:00
3    3 2015-02-24 13:00:00
4    4 2015-02-24 14:00:00
5    5 2015-02-24 15:00:00
6    6 2015-02-24 16:00:00
7    7 2015-02-24 17:00:00
8    8 2015-02-24 18:00:00
9    9 2015-02-24 19:00:00
10  10 2015-02-24 20:00:00
11  11 2015-02-24 21:00:00
hours = df_agg.time_stamp.dt.hour.values
df_agg['timeOfDay'] = np.where(hours <= 12, 'Morning', 
                      np.where(hours >= 17, 'Evening', 'Afternoon'))

     a          time_stamp  timeOfDay
0    0 2015-02-24 10:00:00    Morning
1    1 2015-02-24 11:00:00    Morning
2    2 2015-02-24 12:00:00    Morning
3    3 2015-02-24 13:00:00  Afternoon
4    4 2015-02-24 14:00:00  Afternoon
5    5 2015-02-24 15:00:00  Afternoon
6    6 2015-02-24 16:00:00  Afternoon
7    7 2015-02-24 17:00:00    Evening
8    8 2015-02-24 18:00:00    Evening
9    9 2015-02-24 19:00:00    Evening
10  10 2015-02-24 20:00:00    Evening
11  11 2015-02-24 21:00:00    Evening
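
If more slots are needed, nesting numpy.where calls gets hard to read. As a possible alternative (not part of the answer above), numpy.select takes a flat list of conditions and matching choices. This sketch uses the same boundaries and sample data as the answer.

```python
import numpy as np
import pandas as pd

start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=12, freq='1h')
df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(12)})

hours = df_agg.time_stamp.dt.hour.values

# Conditions are checked in order and the first match wins.
# Rows matching no condition get the default value.
conditions = [hours <= 12, hours >= 17]
choices = ['Morning', 'Evening']
df_agg['timeOfDay'] = np.select(conditions, choices, default='Afternoon')
print(df_agg)
```

Changing hours <= 12 to hours < 12 would move the 12:00 row from Morning to Afternoon, which is the boundary check the answer asks you to make.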