pandas groupby numbers for a new column

Time: 2018-03-21 12:44:22

Tags: python pandas pandas-groupby

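I have a df with a column "days" containing 1000 rows of records.

If days is less than 7.0 (0-7), the group is "1-6 days"

If days is greater than 7.1 but less than 14.0 (7.1 - 14.0), the group is "7-14 days"

If days is greater than or equal to 15, the group is "> 14 days"

How can I create a new column "Days_Group" that represents the day grouping?

Example days values: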
1 3.0
2 4.6
3 14.9
4 7.1
5 15.1
6 109
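To try the answers below, the sample can be rebuilt as a DataFrame first. This is a minimal sketch, assuming the column is named `days` (lowercase, as in the first and last answers) and that the index starts at 1 as in the listing above:

```python
import pandas as pd

# Rebuild the question's sample: a "days" column with index 1..6
# (an assumption based on the listing, not the asker's real data)
df = pd.DataFrame({'days': [3.0, 4.6, 14.9, 7.1, 15.1, 109]},
                  index=range(1, 7))
print(df)
```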

3 answers:

Answer 0 (score: 2)

Use np.searchsorted:

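import numpy as np
import pandas as pd

labels = np.array(['1-6 days', '7-14 days', '>14 days'])
bins = np.array([7, 14])

# position of the first bin edge >= each value selects the label
df.assign(Day_Group=labels[bins.searchsorted(df.days)])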

    days  Day_Group
1    3.0   1-6 days
2    4.6   1-6 days
3   14.9   >14 days
4    7.1  7-14 days
5   15.1   >14 days
6  109.0   >14 days
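The edge handling of this approach comes from `searchsorted`'s default `side='left'`: a value exactly equal to a bin edge gets the lower index, so 7 lands in "1-6 days" and 14 in "7-14 days". A small sketch with hypothetical boundary values (not from the question) shows this, and how `side='right'` would move the edges into the upper group instead:

```python
import numpy as np

labels = np.array(['1-6 days', '7-14 days', '>14 days'])
bins = np.array([7, 14])

# Hypothetical values chosen to sit on and around the edges
values = np.array([0, 6.9, 7, 7.1, 14, 14.1])

# side='left' (default): a value equal to an edge stays in the lower group
idx = bins.searchsorted(values)
print(labels[idx])

# side='right': a value equal to an edge moves to the upper group
idx_right = bins.searchsorted(values, side='right')
print(labels[idx_right])
```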

Answer 1 (score: 1)

Use pd.cut:

import numpy as np
import pandas as pd

df.assign(Day_Group=pd.cut(df['days'],
                           [0, 7, 14, np.inf],
                           labels=['1-6 days', '7-14 days', '> 14 days']))

Output:

    days  Day_Group
1    3.0   1-6 days
2    4.6   1-6 days
3   14.9  > 14 days
4    7.1  7-14 days
5   15.1  > 14 days
6  109.0  > 14 days
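With the default `right=True`, these bins are the intervals (0, 7], (7, 14] and (14, inf]. A sketch with hypothetical boundary values illustrates two consequences: a value on an edge goes to the lower group, and 0 itself falls outside (0, 7] and becomes NaN unless `include_lowest=True` is passed:

```python
import numpy as np
import pandas as pd

# Hypothetical values on and around the bin edges
s = pd.Series([0, 7, 7.1, 14, 14.1])

# Right-closed bins: (0, 7], (7, 14], (14, inf]
grouped = pd.cut(s, [0, 7, 14, np.inf],
                 labels=['1-6 days', '7-14 days', '> 14 days'])
print(grouped)
```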

Answer 2 (score: 1)

I think you need cut:

import numpy as np
import pandas as pd

df['Days_Group'] = pd.cut(df['days'],
                          bins=[0, 7, 14, np.inf],
                          labels=['1-6 days', '7-14 days', '> 14 days'],
                          include_lowest=True)
print(df)
    days Days_Group
1    3.0   1-6 days
2    4.6   1-6 days
3   14.9  > 14 days
4    7.1  7-14 days
5   15.1  > 14 days
6  109.0  > 14 days

Or, without importing NumPy, float('inf') works as the open upper edge:

df['Days_Group'] = pd.cut(df['days'],
                          bins=[0, 7, 14, float('inf')],
                          labels=['1-6 days', '7-14 days', '> 14 days'],
                          include_lowest=True)
print(df)

The output is the same as above.
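Answers 0 and 2 should put every non-negative value in the same group. A quick consistency check, sketched on a hypothetical frame that adds the edge values 0, 7 and 14 to the question's sample and uses one shared label list for both methods:

```python
import numpy as np
import pandas as pd

# Question's sample plus hypothetical edge values 0, 7 and 14
df = pd.DataFrame({'days': [0, 3.0, 4.6, 7, 7.1, 14, 14.9, 15.1, 109]})
labels = ['1-6 days', '7-14 days', '> 14 days']

# Answer 0's approach: index of the first edge >= each value
by_search = np.array(labels)[np.array([7, 14]).searchsorted(df['days'].to_numpy())]

# Answer 2's approach: right-closed bins, include_lowest=True keeps 0
by_cut = pd.cut(df['days'], bins=[0, 7, 14, np.inf],
                labels=labels, include_lowest=True)

# True when both methods assign identical groups
print((by_cut.astype(str).to_numpy() == by_search).all())
```

The two only diverge for negative values: `searchsorted` still returns index 0 ("1-6 days"), while `pd.cut` returns NaN because the value lies outside every bin.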

Edit: if days contains timedeltas, convert them to a float number of days first:

print(df)
               days
1   3 days 00:00:00
2   4 days 14:24:00
3  14 days 21:36:00
4   7 days 02:24:00
5  15 days 02:24:00
6 109 days 00:00:00

# timedelta -> float days (24 * 3600 seconds per day)
df['days'] = df['days'].dt.total_seconds() / 24 / 3600
print(df)
    days
1    3.0
2    4.6
3   14.9
4    7.1
5   15.1
6  109.0
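An equivalent conversion, sketched here as an alternative and not taken from the answer, divides the timedelta column by a one-day `pd.Timedelta`, which yields float days directly. The values below are the ones printed in the edit; the comparison confirms both routes give the same numbers:

```python
import pandas as pd

# Timedelta values from the edit above, index 1..6 as in the printout
td = pd.Series(pd.to_timedelta(['3 days', '4 days 14:24:00', '14 days 21:36:00',
                                '7 days 02:24:00', '15 days 02:24:00', '109 days']),
               index=range(1, 7))

# Dividing by one day gives a float number of days
as_days = td / pd.Timedelta(days=1)

# The total_seconds() route used in the answer, for comparison
via_seconds = td.dt.total_seconds() / 24 / 3600

print(as_days)
print(((as_days - via_seconds).abs() < 1e-9).all())
```

Either result can then be passed to `pd.cut` or `searchsorted` as in the answers.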