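I have a DataFrame column "days" with 1,000 rows.
If days is less than 7.0 (0-7), the group is "1-6 days".
If days is between 7.1 and 14.0 (7.1 - 14.0), the group is "7-14 days".
If days is 15 or more, the group is "> 14 days".
How do I create a new column "Days_Group" that holds the group for each row?
Example days values: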
1 3.0
2 4.6
3 14.9
4 7.1
5 15.1
6 109
Answer 0 (score: 2)
Use np.searchsorted:
import numpy as np
labels = np.array(['1-6 days', '7-14 days', '>14 days'])
bins = np.array([7, 14])
# searchsorted gives each value's bin index, which then selects its label
df.assign(Day_Group=labels[bins.searchsorted(df.days)])
days Day_Group
1 3.0 1-6 days
2 4.6 1-6 days
3 14.9 >14 days
4 7.1 7-14 days
5 15.1 >14 days
6 109.0 >14 days
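To make the boundary behaviour of the searchsorted approach easier to see, here is a minimal self-contained sketch. It assumes the column is named "days" and rebuilds the six sample values from the question; the variables edge_idx_left and edge_idx_right are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Sample values from the question; the column name "days" is an assumption.
df = pd.DataFrame({"days": [3.0, 4.6, 14.9, 7.1, 15.1, 109.0]},
                  index=range(1, 7))

labels = np.array(["1-6 days", "7-14 days", ">14 days"])
bins = np.array([7, 14])

# For each value, find the position where it would be inserted into bins.
idx = bins.searchsorted(df["days"].to_numpy())
df = df.assign(Day_Group=labels[idx])
print(df)

# With side="left" (the default) a value equal to an edge stays in the
# lower group; side="right" pushes it into the upper group.
edge_idx_left = bins.searchsorted([7.0, 14.0], side="left")
edge_idx_right = bins.searchsorted([7.0, 14.0], side="right")
print(edge_idx_left, edge_idx_right)
```

So exactly 7.0 lands in '1-6 days' by default. If it should count as '7-14 days', side="right" is one way to get that.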
Answer 1 (score: 1)
Use pd.cut:
import numpy as np
import pandas as pd
df.assign(Day_Group=pd.cut(df['Days'],
                           [0, 7, 14, np.inf],
                           labels=['1-6 days', '7-14 days', '> 14 days']))
Output:
Days Day_Group
1 3.0 1-6 days
2 4.6 1-6 days
3 14.9 > 14 days
4 7.1 7-14 days
5 15.1 > 14 days
6 109.0 > 14 days
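It can help to look at the intervals pd.cut actually builds before attaching labels. The sketch below assumes the same six sample values. Omitting labels makes pd.cut return the interval each value fell into, and the right parameter controls which edge is closed.

```python
import numpy as np
import pandas as pd

# Sample values from the question, held in a standalone Series for illustration.
days = pd.Series([3.0, 4.6, 14.9, 7.1, 15.1, 109.0], index=range(1, 7))
bins = [0, 7, 14, np.inf]

# Without labels, the result holds the Interval that contains each value.
# By default the intervals are closed on the right: (0, 7], (7, 14], (14, inf].
intervals = pd.cut(days, bins)
print(intervals)

# right=False closes the left edge instead: [0, 7), [7, 14), [14, inf),
# so a value of exactly 7.0 moves into the second group.
left_closed = pd.cut(pd.Series([7.0]), bins, right=False)
print(left_closed)
```

Which edge to close depends on where a value of exactly 7 or 14 should go. The question does not specify this, so either choice is a judgment call.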
Answer 2 (score: 1)
I think you need cut:
import numpy as np
import pandas as pd
df['Days_Group'] = pd.cut(df['days'],
                          bins=[0, 7, 14, np.inf],
                          labels=['1-6 days', '7-14 days', '> 14 days'],
                          include_lowest=True)
print (df)
days Days_Group
1 3.0 1-6 days
2 4.6 1-6 days
3 14.9 > 14 days
4 7.1 7-14 days
5 15.1 > 14 days
6 109.0 > 14 days
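Alternatively, float('inf') can replace np.inf as the open upper bound, so NumPy does not need to be imported (the old pd.np alias was removed in pandas 2.0):
df['Days_Group'] = pd.cut(df['days'],
                          bins=[0, 7, 14, float('inf')],
                          labels=['1-6 days', '7-14 days', '> 14 days'],
                          include_lowest=True)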
print (df)
days Days_Group
1 3.0 1-6 days
2 4.6 1-6 days
3 14.9 > 14 days
4 7.1 7-14 days
5 15.1 > 14 days
6 109.0 > 14 days
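The effect of include_lowest=True only shows up for values sitting exactly on the lowest bin edge. The following sketch uses hypothetical edge values (not from the question) to compare the two settings side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen to sit on or near the bin edges.
s = pd.Series([0.0, 7.0, 14.0, 14.5])
bins = [0, 7, 14, np.inf]
labels = ["1-6 days", "7-14 days", "> 14 days"]

# Default: the first interval is (0, 7], so exactly 0.0 falls outside every bin.
default = pd.cut(s, bins=bins, labels=labels)

# include_lowest=True also closes the left edge of the first interval.
with_lowest = pd.cut(s, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"days": s,
                    "default": default,
                    "include_lowest": with_lowest}))
```

With the default settings a duration of exactly 0 becomes NaN, which is probably why the answer sets include_lowest=True.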
EDIT: If days contains timedeltas:
print (df)
days
1 3 days 00:00:00
2 4 days 14:24:00
3 14 days 21:36:00
4 7 days 02:24:00
5 15 days 02:24:00
6 109 days 00:00:00
df['days'] = df['days'].dt.total_seconds() / 24 / 3600
print (df)
days
1 3.0
2 4.6
3 14.9
4 7.1
5 15.1
6 109.0
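Putting the timedelta conversion and the grouping together, an end-to-end version might look like the sketch below. It assumes the column is named "days" and rebuilds the timedelta values shown above from strings. Dividing by pd.Timedelta(days=1) is used here as an equivalent alternative to dt.total_seconds() / 24 / 3600.

```python
import pandas as pd

# Timedelta values matching the printed example; the column name is assumed.
df = pd.DataFrame(
    {"days": pd.to_timedelta([
        "3 days", "4 days 14:24:00", "14 days 21:36:00",
        "7 days 02:24:00", "15 days 02:24:00", "109 days",
    ])},
    index=range(1, 7),
)

# Dividing a timedelta column by a one-day Timedelta yields float days.
df["days"] = df["days"] / pd.Timedelta(days=1)

# Bin the float days exactly as in the non-timedelta case.
df["Days_Group"] = pd.cut(
    df["days"],
    bins=[0, 7, 14, float("inf")],
    labels=["1-6 days", "7-14 days", "> 14 days"],
    include_lowest=True,
)
print(df)
```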