Group a time-indexed DataFrame by arbitrary time intervals

Time: 2018-12-03 04:08:39

Tags: python python-3.x pandas dataframe

I have a dataframe df that looks like this:

Time    Student
9:29    Alex
9:32    Bob
9:34    Carrie
9:41    Donald
9:48    Elijah
9:49    Fred
9:53    George
10:02   Henry
10:07   Ian

I also have a list, list = [9:34, 9:41, 9:45, 9:57]

The output I want is a dataframe2 that looks like this:

Time2         Students
< first time  Alex     Bob     Carrie
9:34          Donald
9:41    
9:45          Elijah   Fred    George
9:57          all other students

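Basically, I use the elements of list to group all students into bins, where bin[i] contains every student whose time x satisfies list[i] < x <= list[i+1]. Likewise, all students who arrive before the first element of list or after the last element should go into the special bins shown in dataframe2.

Thanks in advance for your help!

2 Answers:

Answer 0: (score: 2)

You can use pd.Grouper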

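import pandas as pd

# Parse the 'H:MM' strings into datetimes so they can be binned by frequency.
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M')

# Collect the students that fall into each fixed 10-minute window.
df = (df.groupby(pd.Grouper(key='Time', freq='10min'))['Student']
        .apply(list)
        .reset_index())
# Keep only the time of day for display.
df['Time'] = df['Time'].dt.time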

Output:

       Time                 Student
0  09:20:00                  [Alex]
1  09:30:00           [Bob, Carrie]
2  09:40:00  [Donald, Elijah, Fred]
3  09:50:00                [George]
4  10:00:00            [Henry, Ian]

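Edit:

If you have irregular time intervals, such as the list you provided (list = [9:34, 9:41, 9:45, 9:57]), you can use the approach below. I personally don't know of a more concise way!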

ls = ['9:34', '9:41', '9:45', '9:57']

## A "last-call" time for the day. Note that this method fails if any student features after this time (23:59:59):
ls.append('23:59:59')
ls = pd.DatetimeIndex(ls).time

df['Time'] = pd.to_datetime(df['Time']).dt.time

def idx_getter(t, ls):
    """
    Returns the right hand side of the interval the timestamp falls in.
    """
    return ls[sum(t > ls)]

df['time_grp'] = df['Time'].apply(lambda t: idx_getter(t, ls))
std_grps = pd.Series(ls).\
             map(df.groupby('time_grp')['Student'].apply(list))
std_grps.index = ls

std_grps

Output:

09:34:00       [Alex, Bob, Carrie]
09:41:00                  [Donald]
09:45:00                       NaN
09:57:00    [Elijah, Fred, George]
23:59:59              [Henry, Ian]
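
The lookup in idx_getter counts how many cut points are strictly smaller than t. For a sorted list of cut points, that count is what bisect.bisect_left from the standard library returns, so the same right-edge lookup can be sketched without pandas. This is only an illustration under that assumption; right_edge, edges and students are hypothetical names, not part of the answer.

```python
from bisect import bisect_left
from datetime import time

# Cut points from the question plus the answer's end-of-day sentinel.
# The list must be sorted for bisect to work.
edges = [time(9, 34), time(9, 41), time(9, 45), time(9, 57), time(23, 59, 59)]

def right_edge(t, edges):
    """Return the right-hand edge of the interval that t falls in.

    Hypothetical helper: for sorted edges it matches ls[sum(t > ls)].
    """
    return edges[bisect_left(edges, t)]

# Sample data copied from the question.
students = [
    (time(9, 29), 'Alex'), (time(9, 32), 'Bob'), (time(9, 34), 'Carrie'),
    (time(9, 41), 'Donald'), (time(9, 48), 'Elijah'), (time(9, 49), 'Fred'),
    (time(9, 53), 'George'), (time(10, 2), 'Henry'), (time(10, 7), 'Ian'),
]

# Start every bin empty so that bins without students still appear.
groups = {edge: [] for edge in edges}
for t, name in students:
    groups[right_edge(t, edges)].append(name)
```

Like the original, this raises an error for any time later than the sentinel, because the lookup then runs past the end of the list.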

Answer 1: (score: 2)

You can use pd.cut

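import numpy as np
import pandas as pd

lst = ['9:34', '9:41', '9:45', '9:57']

# Convert the cut points to whole seconds (nanoseconds // 10**9, assuming
# datetime64[ns] data) and pad with -inf/+inf so that times before the first
# and after the last cut point get their own bins.
breaks = [-np.inf, *(pd.to_datetime(lst).astype(np.int64) // 10**9), np.inf]
labels = [f'<{lst[0]}', *lst]

v = pd.to_datetime(df['Time']).astype(np.int64) // 10**9
# right=True makes each bin closed on the right: (left, right].
cats = pd.cut(v, bins=breaks, labels=labels, right=True)

# observed=False keeps bins that contain no students.
df.groupby(cats, observed=False).Student.agg(', '.join)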

Time
<9:34       Alex, Bob, Carrie
9:34                   Donald
9:41                     None
9:45     Elijah, Fred, George
9:57               Henry, Ian
Name: Student, dtype: object
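
For readers who want to try the pd.cut idea end to end, here is a minimal self-contained sketch. It rebuilds the question's sample data and, as an assumption made for simplicity, converts each time to minutes since midnight instead of epoch nanoseconds, so the result does not depend on how pandas parses date-less strings. to_minutes and groups are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Time': ['9:29', '9:32', '9:34', '9:41', '9:48',
             '9:49', '9:53', '10:02', '10:07'],
    'Student': ['Alex', 'Bob', 'Carrie', 'Donald', 'Elijah',
                'Fred', 'George', 'Henry', 'Ian'],
})
lst = ['9:34', '9:41', '9:45', '9:57']

def to_minutes(hhmm):
    """Convert an 'H:MM' string to minutes since midnight (hypothetical helper)."""
    hours, minutes = hhmm.split(':')
    return int(hours) * 60 + int(minutes)

# Bin edges padded with -inf/+inf to catch times outside the listed range.
breaks = [-np.inf, *(to_minutes(t) for t in lst), np.inf]
# Each bin is labelled by its left edge, as in the answer above.
labels = [f'<{lst[0]}', *lst]

# right=True gives right-closed bins, matching list[i] < x <= list[i+1].
cats = pd.cut(df['Time'].map(to_minutes), bins=breaks, labels=labels, right=True)

# Collect students per label, starting from empty lists so empty bins survive.
groups = {label: [] for label in labels}
for label, student in zip(cats, df['Student']):
    groups[label].append(student)
```

Building a plain dict at the end sidesteps differences in how groupby treats empty categories; the groupby call from the answer can be used instead when that is not a concern.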