Pandas consecutive dates

Time: 2018-06-04 19:11:30

Tags: python pandas

My data looks like this:

ID  Date_off    Approved
1   3/7/2018    1
1   3/8/2018    1
2   3/29/2018   1
2   3/30/2018   1
3   3/30/2018   1
4   4/2/2018    1
5   4/2/2018    1
6   4/2/2018    1
7   4/2/2018    1
8   4/2/2018    1
4   4/3/2018    1
5   4/3/2018    1
6   4/3/2018    1
7   4/3/2018    1
8   4/3/2018    1
4   4/4/2018    1
9   4/4/2018    1
5   4/4/2018    1
10  4/4/2018    1
6   4/4/2018    1
8   4/4/2018    1
9   4/5/2018    1
11  4/5/2018    1
5   4/5/2018    1
10  4/5/2018    1
6   4/5/2018    1
12  4/5/2018    1
8   4/5/2018    1
13  4/10/2018   1
14  4/10/2018   1
15  4/10/2018   1
16  4/10/2018   1
17  4/11/2018   1
13  4/11/2018   1
15  4/11/2018   1
18  4/11/2018   1
16  4/11/2018   1
15  4/12/2018   1
16  4/12/2018   1
16  4/13/2018   1
19  4/16/2018   1
19  4/17/2018   1
20  4/17/2018   1
21  4/18/2018   1
19  4/18/2018   1
20  4/18/2018   1
21  4/19/2018   1
19  4/19/2018   1
20  4/19/2018   1
22  4/20/2018   1
21  4/20/2018   1
20  4/20/2018   1
22  4/23/2018   1
23  4/23/2018   1
22  4/24/2018   1
23  4/24/2018   1
22  4/25/2018   1
23  4/25/2018   1
22  4/26/2018   1
23  4/26/2018   1
14  4/26/2018   1
22  4/27/2018   1
10  4/27/2018   1
23  4/27/2018   1
5   4/27/2018   1
14  4/27/2018   1
8   4/30/2018   1
1   4/30/2018   1
10  4/30/2018   1
23  4/30/2018   1
11  4/30/2018   1
24  4/30/2018   1
25  4/30/2018   1
14  4/30/2018   1
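
For anyone reproducing this, here is a minimal sketch of loading the table into a DataFrame, assuming it is available as whitespace-separated text (only the first few rows are repeated here). Parsing `Date_off` into real datetimes matters, because date differences cannot be computed on the raw strings.

```python
import io

import pandas as pd

# First few rows of the table above, pasted as whitespace-separated text
raw = """ID  Date_off    Approved
1   3/7/2018    1
1   3/8/2018    1
2   3/29/2018   1
2   3/30/2018   1
3   3/30/2018   1
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Parse month/day/year strings into datetimes so date arithmetic works
df["Date_off"] = pd.to_datetime(df["Date_off"], format="%m/%d/%Y")

print(df.dtypes)
```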

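It lists every day on which someone was on vacation. I need to group by name and return, for each stretch of consecutive vacation days, the number of days together with the first and last day. So I'm looking for: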

Name   Days  From         To
Bob     2    2017-04-06   2017-04-07
Jimy    2    2017-04-07   2017-04-08
Jimy    1    2017-04-10   2017-04-10
....

Currently Scott's code returns this:

    Approved    Date_off    Date_off
    sum amin    amax
ID          
1   3   3/7/2018    4/30/2018
2   2   3/29/2018   3/30/2018
3   1   3/30/2018   3/30/2018
4   1   4/2/2018    4/2/2018
4   2   4/3/2018    4/4/2018
5   1   4/2/2018    4/2/2018
5   1   4/3/2018    4/3/2018
5   3   4/4/2018    4/27/2018
6   1   4/2/2018    4/2/2018
6   1   4/3/2018    4/3/2018
6   2   4/4/2018    4/5/2018
7   2   4/2/2018    4/3/2018
8   1   4/2/2018    4/2/2018
8   1   4/3/2018    4/3/2018
8   3   4/4/2018    4/30/2018
9   2   4/4/2018    4/5/2018
10  4   4/4/2018    4/30/2018
11  2   4/5/2018    4/30/2018
12  1   4/5/2018    4/5/2018
13  2   4/10/2018   4/11/2018
14  1   4/10/2018   4/10/2018
14  3   4/26/2018   4/30/2018
15  1   4/10/2018   4/10/2018
15  2   4/11/2018   4/12/2018
16  1   4/10/2018   4/10/2018
16  1   4/11/2018   4/11/2018
16  2   4/12/2018   4/13/2018
17  1   4/11/2018   4/11/2018
18  1   4/11/2018   4/11/2018
19  1   4/16/2018   4/16/2018
19  1   4/17/2018   4/17/2018
19  2   4/18/2018   4/19/2018
20  1   4/17/2018   4/17/2018
20  1   4/18/2018   4/18/2018
20  2   4/19/2018   4/20/2018
21  1   4/18/2018   4/18/2018
21  2   4/19/2018   4/20/2018
22  1   4/20/2018   4/20/2018
22  1   4/23/2018   4/23/2018
22  1   4/24/2018   4/24/2018
22  1   4/25/2018   4/25/2018
22  2   4/26/2018   4/27/2018
23  1   4/23/2018   4/23/2018
23  1   4/24/2018   4/24/2018
23  1   4/25/2018   4/25/2018
23  3   4/26/2018   4/30/2018
24  1   4/30/2018   4/30/2018
25  1   4/30/2018   4/30/2018

I only changed the last line:

df.groupby(['ID','cons']).agg({'Approved':sum, 'Date_off':[np.min, np.max]}).reset_index(level=1,drop=True)

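Without the min/max date columns it returns the same result. Could weekdays and weekends be causing the problem? A Friday and the following Monday should count as adjacent days off.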

1 answer:

Answer 0 (score: 1)

Update

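import pandas as pd

# Parse the dates so that differences are Timedeltas rather than strings
df['Date_off'] = pd.to_datetime(df['Date_off'], format='%m/%d/%Y')

# Gap to the same ID's previous day off (NaT on each ID's first row)
df['daysdiff'] = df.sort_values('Date_off').groupby('ID')['Date_off'].diff()

# I modified this line (transform keeps the result aligned with df's index)
df['cons'] = (df.sort_values('Date_off').groupby('ID')['daysdiff']
                .transform(lambda x: (x.bfill().dt.days != 1).cumsum()))

print(df.groupby(['ID','cons'])[['Approved','Date_off']]
        .agg({'Approved':'sum','Date_off':['min','max']})
        .reset_index(level=1, drop=True))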

Output:

   Approved   Date_off           
        sum        min        max
ID                               
1         2 2018-03-07 2018-03-08
1         1 2018-04-30 2018-04-30
2         2 2018-03-29 2018-03-30
3         1 2018-03-30 2018-03-30
4         3 2018-04-02 2018-04-04
5         4 2018-04-02 2018-04-05
5         1 2018-04-27 2018-04-27
6         4 2018-04-02 2018-04-05
7         2 2018-04-02 2018-04-03
8         4 2018-04-02 2018-04-05
8         1 2018-04-30 2018-04-30
9         2 2018-04-04 2018-04-05
10        2 2018-04-04 2018-04-05
10        1 2018-04-27 2018-04-27
10        1 2018-04-30 2018-04-30
11        1 2018-04-05 2018-04-05
11        1 2018-04-30 2018-04-30
12        1 2018-04-05 2018-04-05
13        2 2018-04-10 2018-04-11
14        1 2018-04-10 2018-04-10
14        2 2018-04-26 2018-04-27
14        1 2018-04-30 2018-04-30
15        3 2018-04-10 2018-04-12
16        4 2018-04-10 2018-04-13
17        1 2018-04-11 2018-04-11
18        1 2018-04-11 2018-04-11
19        4 2018-04-16 2018-04-19
20        4 2018-04-17 2018-04-20
21        3 2018-04-18 2018-04-20
22        1 2018-04-20 2018-04-20
22        5 2018-04-23 2018-04-27
23        5 2018-04-23 2018-04-27
23        1 2018-04-30 2018-04-30
24        1 2018-04-30 2018-04-30
25        1 2018-04-30 2018-04-30
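
Note that the output above still splits runs at weekends: ID 22 gets one row for Friday 4/20 and a separate one for 4/23 to 4/27. The question asks for Friday and Monday to count as adjacent, so here is a sketch of one possible way to do that. It is not part of the original answer. The helper name `vacation_runs` and its `skip_weekends` flag are hypothetical. It assumes the data contains only weekdays and that `Date_off` is already a datetime column. It uses `numpy.busday_count` to measure gaps in business days, and named aggregation (pandas 0.25 or later) to produce the `Days`, `From` and `To` columns from the question.

```python
import numpy as np
import pandas as pd


def vacation_runs(df, skip_weekends=False):
    """Collapse per-day rows into runs of consecutive days per ID.

    Hypothetical helper for illustration; assumes Date_off is datetime64.
    """
    out = df.sort_values(['ID', 'Date_off']).copy()
    prev = out.groupby('ID')['Date_off'].shift()   # previous day off per ID
    first = prev.isna()                            # first row of each ID

    if skip_weekends:
        # Gap measured in business days, so Friday -> Monday counts as 1
        gap = pd.Series(np.nan, index=out.index)
        gap.loc[~first] = np.busday_count(
            prev[~first].values.astype('datetime64[D]'),
            out.loc[~first, 'Date_off'].values.astype('datetime64[D]'))
    else:
        # Gap measured in calendar days
        gap = (out['Date_off'] - prev).dt.days

    # A new run starts on each ID's first row or wherever the gap is not 1
    new_run = gap.ne(1).astype(int)
    out['cons'] = new_run.groupby(out['ID']).cumsum()

    return (out.groupby(['ID', 'cons'])
               .agg(Days=('Approved', 'sum'),
                    From=('Date_off', 'min'),
                    To=('Date_off', 'max'))
               .reset_index()
               .drop(columns='cons'))


# Rows for IDs 1 and 22, copied from the question's data
sample = pd.DataFrame({
    'ID': [1, 1, 1, 22, 22, 22, 22, 22, 22],
    'Date_off': pd.to_datetime(
        ['3/7/2018', '3/8/2018', '4/30/2018', '4/20/2018', '4/23/2018',
         '4/24/2018', '4/25/2018', '4/26/2018', '4/27/2018'],
        format='%m/%d/%Y'),
    'Approved': 1,
})

plain = vacation_runs(sample)                       # calendar-day gaps
merged = vacation_runs(sample, skip_weekends=True)  # business-day gaps
print(plain)
print(merged)
```

With `skip_weekends=True`, ID 22's Friday is merged into the following week as a single six-day run, while ID 1's isolated 4/30 remains its own run. Public holidays are not handled here; `numpy.busday_count` accepts a `holidays` argument if that is needed.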