How to get a count and a duplicate count using group by on a data frame in R

Time: 2018-06-07 08:20:42

Tags: r dataframe dplyr tidyr tidyverse

I have the data frame shown below:

Date         ID
2018-04-01   K-1
2018-04-01   K-1
2018-04-02   K-2
2018-04-02   K-2
2018-04-03   K-2
2018-04-04   K-3
2018-05-01   K-5
2018-05-01   K-5
2018-05-02   K-6
2018-05-02   K-7
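
To make the example reproducible, the data above can be entered as in the minimal sketch below. The name `DF` is taken from the code later in the question; storing both columns as character (rather than factor or Date) is an assumption.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps both columns as character
DF <- data.frame(
  Date = c("2018-04-01", "2018-04-01", "2018-04-02", "2018-04-02", "2018-04-03",
           "2018-04-04", "2018-05-01", "2018-05-01", "2018-05-02", "2018-05-02"),
  ID   = c("K-1", "K-1", "K-2", "K-2", "K-2", "K-3", "K-5", "K-5", "K-6", "K-7"),
  stringsAsFactors = FALSE
)

# Inspect the structure: 10 rows, 2 character columns
str(DF)
```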

Using the data frame above, I want the two tables shown below, grouped by date:

New_DF1

Date        Unique_Count    Duplicate_Count
2018-04-01  1               1
2018-04-02  1               1
2018-04-03  1               0
2018-04-04  1               0
2018-05-01  1               0
2018-05-02  2               0

New_DF2

Month     Unique_Count    Duplicate_Count
May-18    4               2
Apr-18    3               0

I tried:

DF%>%
        group_by(Date) %>%
        summarise(count = n_distinct(ID))

But it does not produce the output I want.

2 Answers:

Answer 0: (score: 0)

How about:

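DF %>%
        # Grouping by both Date and ID gives one row per (Date, ID) pair,
        # so n_distinct(ID) is always 1 here and n() is the row count for that pair
        group_by(Date, ID) %>%
        summarise(Unique_Count  = n_distinct(ID),
                  Duplicate_Count = n())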

Answer 1: (score: 0)

Using dplyr

library(dplyr)
# For each date: the number of distinct IDs, and the number of IDs that occur more than once
New_DF1 <- DF %>%
  group_by(Date) %>%
  summarise(Unique_Count  = n_distinct(ID),
            Duplicate_Count = sum(table(ID) > 1))

New_DF1
# A tibble: 6 x 3
#         Date Unique_Count Duplicate_Count
#       <fctr>        <int>           <int>
# 1 2018-04-01            1               1
# 2 2018-04-02            1               1
# 3 2018-04-03            1               0
# 4 2018-04-04            1               0
# 5 2018-05-01            1               1
# 6 2018-05-02            2               0

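# Sum the daily counts within each month (label format such as "Apr-18")
New_DF2 <- New_DF1 %>%
  group_by(month = format(as.Date(Date), "%b-%y")) %>%
  summarize_at(vars(Unique_Count, Duplicate_Count), sum)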

New_DF2
# A tibble: 2 x 3
#    month Unique_Count Duplicate_Count
#    <chr>        <int>           <int>
# 1 Apr-18            4               2
# 2 May-18            3               1
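
In dplyr 1.0.0 and later, `summarize_at()` is superseded by `across()`. The sketch below shows what the monthly step might look like in that style; it is an alternative spelling, not part of the original answer. To stay self-contained it rebuilds `New_DF1` by hand from the output shown above, with character dates as an assumption.

```r
library(dplyr)

# Daily counts copied from the New_DF1 output above (dates stored as character here)
New_DF1 <- data.frame(
  Date = c("2018-04-01", "2018-04-02", "2018-04-03",
           "2018-04-04", "2018-05-01", "2018-05-02"),
  Unique_Count    = c(1L, 1L, 1L, 1L, 1L, 2L),
  Duplicate_Count = c(1L, 1L, 0L, 0L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Group by a month label and sum both count columns with across()
New_DF2 <- New_DF1 %>%
  group_by(month = format(as.Date(Date), "%b-%y")) %>%
  summarise(across(c(Unique_Count, Duplicate_Count), sum))

New_DF2
```

The month labels produced by `%b` depend on the session locale, so they may not read "Apr-18" and "May-18" on every system.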

Using base R

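# aggregate() returns the two counts as a single matrix column named ID
New_DF1 <- aggregate(ID ~ Date, DF, function(x) c(Unique_Count    = length(unique(x)),
                                                  Duplicate_Count = sum(table(x) > 1)))

# Flatten the matrix column into ordinary data frame columns
New_DF1 <- cbind(New_DF1[1], New_DF1[[2]])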

New_DF1
#         Date Unique_Count Duplicate_Count
# 1 2018-04-01            1               1
# 2 2018-04-02            1               1
# 3 2018-04-03            1               0
# 4 2018-04-04            1               0
# 5 2018-05-01            1               1
# 6 2018-05-02            2               0

New_DF2 <- New_DF1
# Add a month label such as "Apr-18", then sum the daily counts per month
New_DF2$month <- format(as.Date(New_DF2$Date), "%b-%y")
New_DF2 <- aggregate(cbind(Unique_Count, Duplicate_Count) ~ month, New_DF2, sum)

New_DF2
#    month Unique_Count Duplicate_Count
# 1 Apr-18            4               2
# 2 May-18            3               1