How to find the minimum and maximum values based on another column?

Time: 2019-08-28 06:56:16

Tags: r

My input table is as follows:

    +------+------------------+
    | Name |     Datetime     | 
    +------+------------------+
    | ABC  |  26-01-2019 4:55 |  
    | ABC  |  26-01-2019 4:35 |  
    | ABC  |  26-01-2019 5:00 |  
    | XYZ  |  26-01-2019 2:50 |  
    | XYZ  |  26-01-2019 4:00 |  
    | XYZ  |  26-01-2019 4:59 | 
    +------+------------------+ 

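From the table above, I want to find the minimum and maximum "Datetime" for each "Name", discard the rows that fall in between, and automatically create another column (showing whether the person was admitted early or late), using RStudio, as shown below: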

    +------+------------------+--------+
    | Name |     Datetime     |  Col3  |  
    +------+------------------+--------+
    | ABC  |  26-01-2019 4:35 |  Early |  
    | ABC  |  26-01-2019 5:00 |  Late  |  
    | XYZ  |  26-01-2019 2:50 |  Early |  
    | XYZ  |  26-01-2019 4:59 |  Late  |  
    +------+------------------+--------+

Thanks.

3 answers:

Answer 0 (score: 0)

Using dplyr, one way is to convert the Datetime column to POSIXct, arrange by Datetime, and select the first and last rows (the minimum and maximum) in each group.
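The code for this answer is not present on this page. The steps it describes could be sketched roughly as follows. This is a reconstruction under assumptions, not the answerer's original code: the data frame is assumed to be df from the Data section, tz = "UTC" is chosen only to make the parsing reproducible, Name is added to arrange() so the row order is deterministic, and every Name is assumed to have at least two rows.

```r
library(dplyr)

# Sample data, copied from the Data section of this page
df <- data.frame(
  Name = c("ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"),
  Datetime = c("26-01-2019 4:55", "26-01-2019 4:35", "26-01-2019 5:00",
               "26-01-2019 2:50", "26-01-2019 4:00", "26-01-2019 4:59"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Parse the day-month-year strings (tz = "UTC" is an assumption)
  mutate(Datetime = as.POSIXct(Datetime, format = "%d-%m-%Y %H:%M", tz = "UTC")) %>%
  # Sort so the earliest row of each Name comes first and the latest comes last
  arrange(Name, Datetime) %>%
  group_by(Name) %>%
  # Keep only the first and last row of each group
  slice(c(1, n())) %>%
  # Label the two remaining rows per group
  mutate(Col3 = c("Early", "Late")) %>%
  ungroup()

print(result)
```

Because Datetime is converted to POSIXct here, the result prints in year-month-day form rather than in the original string format.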

Answer 1 (score: 0)

Here is a base R option:

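    # df is the data frame defined in the Data section.
    # Parse Datetime, take the min and max per Name, stack the result into
    # long form, then label each pair of rows as Early / Late.
    transform(stack(data.frame(
              do.call(cbind, 
                  tapply(as.POSIXct(df$Datetime, format = '%d-%m-%Y %H:%M'), df$Name, function(i)
                      as.character(c(min(i), max(i))))), stringsAsFactors = FALSE)), 
             col3 = c('Early', 'Late'))

    #               values ind  col3
    #1 2019-01-26 04:35:00 ABC Early
    #2 2019-01-26 05:00:00 ABC  Late
    #3 2019-01-26 02:50:00 XYZ Early
    #4 2019-01-26 04:59:00 XYZ  Late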
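For readers who find the nested call above hard to follow, the same result can be reached in smaller steps. The following is an alternative sketch, not the answerer's method: it uses ave() to compute the per-group minimum and maximum and keeps the original Datetime strings. The helper names (num, is_min, is_max, keep, out) are made up for illustration, tz = "UTC" is an assumption, and each Name is assumed to have at least two distinct times.

```r
# Sample data, copied from the Data section of this page
df <- data.frame(
  Name = c("ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"),
  Datetime = c("26-01-2019 4:55", "26-01-2019 4:35", "26-01-2019 5:00",
               "26-01-2019 2:50", "26-01-2019 4:00", "26-01-2019 4:59"),
  stringsAsFactors = FALSE
)

# Parse the strings and work with seconds since the epoch
num <- as.numeric(as.POSIXct(df$Datetime, format = "%d-%m-%Y %H:%M", tz = "UTC"))

# ave() returns, for every row, the min (or max) of that row's Name group
is_min <- num == ave(num, df$Name, FUN = min)
is_max <- num == ave(num, df$Name, FUN = max)

# Keep only the earliest and latest row of each group and label them
keep <- is_min | is_max
out <- df[keep, ]
out$Col3 <- ifelse(is_min[keep], "Early", "Late")

# Order by Name, then by time, so each Early row precedes its Late row
out <- out[order(out$Name, num[keep]), ]
print(out)
```

Unlike the one-liner, this version leaves Datetime as the original character strings, which matches the layout of the desired output in the question.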

Answer 2 (score: 0)

We can use the tidyverse:

    library(tidyverse)
    library(lubridate)  # provides dmy_hm(); loaded explicitly in case library(tidyverse) does not attach it

    df %>%
         arrange(dmy_hm(Datetime)) %>%               # sort by the parsed date-time
         group_by(Name) %>%
         filter(row_number() %in% c(1, n())) %>%     # first and last row per Name
         mutate(Col3 = c("Early", "Late"))
    # A tibble: 4 x 3
    # Groups:   Name [2]
    #  Name  Datetime        Col3 
    #  <chr> <chr>           <chr>
    #1 XYZ   26-01-2019 2:50 Early
    #2 ABC   26-01-2019 4:35 Early
    #3 XYZ   26-01-2019 4:59 Late 
    #4 ABC   26-01-2019 5:00 Late 

Data

    df <- structure(list(
      Name = c("ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"),
      Datetime = c("26-01-2019 4:55", "26-01-2019 4:35", "26-01-2019 5:00",
                   "26-01-2019 2:50", "26-01-2019 4:00", "26-01-2019 4:59")),
      class = "data.frame", row.names = c(NA, -6L))