dplyr - consecutive occurrences in the same column, with a label that depends on the occurrence count

Time: 2018-05-03 17:27:02

Tags: r dplyr data.table

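I want to use R's dplyr or data.table to count consecutive occurrences of the same value in one column (Temperature) and label every row after the third in a run as "Discard".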
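The question's input data is not shown in this text. As a minimal sketch, the data frame below is reconstructed from the output printed in the answers (so the name `df` and the column types are assumptions), and base R's `rle()` is used only to illustrate what "consecutive occurrences" means here.

```r
# Sample data reconstructed from the printed output in the answers;
# the original data frame is not part of this text, so this is an assumption.
df <- data.frame(
  Time        = 1:18,
  Store       = 1000L,
  RTU         = 1L,
  Temperature = c(rep(54L, 5), 56L, 57L, rep(50L, 5), rep(61L, 5), 58L)
)

# rle() reports each run of identical consecutive values and its length
runs <- rle(df$Temperature)
runs$values   # 54 56 57 50 61 58
runs$lengths  # 5 1 1 5 5 1
```

Runs longer than three (here the runs of 54, 50, and 61) are the ones whose trailing rows should be labelled "Discard".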

2 answers:

Answer 0 (score: 4)

Using data.table:

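library(data.table)
setDT(df)  # convert df to a data.table by reference

# Group by Store, RTU, and the run id of Temperature; within each run,
# the first three rows get 'OK' and any later rows get 'Discard'.
df[,
    Comment := ifelse(seq_len(.N) <= 3, 'OK', 'Discard'),
    by = .(Store, RTU, rleid(Temperature))
][]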
#    Time Store RTU Temperature Comment
# 1:    1  1000   1          54      OK
# 2:    2  1000   1          54      OK
# 3:    3  1000   1          54      OK
# 4:    4  1000   1          54 Discard
# 5:    5  1000   1          54 Discard
# 6:    6  1000   1          56      OK
# 7:    7  1000   1          57      OK
# 8:    8  1000   1          50      OK
# 9:    9  1000   1          50      OK
#10:   10  1000   1          50      OK
#11:   11  1000   1          50 Discard
#12:   12  1000   1          50 Discard
#13:   13  1000   1          61      OK
#14:   14  1000   1          61      OK
#15:   15  1000   1          61      OK
#16:   16  1000   1          61 Discard
#17:   17  1000   1          61 Discard
#18:   18  1000   1          58      OK
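To see why grouping on `rleid(Temperature)` works, here is a small sketch on a standalone vector. The vector `temp` and the variable names are made up for illustration; `rleid()` and `rowid()` are data.table functions.

```r
library(data.table)

# Hypothetical readings: 54 appears in two separate runs
temp <- c(54, 54, 54, 54, 56, 54, 54)

# rleid() gives each run of identical consecutive values its own id;
# the second run of 54 gets a new id instead of reusing the first one
run_id <- rleid(temp)
run_id
# 1 1 1 1 2 3 3

# rowid() numbers the elements within each id: the position inside the run
pos_in_run <- rowid(run_id)
pos_in_run
# 1 2 3 4 1 1 2

# Only the fourth element of the first run exceeds three occurrences
label <- ifelse(pos_in_run <= 3, "OK", "Discard")
```

Because the run id changes whenever the value changes, a value that reappears later starts counting from one again, which is the behaviour the answer relies on.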

Answer 1 (score: 1)

Extending the OP's approach, an option that combines dplyr with data.table's rleid() could look like this:

library(dplyr)
library(data.table)

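# Number the runs of identical Temperature within each Store/RTU pair,
# then label rows by their position inside each run.
# Flag restarts at 1 in every Store/RTU group, so Store and RTU stay in
# the second grouping to keep runs from different units apart.
df %>% group_by(Store, RTU) %>% mutate(Flag = rleid(Temperature)) %>%
  group_by(Store, RTU, Flag) %>%
  mutate(Flag_Temperature_check = ifelse(row_number() <= 3, "Ok", "Discard"))

# # A tibble: 18 x 6
# # Groups:   Store, RTU, Flag [6]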
# Time Store   RTU Temperature  Flag Flag_Temperature_check
# <int> <int> <int>       <int> <int> <chr>                 
# 1     1  1000     1          54     1 Ok                    
# 2     2  1000     1          54     1 Ok                    
# 3     3  1000     1          54     1 Ok                    
# 4     4  1000     1          54     1 Discard               
# 5     5  1000     1          54     1 Discard               
# 6     6  1000     1          56     2 Ok                    
# 7     7  1000     1          57     3 Ok                    
# 8     8  1000     1          50     4 Ok                    
# 9     9  1000     1          50     4 Ok                    
# 10    10  1000     1          50     4 Ok                    
# 11    11  1000     1          50     4 Discard               
# 12    12  1000     1          50     4 Discard               
# 13    13  1000     1          61     5 Ok                    
# 14    14  1000     1          61     5 Ok                    
# 15    15  1000     1          61     5 Ok                    
# 16    16  1000     1          61     5 Discard               
# 17    17  1000     1          61     5 Discard               
# 18    18  1000     1          58     6 Ok
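If loading data.table only for `rleid()` is undesirable, the run id can also be built with dplyr alone. The following is a sketch rather than part of either answer: the sample data is reconstructed from the output above, and the column names `run` and `Comment` are chosen for illustration.

```r
library(dplyr)

# Sample data reconstructed from the printed output (an assumption)
df <- data.frame(
  Time        = 1:18,
  Store       = 1000L,
  RTU         = 1L,
  Temperature = c(rep(54L, 5), 56L, 57L, rep(50L, 5), rep(61L, 5), 58L)
)

labelled <- df %>%
  group_by(Store, RTU) %>%
  # A new run starts wherever Temperature differs from the previous row;
  # the cumulative sum of those change points serves as a run id
  mutate(run = cumsum(Temperature != lag(Temperature, default = first(Temperature)))) %>%
  group_by(Store, RTU, run) %>%
  mutate(Comment = if_else(row_number() <= 3, "OK", "Discard")) %>%
  ungroup()

# Keep only the rows that are not marked for discarding
kept <- labelled %>%
  filter(Comment == "OK") %>%
  select(-run)
```

With this sample, six rows are labelled "Discard" (two from each of the three runs of length five), so `kept` has twelve rows.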