R - Delete rows based on a condition in a single cell

Time: 2018-03-30 09:53:30

Tags: r conditional-statements delete-row posixct

I am really new to R and I have a problem to solve. I have a data frame like this:

str(data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   70128 obs. of  2 variables:
 $ date: POSIXct, format: "2009-01-01 00:00:00" "2009-01-01 01:00:00" "2009-01-01 02:00:00" "2009-01-01 03:00:00" ...
 $ value: num  -0.6 -0.7 -0.6 -0.4 -0.4 -0.3 -0.3 -0.3 -0.1 0 ...

So I have my date column, which is in POSIXct format with a 1-hour step. My value column is numeric and represents temperature.

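Now I want to delete whole days based on a condition. The condition is: if only one cell within a day is below 3 (°C), I want to delete that day.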

I searched for a while but could not solve it. I hope you can help me.

Thanks in advance

3 answers:

Answer 0: (score: 2)

Compact dplyr syntax

library(dplyr)

#Building an example data frame
df <- data.frame(
datetime = as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00", 
                    "2009-01-01 02:00:00", "2009-01-01 03:00:00",
                    "2009-01-02 02:00:00", "2009-01-02 03:00:00", 
                    "2009-01-03 04:00:00", "2009-01-03 02:00:00", 
                    "2009-01-03 03:00:00", "2009-01-03 04:00:00",
                    "2009-01-04 03:00:00", "2009-01-04 04:00:00")),

temp = c(1, -0.7, -0.6,
         -0.4, -0.4, -0.3, 
         -0.3, 10, 4, 
         0, 10, 5))

#Query: keep only the days on which every reading is above 3
df %>% 
    mutate(date = lubridate::as_date(datetime)) %>%   # calendar day of each reading
    group_by(date) %>% 
    filter(all(temp > 3))

#Result
      datetime             temp date      
  <dttm>              <dbl> <date>    
1 2009-01-04 03:00:00   10. 2009-01-04
2 2009-01-04 04:00:00    5. 2009-01-04
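
One boundary case may be worth checking: `all(temp > 3)` also drops a day that contains a reading of exactly 3, although 3 is not "below 3". A minimal sketch of a stricter reading of the question, using made-up data and assuming time stamps in UTC, is to negate `any(temp < 3)` instead:

```r
library(dplyr)

# Hypothetical data: the first day contains a reading of exactly 3
df <- data.frame(
  datetime = as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00",
                          "2009-01-02 00:00:00", "2009-01-02 01:00:00"),
                        tz = "UTC"),
  temp = c(3, 5, 2, 10)
)

kept <- df %>%
  group_by(date = as.Date(datetime, tz = "UTC")) %>%
  filter(!any(temp < 3)) %>%   # drop a day as soon as one reading is below 3
  ungroup()
```

Here `kept` holds the two rows of 2009-01-01, whereas `all(temp > 3)` would return no rows for this data. Which of the two is wanted depends on how the threshold should be treated.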

Answer 1: (score: 1)

Using Pasqui's example from before the edit, slightly modified...

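I chose to build the logic around my reading of the question: a day is removed if and only if exactly one cell/record in that day is below 3ºC. So a day with two, three or more cells/records below 3ºC is kept. In this example, 2009-01-04 is the only day with exactly one cell/record below 3ºC, so it is removed.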

library(dplyr)

#Building an example data frame
df <- data.frame(
  date = as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00", 
                      "2009-01-01 02:00:00", "2009-01-01 03:00:00",
                      "2009-01-01 04:00:00", "2009-01-01 05:00:00",
                      "2009-01-02 02:00:00", "2009-01-02 03:00:00", 
                      "2009-01-03 04:00:00", "2009-01-03 02:00:00", 
                      "2009-01-03 03:00:00", "2009-01-03 04:00:00",
                      "2009-01-04 00:00:00", "2009-01-04 01:00:00")),

  temp = c(1, -0.7, -0.6,
           -0.4, 3.5, 2.9, -0.4, -0.3, 
           -0.3, 10, 4, 
           0, 3.3, 2.5)

)

library(lubridate)

df2 <- df %>% 
  mutate(
    day = date(date),   # lubridate::date() extracts the calendar day
    counter = 1
  ) %>%
  group_by(day) %>%
  # Drop a day only when exactly one of its records is below 3
  filter(sum(counter[temp < 3]) != 1)

# A tibble: 12 x 4
# Groups:   day [3]
                  date  temp        day counter
                <dttm> <dbl>     <date>   <dbl>
 1 2009-01-01 00:00:00   1.0 2009-01-01       1
 2 2009-01-01 01:00:00  -0.7 2009-01-01       1
 3 2009-01-01 02:00:00  -0.6 2009-01-01       1
 4 2009-01-01 03:00:00  -0.4 2009-01-01       1
 5 2009-01-01 04:00:00   3.5 2009-01-01       1
 6 2009-01-01 05:00:00   2.9 2009-01-01       1
 7 2009-01-02 02:00:00  -0.4 2009-01-02       1
 8 2009-01-02 03:00:00  -0.3 2009-01-02       1
 9 2009-01-03 04:00:00  -0.3 2009-01-03       1
10 2009-01-03 02:00:00  10.0 2009-01-03       1
11 2009-01-03 03:00:00   4.0 2009-01-03       1
12 2009-01-03 04:00:00   0.0 2009-01-03       1
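
Since the answers in this thread read the condition in two different ways, it can help to count the readings below 3 per day before filtering. The following is only a sketch on made-up data (UTC time stamps assumed); `per_day`, `exactly_one` and `at_least_one` are hypothetical names introduced for illustration:

```r
library(dplyr)

# Hypothetical data: three days with 2, 1 and 0 readings below 3
df <- data.frame(
  date = as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00",
                      "2009-01-02 00:00:00", "2009-01-02 01:00:00",
                      "2009-01-03 00:00:00", "2009-01-03 01:00:00"),
                    tz = "UTC"),
  temp = c(1, 2, 2.5, 4, 5, 6)
)

# Number of readings below 3 on each day
per_day <- df %>%
  group_by(day = as.Date(date, tz = "UTC")) %>%
  summarise(n_below = sum(temp < 3))

# Reading 1: remove a day only if exactly one reading is below 3
exactly_one <- df %>%
  group_by(day = as.Date(date, tz = "UTC")) %>%
  filter(sum(temp < 3) != 1) %>%
  ungroup()

# Reading 2: remove a day if at least one reading is below 3
at_least_one <- df %>%
  group_by(day = as.Date(date, tz = "UTC")) %>%
  filter(sum(temp < 3) == 0) %>%
  ungroup()
```

With this data, the first reading removes only 2009-01-02, while the second keeps only 2009-01-03. The `counter` column is not needed here, because summing the logical vector `temp < 3` already counts the matching rows.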

Answer 2: (score: 0)

Try adapting this code:

Toy data frame (2009-01-01 has only 1 hour with a value < 3, while 2009-01-02 has none):

df <- data.frame(date = c("2009-01-01 00:00:00", "2009-01-01 01:00:00",
                          "2009-01-01 02:00:00", "2009-01-02 03:00:00"),
                 value = c(-0.6, 8, 4, 7))
df
                 date value
1 2009-01-01 00:00:00  -0.6
2 2009-01-01 01:00:00   8.0
3 2009-01-01 02:00:00   4.0
4 2009-01-02 03:00:00   7.0

Identify the dates to delete

date_to_delete<-unique(as.Date(df[df[,"value"]<3,"date"], format="%Y-%m-%d"))

Your cleaned data frame

df[!(as.Date(df$date,format="%Y-%m-%d") %in% date_to_delete),]
                 date value
4 2009-01-02 03:00:00     7
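
The toy data above stores the dates as character strings, while the question's column is POSIXct. Note that `as.Date()` on a POSIXct value uses UTC by default, so the day boundaries can shift if the data is in another time zone. A minimal base R sketch that avoids this by formatting the day in the column's own time zone, assuming the question's column names and a UTC toy example (`n_below` and `cleaned` are hypothetical names):

```r
# Hypothetical toy data using the question's column names (date, value),
# with date stored as POSIXct as in the question
data <- data.frame(
  date = as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00",
                      "2009-01-01 02:00:00", "2009-01-02 03:00:00"),
                    tz = "UTC"),
  value = c(-0.6, 8, 4, 7)
)

# Calendar day of each row, formatted in the column's own time zone
day <- format(data$date, "%Y-%m-%d")

# Number of readings below 3 on each row's day
n_below <- ave(as.numeric(data$value < 3), day, FUN = sum)

# Keep only rows whose day has no reading below 3
cleaned <- data[n_below == 0, ]
```

As in the answer above, only the 2009-01-02 row remains. Replacing `n_below == 0` with `n_below != 1` would give the "exactly one" reading instead.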