R - based on other rows and columns

Date: 2016-01-07 15:21:28

Tags: r if-statement reference

My data has the following format:
- First column: indicates whether the machine is running
- Second column: the time the machine ran

See the dataset below:

structure(c("", "running", "running", "running", "", "", "", 
"running", "running", "", "10", "15", "30", "2", "5", "17", "47", 
"12", "57", "87"), .Dim = c(10L, 2L), .Dimnames = list(NULL, 
    c("c", "v")))

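I want to add a third column that gives the total time the machine has been running (by adding up all the times since the machine started running). See the desired output below: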

 [1,] ""        "10" "0"   
 [2,] "running" "15" "15"  
 [3,] "running" "30" "45"  
 [4,] "running" "2"  "47"  
 [5,] ""        "5"  "0"   
 [6,] ""        "17" "0"   
 [7,] ""        "47" "0"   
 [8,] "running" "12" "12"  
 [9,] "running" "57" "69"  
[10,] ""        "87" "0" 
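The rule behind the desired output is: add v to a running total while the machine is running, and reset the total to 0 on any row where it is not. A plain loop version, as a sketch rather than one of the posted answers (the vector names `state` and `v` are illustrative, and any value other than "running" is assumed to mean idle):

```r
# Columns of the example data as plain vectors (names are illustrative)
state <- c("", "running", "running", "running", "", "", "",
           "running", "running", "")
v <- c(10L, 15L, 30L, 2L, 5L, 17L, 47L, 12L, 57L, 87L)

total <- integer(length(v))
acc <- 0L
for (i in seq_along(v)) {
  if (state[i] == "running") {
    acc <- acc + v[i]  # keep adding while the machine runs
  } else {
    acc <- 0L          # any non-running row resets the total
  }
  total[i] <- acc
}
total
```

The loop is easy to read but works row by row; the answers below express the same rule with vectorised grouping.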

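I tried to write some R code that does this in an elegant way, but my programming skills are too limited for now. Does anyone know a solution to this problem? Thanks in advance!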

3 answers:

Answer 0 (score: 2)

First, we convert your data into a more suitable data structure that can hold mixed data types:

m <- structure(c("", "running", "running", "running", "", "", "", 
                 "running", "running", "", "10", "15", "30", "2", "5", "17", "47", 
                 "12", "57", "87"), .Dim = c(10L, 2L), .Dimnames = list(NULL, 
                                                                        c("c", "v")))
DF <- as.data.frame(m, stringsAsFactors = FALSE)
DF[] <- lapply(DF, type.convert, as.is = TRUE)
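The conversion matters because every cell of the original matrix is a string, including the times in v. A quick check of the column types, as a sketch that rebuilds DF exactly as above:

```r
# Rebuild the question's character matrix and convert it as above
m <- structure(c("", "running", "running", "running", "", "", "",
                 "running", "running", "", "10", "15", "30", "2", "5", "17", "47",
                 "12", "57", "87"), .Dim = c(10L, 2L),
               .Dimnames = list(NULL, c("c", "v")))
DF <- as.data.frame(m, stringsAsFactors = FALSE)
DF[] <- lapply(DF, type.convert, as.is = TRUE)

# After type.convert(), v is integer (so cumsum() can work on it)
# while c stays character because it contains "running"
str(DF)
```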

Then this is easy to do with the data.table package:

library(data.table)
setDT(DF)
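# cumulative sum of v within each run of identical values in c
DF[, total := cumsum(v), by = rleid(c)]
# idle rows get 0 (0L keeps total an integer column)
DF[c == "", total := 0L]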
#          c  v total
# 1:         10     0
# 2: running 15    15
# 3: running 30    45
# 4: running  2    47
# 5:          5     0
# 6:         17     0
# 7:         47     0
# 8: running 12    12
# 9: running 57    69
#10:         87     0
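rleid(c) gives each run of consecutive identical values in c its own id, so every stretch of "running" rows is summed separately. For a single vector, the same ids can be built in base R with rle(); this is a sketch, not data.table's implementation, and the vector names are illustrative:

```r
state <- c("", "running", "running", "running", "", "", "",
           "running", "running", "")
v <- c(10L, 15L, 30L, 2L, 5L, 17L, 47L, 12L, 57L, 87L)

# rle() collapses consecutive equal values into runs; repeating each
# run's index by the run's length gives one id per row
runs <- rle(state)
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id

# cumulative sum within each run, then zero out the idle rows
total <- ave(v, run_id, FUN = cumsum)
total[state == ""] <- 0L
total
```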

Answer 1 (score: 2)

Here is a simple solution using base R:

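# group by state and by a counter that increases at every idle row,
# so each run of "running" rows forms its own group
DF$total <- ave(DF$v, DF$c, cumsum(DF$c == ""), FUN = cumsum)
DF$total[DF$c == ""] <- 0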

> DF
         c  v total
1          10     0
2  running 15    15
3  running 30    45
4  running  2    47
5           5     0
6          17     0
7          47     0
8  running 12    12
9  running 57    69
10         87     0
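Another base R route, not from the posted answers, avoids grouping altogether: take one cumulative sum over the running rows only and subtract the value it had reached at the most recent idle row. This is a sketch assuming the times are non-negative (so the cumulative sum never decreases, which is what lets cummax() pick out the latest idle value) and that any state other than "running" counts as idle:

```r
state <- c("", "running", "running", "running", "", "", "",
           "running", "running", "")
v <- c(10L, 15L, 30L, 2L, 5L, 17L, 47L, 12L, 57L, 87L)

running <- state == "running"
# cumulative time, counting only rows where the machine runs
cs <- cumsum(v * running)
# cs at idle rows, 0 elsewhere; cummax carries the latest idle value forward
offset <- cummax(cs * !running)
total <- cs - offset
total
```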

Answer 2 (score: 1)

We can use dplyr:

library(dplyr)
DF %>%
  group_by(grp = cumsum(c == ""), c) %>%
  mutate(total = replace(cumsum(v), c == "", 0L)) %>%
  ungroup()
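Both the ave() and the dplyr answers group by c as well as by cumsum(c == ""). The counter alone is not enough: it only increases on idle rows, so an idle row shares its counter value with the running rows that follow it (row 1 with rows 2 to 4, row 7 with rows 8 and 9). A base R sketch comparing the two groupings, with illustrative vector names:

```r
state <- c("", "running", "running", "running", "", "", "",
           "running", "running", "")
v <- c(10L, 15L, 30L, 2L, 5L, 17L, 47L, 12L, 57L, 87L)

grp <- cumsum(state == "")  # 1 1 1 1 2 3 4 4 4 5

# Counter only: an idle row's time leaks into the run that follows it
wrong <- ave(v, grp, FUN = cumsum)
wrong[state == ""] <- 0L

# Counter plus state: each run of "running" rows is its own group
right <- ave(v, grp, state, FUN = cumsum)
right[state == ""] <- 0L

wrong
right
```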