Select records based on the difference between records in R

Time: 2014-02-02 20:00:31

Tags: r select conditional-statements records difference

I hope someone can offer a suggestion for this "problem", because I really don't know how to proceed. My data look like this:

data<-data.frame(site=c(rep("A",3),rep("B",3),rep("C",3)),time=c(100,180,245,5,55,130,70,120,160))

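Time is in minutes. For each site, I want to select only the records whose difference is greater than 60, so the output should look like this: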

out<-data[c(1:4,6,7,9),]

What I have tried so far: to get the differences, I use this:

difference<-stack(tapply(data$time,data$site,diff))

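However, I don't know how to pick out the records that meet my criterion. I apologize if a similar question already exists; I did search for a while.

To be clear, since my definition of "difference" may not be obvious: for each site, I need to select all records that are separated by at least 60 minutes, not only those that are strictly consecutive in time. Specifically,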

> out
site time
1    A  100 # included because difference between 2 and 1 is > 60
2    A  180 # included because difference between 3 and 2 is > 60
3    A  245 # included because separated by 60 minutes from record #2
4    B    5 # included because difference between 6 and 4 is > 60
6    B  130 # included because separated by 60 minutes from record #4
7    C   70 # included because difference between 9 and 7 is > 60
9    C  160 # included because separated by 60 minutes from record #7

To solve the "problem", it may be useful to look at the resulting differences, as shown below:

> difference
values ind
1     80   A # include records 1 and 2
2     65   A # include records 2 and 3
3     50   B # include only record 4
4     75   B # include record 6 because it is (50+75) > 60 min from record #4
5     50   C # include only record 7
6     40   C # include record 9 because it is (50+40) > 60 min from record #7

Thanks for your help.

3 answers:

Answer 0: (score: 3)

data[ave(data$time, data$site, FUN = function(x){c(61, diff(x)) > 60}) == 1, ]

#   site time
# 1    A  100
# 2    A  180
# 3    A  245
# 4    B    5
# 6    B  130
# 7    C   70

Edit (for the updated question):

keep <- as.logical(ave(data$time, data$site, FUN = function(x){
  c(TRUE, cumsum(diff(x)) > 60)
}))

data[keep, ]

#   site time
# 1    A  100
# 2    A  180
# 3    A  245
# 4    B    5
# 6    B  130
# 7    C   70
# 9    C  160
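
One caveat about the `cumsum` version above: it measures each record against the first record of its site, not against the last record that was kept. On the sample data both readings give the same rows, but they can differ in general. Below is a minimal sketch of the "more than 60 minutes since the last kept record" reading, which is my interpretation of the comments in the question. `keep_spaced` is a hypothetical helper name (not part of any package), and the sketch assumes times are sorted in ascending order within each site.

```r
data <- data.frame(site = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                   time = c(100, 180, 245, 5, 55, 130, 70, 120, 160))

# Hypothetical helper: walk through the sorted times and keep a record
# only if it is more than `gap` minutes after the last record kept.
keep_spaced <- function(x, gap = 60) {
  keep <- logical(length(x))
  last <- -Inf                    # ensures the first record is always kept
  for (i in seq_along(x)) {
    if (x[i] - last > gap) {
      keep[i] <- TRUE
      last <- x[i]                # measure later gaps from this record
    }
  }
  keep
}

# ave() returns 0/1 because data$time is numeric, hence as.logical()
keep <- as.logical(ave(data$time, data$site, FUN = keep_spaced))
out <- data[keep, ]               # rows 1, 2, 3, 4, 6, 7, 9

# Where the two readings differ: times 0, 70, 80
keep_spaced(c(0, 70, 80))                   # TRUE TRUE FALSE
c(TRUE, cumsum(diff(c(0, 70, 80))) > 60)    # TRUE TRUE TRUE
```

Use `>=` in place of `>` inside the helper if "at least 60 minutes" should include a gap of exactly 60.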

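Answer 1: (score: 1)

# Calculate the difference from the previous record within each site (NA for the first record)
data$diff <- unlist(by(data$time, data$site, function(x) c(NA, diff(x))))
# Keep the first record of each site and any record more than 60 minutes after the previous one
data[is.na(data$diff) | data$diff > 60, ]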

Answer 2: (score: 0)

Using plyr:

library(plyr)
ddply(data, .(site), function(x) x[c(TRUE, diff(x$time) > 60), ])
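
If plyr is not available, the same split-apply-combine idea can be sketched in base R. This is my own rough equivalent, not part of the original answer; `parts` and `res` are illustrative names. Like the call above, it compares consecutive records only, so it does not keep record 9.

```r
data <- data.frame(site = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                   time = c(100, 180, 245, 5, 55, 130, 70, 120, 160))

# Split by site, keep the first record of each site plus any record
# more than 60 minutes after its predecessor, then bind the pieces.
parts <- lapply(split(data, data$site),
                function(x) x[c(TRUE, diff(x$time) > 60), ])
res <- do.call(rbind, parts)
res$time   # 100 180 245 5 130 70
```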