Sum columns in a data.frame based on conditions from another data frame in R

Time: 2014-03-31 22:46:45

Tags: r apply

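I have two data frames, a and b.

For each row in b, I want to find all start,end intervals in a that fall within that row's start,end range, then sum end - start over this particular subset of a and store the total as a new column in b. I am currently using a for loop, but is there a more efficient way to do this in R using apply?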

# data.frame a  
a <- data.frame(chrom=1L, start=as.integer(c(2,4,7,11)), end=as.integer(c(3,6,9,15)))
# chrom start end  
#     1     2   3  
#     1     4   6  
#     1     7   9        
#     1    11  15  

# data.frame b  
b <- data.frame(chr=1L, start=as.integer(c(2,11)), end=as.integer(c(10,20)))
#   chr start end  
#     1     2  10  
#     1    11  20  

# code
result <- numeric(0)
for (i in seq_len(nrow(b))) {
    # rows of a on the same chromosome whose start,end lie within row i of b
    # (b names its chromosome column chr, not chrom)
    a_subset <- a[which(a$chrom == b$chr[i] &
                        a$start >= b$start[i] &
                        a$end <= b$end[i]), ]

    result <- append(result, sum(a_subset$end - a_subset$start))
}
c <- cbind(b, result)

# data.frame c
#   chr start end result
#     1     2  10      5
#     1    11  20      4

1 answer:

Answer 0 (score: 3)

This is easy with sqldf and rather tedious in base R:

R>require(sqldf)
R>b$id <- 1:nrow(b)
R>sqldf("select id, b.chr, sum(a.end - a.start) as diff 
    from a, b where a.chrom = b.chr and a.start >= b.start and b.end >= a.end group by id")
  id chr diff
1  1   1    5
2  2   1    4
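
For comparison, the same per-row sum can be sketched in base R without an explicit loop. This is only a minimal sketch and not part of the answer above: it assumes the example data from the question (where b names its chromosome column chr), and it swaps the for loop for vapply over the row indices of b. It scans all of a once per row of b, so it is tidier than the loop rather than asymptotically faster.

```r
# Same example data as in the question (note that b uses `chr`, a uses `chrom`).
a <- data.frame(chrom = 1L, start = c(2L, 4L, 7L, 11L), end = c(3L, 6L, 9L, 15L))
b <- data.frame(chr = 1L, start = c(2L, 11L), end = c(10L, 20L))

# For each row i of b, sum (end - start) over the rows of a that lie
# entirely inside b's interval on the same chromosome.
b$result <- vapply(seq_len(nrow(b)), function(i) {
  inside <- a$chrom == b$chr[i] & a$start >= b$start[i] & a$end <= b$end[i]
  sum(a$end[inside] - a$start[inside])
}, numeric(1))

print(b)
```

Rows of b with no matching interval in a get 0 here, because sum() over an empty selection is 0; the inner join in the sqldf query would drop such rows instead.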