Question

  order_id customer_id extension_of quantity cost duration
1      123      srujan           NA        1  100       30
2      456        teja           NA        1  100       30
3      789      srujan          123        1  100       30

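I have sample data containing order information. What I need to do is aggregate the data (sum of cost, etc.) whenever the value in the extension_of column matches a value in the order_id column.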

Answer 1

I think you want to add up rows 1 and 3, because #1 is the parent order and #3 is its extension. Here is one way to do it (assuming dft is your data frame):

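library(purrr)
# An order with no extension_of is its own parent; otherwise use extension_of.
dft$parent_id <- map2_dbl(dft$order_id, dft$extension_of,
                          function(order_id, extension_of) if (is.na(extension_of)) order_id else extension_of)
# Sum cost per parent order.
aggregate(cost ~ parent_id, data = dft, FUN = sum)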

This does not recurse (that is, it does not handle an extension that has itself been extended); it only aggregates one level.
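To make the one-level limitation concrete, here is a minimal sketch on the question's data plus one hypothetical extra order (order 999, extending 789; it is not in the original data). It uses base R's ifelse in place of map2_dbl, which should yield the same parent_id column for this data.

```r
# Sample data from the question, plus a hypothetical fourth order (999)
# that extends order 789, which is itself an extension of 123.
dft <- data.frame(
  order_id     = c(123, 456, 789, 999),
  customer_id  = c("srujan", "teja", "srujan", "srujan"),
  extension_of = c(NA, NA, 123, 789),
  quantity     = 1,
  cost         = 100,
  duration     = 30
)

# One-level lookup: an order without a parent is its own parent.
dft$parent_id <- ifelse(is.na(dft$extension_of), dft$order_id, dft$extension_of)

one_level <- aggregate(cost ~ parent_id, data = dft, FUN = sum)
print(one_level)
```

Here order 999 is grouped under 789 rather than under the root order 123, so the chain 123, 789, 999 is split across two groups.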

If you need multiple levels, here is one option:

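library(purrr)
# Walk up the extension chain until reaching an order with no parent.
# Reads dft from the enclosing environment and assumes every
# extension_of value also appears in order_id.
find_root <- function(order_id, extension_of){
  if(is.na(extension_of)){
    return(order_id)
  }else{
    # Look up the parent's own extension_of and recurse on the parent.
    parent_ext <- dft$extension_of[dft$order_id==extension_of]
    return(find_root(extension_of, parent_ext))
  }
}
dft$parent_id <- map2_dbl(dft$order_id, dft$extension_of, find_root)
aggregate(cost~parent_id, data=dft, FUN=sum) # same call as before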

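(It looks up a parent's parent only when necessary.) One could of course optimize this for performance if really needed; the principle, however, stays the same.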
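As one possible direction for such an optimization, here is a sketch of a vectorized, loop-based variant that needs no recursion and no purrr. The function name find_roots and the extra order 999 are assumptions made for illustration; they do not come from the question or the answer.

```r
# Sample data from the question, plus a hypothetical order 999 extending 789.
dft <- data.frame(
  order_id     = c(123, 456, 789, 999),
  customer_id  = c("srujan", "teja", "srujan", "srujan"),
  extension_of = c(NA, NA, 123, 789),
  quantity     = 1,
  cost         = 100,
  duration     = 30
)

# Every order starts as its own root; each pass climbs one level for all rows.
find_roots <- function(order_id, extension_of) {
  root <- order_id
  # A chain cannot be longer than the number of orders, so this bound
  # also guarantees termination if the data contains a cycle.
  for (i in seq_along(order_id)) {
    # Parent of the current root (NA if the root has no parent or is unknown).
    parent <- extension_of[match(root, order_id)]
    climb <- !is.na(parent)
    if (!any(climb)) break
    root[climb] <- parent[climb]
  }
  root
}

dft$parent_id <- find_roots(dft$order_id, dft$extension_of)
multi_level <- aggregate(cost ~ parent_id, data = dft, FUN = sum)
print(multi_level)
```

With this data, orders 123, 789, and 999 all resolve to root 123. If an extension_of value points to an order that is missing from the data, that missing ID simply becomes the root of its group instead of raising an error.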

Aggregate data in different columns based on conditions in R
