Counting the number of ordered items

Date: 2014-11-11 11:46:45

Tags: r

Data:

DB1 <- data.frame(orderItemID  = c(1,2,3,4,5,6,7,8,9,10),     
orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12", "1.1.12","1.1.12","2.1.12","2.1.12","2.1.12"),  
itemID = c(2,3,2,5,12,4,2,3,1,5),   
customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1))

Expected result:

Numberoforderedproductstotal = c(5, 3, 2, 5, 5, 2, 3, 3, 5, 5)
Numberoforderedproductslastorder = c(2, 1, 2, 2, 2, 2, 1, 1, 2, 2)
Numberoforderedproductsaverage = c(2.5, 1.5, 2, 2.5, 2.5, 2, 1.5, 1.5, 2.5, 2.5)
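
Hey guys, once again I've run into a problem I can't solve. In my dataset, items of the same size or the same color share the same itemID, and every registered user has a unique customerID.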

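I want to determine (count) the number of articles each user has ordered:
1. in total so far (the sum of all ordered items)
2. in the last order (the sum of all items in each user's most recent order; today's date is, for example, 15.1.12)
3. on average per order (the total number of items divided by the number of orders)

I would also like to add the results as new columns to the existing dataset...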

I have already tried the "count" functions, as well as "countrep" and aggregate, but none of them worked properly...

I forgot: I also want a fourth column with the number of orders!
The expected output is then:

Numberoforders = c(2, 2, 1, 2, 2, 1, 2, 2, 2, 2)
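To see where these numbers come from, here is a small base-R sketch that cross-tabulates items per customer and order date. Like both answers, it assumes that each distinct orderDate of a customer is one order, because the data has no order ID. The names `tab`, `total`, `orders`, `last` and `avg` are illustrative only.

```r
# Example data from the question; stringsAsFactors = FALSE keeps orderDate as text
DB1 <- data.frame(orderItemID = 1:10,
                  orderDate   = c(rep("1.1.12", 7), rep("2.1.12", 3)),
                  itemID      = c(2, 3, 2, 5, 12, 4, 2, 3, 1, 5),
                  customerID  = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
                  stringsAsFactors = FALSE)

# Parse "d.m.yy" so dates sort chronologically rather than alphabetically
DB1$orderDate <- as.Date(DB1$orderDate, format = "%d.%m.%y")

# Rows = customers 1, 2, 3; columns = order dates; cells = number of items ordered
tab <- table(DB1$customerID, DB1$orderDate)

# Per-customer figures (values for customers 1, 2, 3 in the comments)
total  <- rowSums(tab)                                     # 5, 3, 2
orders <- rowSums(tab > 0)                                 # 2, 2, 1
last   <- apply(tab, 1, function(r) r[max(which(r > 0))])  # 2, 1, 2
avg    <- total / orders                                   # 2.5, 1.5, 2
```

Indexing by each row's customer expands these to one value per order line; for example, `total[as.character(DB1$customerID)]` gives 5 3 2 5 5 2 3 3 5 5.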

Thanks a lot for your support!

2 answers:

Answer 0 (score: 0)

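OK, the following code seems to produce the output you want (each distinct orderDate of a customer counts as one order):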

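library(data.table)
# Convert DB1 to a data.table in place and parse the "d.m.yy" strings as Dates
setDT(DB1)[, orderDate := as.Date(orderDate, format = "%d.%m.%y")]
# Per customer: .N = all items ordered, items on the latest date = last order,
# distinct dates = number of orders; the trailing [] prints the result
DB1[, `:=`(Numberoforderedproductstotal = .N,
           Numberoforderedproductslastorder = length(itemID[orderDate == max(orderDate)]),
           Numberoforderedproductsaverage = .N/length(unique(orderDate)),
           Numberoforders = length(unique(orderDate))), 
    by = customerID][]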

#     orderItemID  orderDate itemID customerID Numberoforderedproductstotal Numberoforderedproductslastorder Numberoforderedproductsaverage Numberoforders
#  1:           1 2012-01-01      2          1                            5                                2                            2.5              2
#  2:           2 2012-01-01      3          2                            3                                1                            1.5              2
#  3:           3 2012-01-01      2          3                            2                                2                            2.0              1
#  4:           4 2012-01-01      5          1                            5                                2                            2.5              2
#  5:           5 2012-01-01     12          1                            5                                2                            2.5              2
#  6:           6 2012-01-01      4          3                            2                                2                            2.0              1
#  7:           7 2012-01-01      2          2                            3                                1                            1.5              2
#  8:           8 2012-01-02      3          2                            3                                1                            1.5              2
#  9:           9 2012-01-02      1          1                            5                                2                            2.5              2
# 10:          10 2012-01-02      5          1                            5                                2                            2.5              2

Answer 1 (score: 0)

You could try ave from base R:

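# Total: number of rows per customer
with(DB1, ave(customerID, customerID, FUN=length))
# [1] 5 3 2 5 5 2 3 3 5 5

# Parse the "d.m.yy" strings (%y = two-digit year)
DB2 <- transform(DB1, date=as.Date(orderDate, format='%d.%m.%y'))

# Last order: rows on the customer's latest date; as.numeric keeps ave's result numeric
with(DB2, ave(as.numeric(date), customerID, FUN=function(x) sum(x == max(x))))
# [1] 2 1 2 2 2 2 1 1 2 2

# Average: rows per customer divided by the number of distinct dates
with(DB2, ave(as.numeric(date), customerID,
         FUN=function(x) sum(table(x))/length(unique(x))))
# [1] 2.5 1.5 2.0 2.5 2.5 2.0 1.5 1.5 2.5 2.5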
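Since the goal was to add the results as new columns, the same ave() calls can go straight into a single transform(). This is a sketch along the lines of this answer, not part of it: it re-creates the question's data, adds the number of orders as length(unique(x)), and uses the illustrative names `d` and `res`.

```r
# Example data from the question; stringsAsFactors = FALSE keeps orderDate as text
DB1 <- data.frame(orderItemID = 1:10,
                  orderDate   = c(rep("1.1.12", 7), rep("2.1.12", 3)),
                  itemID      = c(2, 3, 2, 5, 12, 4, 2, 3, 1, 5),
                  customerID  = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
                  stringsAsFactors = FALSE)

# ave() fills its results into x and keeps x's class, so pass plain day numbers
d <- as.numeric(as.Date(DB1$orderDate, format = "%d.%m.%y"))

# Each ave() call returns one value per row, repeated within each customer
res <- transform(DB1,
  Numberoforderedproductstotal     = ave(d, customerID, FUN = length),
  Numberoforderedproductslastorder = ave(d, customerID, FUN = function(x) sum(x == max(x))),
  Numberoforderedproductsaverage   = ave(d, customerID, FUN = function(x) length(x) / length(unique(x))),
  Numberoforders                   = ave(d, customerID, FUN = function(x) length(unique(x))))
```

transform() looks up `customerID` in the data frame and `d` in the calling environment, so no helper column has to be added to DB1.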

Or, using dplyr (with n_distinct, from @David Arenburg's comment):

library(dplyr)
res <- DB1 %>%
  group_by(customerID) %>%
  mutate(orderDate = as.Date(orderDate, format = '%d.%m.%y'),
         Numberoforderedproductstotal = n(),
         Numberoforderedproductslastorder = sum(orderDate == max(orderDate)),
         Numberoforderedproductsaverage = n()/n_distinct(orderDate),
         Numberoforders = n_distinct(orderDate))

# Drop the four original columns to show only the new ones
as.data.frame(res)[-(1:4)]
#   Numberoforderedproductstotal Numberoforderedproductslastorder
#1                             5                                2
#2                             3                                1
#3                             2                                2
#4                             5                                2
#5                             5                                2
#6                             2                                2
#7                             3                                1
#8                             3                                1
#9                             5                                2
#10                            5                                2
#    Numberoforderedproductsaverage Numberoforders
#1                             2.5              2
#2                             1.5              2
#3                             2.0              1
#4                             2.5              2
#5                             2.5              2
#6                             2.0              1
#7                             1.5              2
#8                             1.5              2
#9                             2.5              2
#10                            2.5              2
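The question also mentions trying aggregate. For comparison, here is a base-R sketch (not from either answer) that builds one summary row per customer with aggregate(), joins it back onto the order lines with merge(), and restores the original row order. As in the answers, each distinct orderDate per customer is assumed to be one order; `summary_tab` and `res` are illustrative names.

```r
# Example data from the question; stringsAsFactors = FALSE keeps orderDate as text
DB1 <- data.frame(orderItemID = 1:10,
                  orderDate   = c(rep("1.1.12", 7), rep("2.1.12", 3)),
                  itemID      = c(2, 3, 2, 5, 12, 4, 2, 3, 1, 5),
                  customerID  = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
                  stringsAsFactors = FALSE)
DB1$orderDate <- as.Date(DB1$orderDate, format = "%d.%m.%y")

# One row per customer for each figure
total  <- aggregate(orderItemID ~ customerID, data = DB1, FUN = length)
last   <- aggregate(orderDate ~ customerID, data = DB1,
                    FUN = function(d) sum(d == max(d)))
orders <- aggregate(orderDate ~ customerID, data = DB1,
                    FUN = function(d) length(unique(d)))
names(total)[2]  <- "Numberoforderedproductstotal"
names(last)[2]   <- "Numberoforderedproductslastorder"
names(orders)[2] <- "Numberoforders"

# Combine the per-customer tables and derive the average from them
summary_tab <- merge(merge(total, last, by = "customerID"), orders, by = "customerID")
summary_tab$Numberoforderedproductsaverage <-
  summary_tab$Numberoforderedproductstotal / summary_tab$Numberoforders

# Attach the summary to every order line; merge() sorts by customerID,
# so restore the original row order afterwards
res <- merge(DB1, summary_tab, by = "customerID")
res <- res[order(res$orderItemID), ]
rownames(res) <- NULL
```

The join step is what adds the results as new columns: every order line picks up its customer's summary row.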