Question

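I have been digging into this problem for 3 days, so I finally worked up the courage to ask here. I have a dataset of 379,584 entries and I want to feed it to "arules" in R.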

It looks like this. The structure is the following.

A. If I try to use format = "basket", I do the following:

"span_or": {
                            clauses: [
                            {
                                "span_multi": {
                                    "match": {
                                        "regexp": {
                                            "message": "путин.*"
                                        }
                                    }
                                }
                            }
                            ,
                            {
                                "span_multi": {
                                    "match": {
                                        "bool": {
                                            "must": [
                                                {
                                                    "term" : { "message" : "test" }
                                                },
                                                {
                                                    "term" : { "message" : "rrr" }
                                                }
                                            ]
                                        }
                                    }
                                }
                            }
                            ]
                        }

This gives me the error "can not coerce list with transactions with duplicated items".

B. If I use format = "single":

library(arules)

sales <- read.csv("sales.csv", sep=";")
# one transaction per order_id, holding that order's product_ids
s1 <- split(sales$product_id, sales$order_id)
s1 <- unique(s1)

tr <- as(s1, "transactions")

I get the same error "can not coerce list with transactions with duplicated items".

I have checked the file for duplicates and Excel couldn't find any. I believe the problem is trivial, but I'm just stuck.
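A duplicate check over whole rows will not flag rows that differ only in sku or item_id. What arules rejects is the same product_id appearing more than once within one order_id (for example, two units of one product on separate rows). A minimal base-R sketch of that narrower check, assuming a data frame with order_id and product_id columns; the toy values below are made up:

```r
# Hypothetical toy data: order 1 contains product 15729 twice.
sales <- data.frame(
  item_id    = 1:5,
  order_id   = c(1L, 1L, 1L, 2L, 2L),
  product_id = c(15729L, 15725L, 15729L, 15999L, 15983L)
)

# Whole-row duplicates: none, because item_id differs on every row.
any(duplicated(sales))

# Duplicated (order_id, product_id) pairs: these are the rows
# that make as(..., "transactions") fail.
dup <- duplicated(sales[, c("order_id", "product_id")])
sales[dup, ]
```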

Answer 1

Apparently, unique(s1) causes problems in your code. Do you need it?
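Applied to a list, unique() removes list elements that are identical to an earlier element, that is, whole duplicate transactions. It leaves repeated items inside a single transaction untouched, and those are what the error is about. A small base-R illustration with a made-up list shaped like the output of split():

```r
# Hypothetical transactions: transaction "1" contains item 15729 twice,
# and transactions "2" and "3" are identical.
s1 <- list(`1` = c(15729L, 15725L, 15729L),
           `2` = c(15999L, 15983L),
           `3` = c(15999L, 15983L))

# unique() on the list drops the repeated transaction "3",
# but the first transaction still contains 15729 twice.
u <- unique(s1)
length(u)
u[[1]]

# lapply(..., unique) removes repeated items within each transaction.
s2 <- lapply(s1, unique)
s2[["1"]]
```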

I managed to create the transactions by commenting out that line.

library(arules)

sales <- structure(list(sku = c(207426L, 207422L, 207424L, 9793L, 33186L, 
72406L), product_id = c(15729L, 15725L, 15727L, 15999L, 15983L, 
15992L), item_id = 1:6, order_id = c(1L, 1L, 1L, 2L, 2L, 2L)), 
.Names = c("sku", "product_id", "item_id", "order_id"), 
class = "data.frame", row.names = c(NA, -6L))

# one transaction per order_id, holding that order's product_ids
s1 <- split(sales$product_id, sales$order_id)
#s1 <- unique(s1)

tr <- as(s1, "transactions")
tr

transactions in sparse format with
 2 transactions (rows) and
 6 items (columns)

If you really do need to remove duplicates, remove them within each transaction instead:

s1 <- lapply(s1, unique)
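For the format = "single" route from the question, arules' read.transactions() can drop repeated items within a transaction itself via rm.duplicates = TRUE. A sketch under stated assumptions: the input is a semicolon-separated file without a header line, with the transaction id in column 1 and the item in column 2; the file is a temporary one written from made-up data. For the real sales.csv, the header would have to be skipped (for example with skip = 1) and cols pointed at the order_id and product_id columns; with the column order shown in the answer's sample data that would be cols = c(4, 2), assuming the file has the same order.

```r
library(arules)

# Hypothetical stand-in for sales.csv: order 1 lists product 15729 twice.
sales <- data.frame(order_id   = c(1L, 1L, 1L, 2L, 2L),
                    product_id = c(15729L, 15725L, 15729L, 15999L, 15983L))
path <- tempfile(fileext = ".csv")
write.table(sales, path, sep = ";", row.names = FALSE,
            col.names = FALSE, quote = FALSE)

# "single" format: one (transaction id, item) pair per line.
# rm.duplicates = TRUE removes repeated items within a transaction
# instead of failing on them.
tr <- read.transactions(path, format = "single", sep = ";",
                        cols = c(1, 2), rm.duplicates = TRUE)
tr
```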

How to prep transaction data for arules
