partykit::ctree randomness in majority=TRUE

Time: 2016-09-01 02:04:57

Tags: r party

I am trying to understand how ctree fits/predicts observations that are missing in all of the predictors. For example,

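library(partykit)
# Keep only the rows with an observed response
airq <- subset(airquality, !is.na(Ozone))
# Append 50 rows in which every predictor is missing
airq <- rbind(airq, data.frame(Ozone = rnorm(50), Solar.R = NA, Wind = NA, Temp = NA, Month = NA, Day = NA))
airct <- ctree(Ozone ~ ., data = airq, control = ctree_control(majority = TRUE))
# Terminal node assigned to each of the 50 all-missing rows
table(tail(predict(airct, type = "node"), 50))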

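The last 50 rows of airq are missing all predictors. From reading the documentation, my impression was that with majority=TRUE such rows simply follow the majority at each split, which means they should all end up in the same node with no variation at all. However, I get a distribution of predicted nodes for them.

So

  1. Is my understanding of how majority=TRUE works correct?
  2. How does ctree fit/predict rows that have no observed predictors at all?
  3. By the way, I tried tracing the code to see how the majority argument is used, and saw this at line #104 of partykit:::.cnode: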

    prob <- numeric(0) + 1L:length(prob) %in% which.max(prob)
    

    This looks odd to me, because the result is always numeric(0).
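The observation about the quoted line can be reproduced in base R without partykit. The sketch below uses a made-up `prob` vector; the `fixed` variant is only an illustrative assumption about what a one-hot indicator could look like, not the actual change made in partykit.

```r
# Hypothetical node proportions, for illustration only
prob <- c(0.2, 0.5, 0.3)

# The quoted expression: adding to a zero-length vector yields a zero-length result
buggy <- numeric(0) + 1L:length(prob) %in% which.max(prob)
length(buggy)

# Assumed alternative (not the partykit fix): start from a vector of the right length
fixed <- numeric(length(prob)) + 1L:length(prob) %in% which.max(prob)
fixed
```

Because `%in%` binds more tightly than `+`, the logical indicator is computed first and then added to `numeric(0)`, and R arithmetic with a zero-length operand returns a zero-length vector.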

1 Answer:

Answer 0 (score: 1)

This is a bug in the handling of the majority control argument. It was recently fixed in the R-Forge repository but has not yet been released to CRAN. After running

    install.packages("partykit", repos = "http://R-Forge.R-project.org")

everything should work as expected. I don't think a date for the CRAN release has been scheduled yet, but it should not be in the distant future.
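One possible way to check an installed version is to re-run the example from the question and count the distinct nodes assigned to the 50 all-missing rows. This is a sketch that assumes partykit is installed; `set.seed` and the names `nodes` and `n_nodes` are additions for illustration.

```r
library(partykit)

set.seed(1)  # assumption: added only so the simulated Ozone values are reproducible
airq <- subset(airquality, !is.na(Ozone))
airq <- rbind(airq, data.frame(Ozone = rnorm(50), Solar.R = NA, Wind = NA,
                               Temp = NA, Month = NA, Day = NA))
airct <- ctree(Ozone ~ ., data = airq, control = ctree_control(majority = TRUE))

# Nodes assigned to the 50 rows without any observed predictor
nodes <- tail(predict(airct, type = "node"), 50)
n_nodes <- length(unique(nodes))
n_nodes
```

If the majority rule is applied as the question expects, `n_nodes` should be 1; a larger value suggests the installed version still assigns these rows at random.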