Sensitivity vs. Positive Predictive Value - which is best?

Time: 2018-06-11 10:46:09

Tags: statistics classification regression ensemble-learning

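I am trying to build a model on a class-imbalanced dataset (binary: 25% 1's and 75% 0's). I have tried classification algorithms and ensemble techniques. I am a bit confused about the points below, as I am mainly interested in correctly predicting more of the 1's.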

1. Should I give preference to sensitivity or positive predictive value?
Some ensemble techniques give at most 45% sensitivity with a low positive predictive value,
while others give 62% positive predictive value with low sensitivity.


2. My dataset has around 450K observations and 250 features.
After a power test I took 10K observations by simple random sampling. When I rank
features by variable importance using ensemble techniques, the important features
differ from the ones I got when I tried with 150K observations.
From intuition and domain knowledge, the features that came up as important in the
150K-observation sample seem more relevant. What is the best practice?

3. Lastly, can I use the variable importance generated by RF in other ensemble
techniques to predict the accuracy?

Could you help me sort this out? I am a bit confused.

1 answer:

Answer 0 (score: 1)

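Whether to prefer sensitivity or positive predictive value depends on the ultimate goal of your analysis. The difference between the two is explained well here: https://onlinecourses.science.psu.edu/stat507/node/71/ In short, they measure the results from two different perspectives. Sensitivity is the probability that the test finds the "condition" among those who actually have it. Positive predictive value is the proportion of those who test positive that actually have the "condition".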
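To make the two definitions concrete, here is a minimal sketch in plain Python that computes both from true and predicted labels. The function name `sensitivity_and_ppv` and the label lists are made up for illustration; they are not from the question's data.

```python
def sensitivity_and_ppv(y_true, y_pred):
    """Return (sensitivity, ppv) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Sensitivity: share of actual 1's that were predicted as 1.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # PPV: share of predicted 1's that are actually 1.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Hypothetical labels: 4 actual positives, 8 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
sens, ppv = sensitivity_and_ppv(y_true, y_pred)
print(f"sensitivity={sens:.2f}, ppv={ppv:.2f}")
```

If missing a 1 is the costlier mistake, sensitivity is usually the number to watch; if acting on a false alarm is costlier, PPV matters more.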

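Accuracy depends on your classification results: it is defined as (true positives + true negatives) / (total), and is not something derived from the variable importance produced by RF.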
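One caveat worth illustrating with the accuracy formula: on a 75%/25% split like the one in the question, a model that never predicts 1 still scores well on accuracy. The sketch below uses made-up labels and a hypothetical helper named `accuracy`.

```python
def accuracy(y_true, y_pred):
    """(true positives + true negatives) / total, i.e. the share of correct predictions."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical data mirroring the question's class ratio: 25% 1's, 75% 0's.
y_true = [1] * 25 + [0] * 75
y_pred_all_zero = [0] * 100  # a "model" that always predicts the majority class

acc = accuracy(y_true, y_pred_all_zero)
print(acc)
```

This baseline finds none of the 1's, so accuracy alone is a weak guide when the minority class is the one of interest.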

In addition, the imbalance in the dataset can be compensated for; see https://stats.stackexchange.com/questions/264798/random-forest-unbalanced-dataset-for-training-test
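One common way to compensate, shown here only as a sketch and not necessarily the approach recommended in the linked thread, is to weight each class inversely to its frequency. The helper name `balanced_class_weights` and the labels are assumptions for illustration; the formula is n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels with the question's 25% / 75% split.
labels = [1] * 25 + [0] * 75
weights = balanced_class_weights(labels)
print(weights)
```

Many classifier implementations accept per-class weights like these, so that errors on the rarer 1's cost more during training. Resampling the training set is another option.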
