Question

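I am working on a project based on random forests. I came across a presentation (Rec08_Oct21.ppt) (www.cs.cmu.edu/~ggordon/10601 /.../ rec08 / Rec08_Oct21.ppt) about building random forests, and I have a question. After scanning the randomly selected features and their information gain (IG) values, we choose the feature j with the largest IG. How do we then use this information to make the split, and how do we continue from there?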

Answer 1

LearnTree(X,Y)

Let X be an R x M matrix (R datapoints, M attributes), and let Y be a vector of R elements containing the output class of each datapoint.

j* = argmax_j IG_j   // (this is the splitting attribute we'll use)
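As a rough illustration of how IG_j and the argmax might be computed, here is a minimal sketch assuming categorical attributes only, with X stored as a list of rows. The function names are hypothetical, and `candidates` stands in for the randomly selected subset of attributes that a random forest scans at each node.

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y): Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain_categorical(X, Y, j):
    """IG_j = H(Y) - H(Y | X_j) for a categorical column j."""
    n = len(Y)
    groups = {}
    for row, y in zip(X, Y):
        groups.setdefault(row[j], []).append(y)
    # Weighted average of the entropy inside each value group.
    cond = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(Y) - cond

def best_attribute(X, Y, candidates):
    """j* = argmax_j IG_j over the candidate attribute indices."""
    return max(candidates, key=lambda j: info_gain_categorical(X, Y, j))

X = [["sunny", "hot"], ["sunny", "mild"], ["rain", "hot"], ["rain", "mild"]]
Y = ["no", "no", "yes", "yes"]
print(best_attribute(X, Y, [0, 1]))  # 0
```

Here attribute 0 separates the classes perfectly (IG = 1 bit) while attribute 1 tells us nothing (IG = 0), so j* = 0.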

The maximum IG can come from either a categorical (text-based) attribute or a real-valued (numeric) attribute.

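--> If it comes from a categorical attribute j: for each value v of the j-th attribute, we define new matrices X_v and Y_v and use them as the input for learning a subtree.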

X_v = subset of all the rows of X in which X_ij = v;
Y_v = corresponding subset of Y values;
Child_v = LearnTree(X_v, Y_v);

PS: The number of subtrees equals the number of distinct values of the j-th attribute.
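A minimal sketch of the categorical split, assuming X is a list of rows; `split_categorical` is a hypothetical helper that builds one (X_v, Y_v) pair per distinct value v, so the number of children matches the number of distinct values.

```python
from collections import defaultdict

def split_categorical(X, Y, j):
    """Return {v: (X_v, Y_v)}, one subset per distinct value v of column j."""
    parts = defaultdict(lambda: ([], []))
    for row, y in zip(X, Y):
        X_v, Y_v = parts[row[j]]
        X_v.append(row)
        Y_v.append(y)
    return dict(parts)

X = [["sunny", "hot"], ["rain", "hot"], ["sunny", "mild"], ["overcast", "mild"]]
Y = ["no", "yes", "no", "yes"]
children = split_categorical(X, Y, 0)
print(sorted(children))  # ['overcast', 'rain', 'sunny']
```

LearnTree would then be called recursively on each (X_v, Y_v) pair to produce Child_v.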

--> If it comes from a real-valued attribute j: we need to find the best split threshold.

PS: The threshold t is the value that gives the maximum IG for that attribute.

define IG(Y|X:t) as H(Y) - H(Y|X:t)
define H(Y|X:t) = H(Y|X<t) P(X<t) + H(Y|X>=t) P(X>=t)
define IG*(Y|X) = max_t IG(Y|X:t)
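The three definitions above translate almost line for line into code. This is a minimal sketch under one assumption not stated in the answer: the candidate thresholds are the distinct observed values of the attribute, skipping the smallest (which would leave the X < t side empty). Using midpoints between neighbouring values is a common alternative. Function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) in bits; an empty list has entropy 0."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def ig_threshold(xs, ys, t):
    """IG(Y|X:t) = H(Y) - [H(Y|X<t) P(X<t) + H(Y|X>=t) P(X>=t)]."""
    lo = [y for x, y in zip(xs, ys) if x < t]
    hi = [y for x, y in zip(xs, ys) if x >= t]
    n = len(ys)
    cond = entropy(lo) * len(lo) / n + entropy(hi) * len(hi) / n
    return entropy(ys) - cond

def best_threshold(xs, ys):
    """Return (t, IG*(Y|X)) where IG*(Y|X) = max_t IG(Y|X:t)."""
    candidates = sorted(set(xs))[1:]  # assumed candidate set: observed values
    if not candidates:
        return None, 0.0
    t = max(candidates, key=lambda c: ig_threshold(xs, ys, c))
    return t, ig_threshold(xs, ys, t)

t, ig = best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])
print(t, ig)  # 3.0 1.0
```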

We split on this value of t and then define two child trees from two new pairs of subsets of X and Y:

X_lo = subset of all the rows whose X_ij < t
Y_lo = corresponding subset of Y values
Child_lo = LearnTree(X_lo, Y_lo)

X_hi = subset of all the rows whose X_ij >= t
Y_hi = corresponding subset of Y values
Child_hi = LearnTree(X_hi, Y_hi)
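A minimal sketch of the two-way split, assuming X is a list of numeric rows; `split_numeric` is a hypothetical helper. Rows with X_ij exactly equal to t go to the "hi" side, matching the X >= t term in H(Y|X:t), so no row is lost.

```python
def split_numeric(X, Y, j, t):
    """Partition rows into lo (X[i][j] < t) and hi (X[i][j] >= t)."""
    X_lo, Y_lo, X_hi, Y_hi = [], [], [], []
    for row, y in zip(X, Y):
        if row[j] < t:
            X_lo.append(row)
            Y_lo.append(y)
        else:
            X_hi.append(row)
            Y_hi.append(y)
    return (X_lo, Y_lo), (X_hi, Y_hi)

(X_lo, Y_lo), (X_hi, Y_hi) = split_numeric(
    [[1.0], [2.0], [3.0], [4.0]], ["a", "a", "b", "b"], 0, 3.0)
print(Y_lo, Y_hi)  # ['a', 'a'] ['b', 'b']
```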

Once the tree has been built by splitting in this way, it can be used to classify data.
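Putting the pieces together, here is a minimal end-to-end sketch. It assumes numeric attributes only, stops at pure nodes, at a hypothetical `max_depth`, or when no split improves IG, and labels each leaf with the majority class. The random-forest parts are assumptions drawn from the standard recipe rather than from the answer: each node scans only `n_features` randomly chosen attributes (the "randomly selected features" from the question), each tree is trained on a bootstrap sample, and the forest predicts by majority vote. All names are hypothetical.

```python
import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0

def best_split(X, Y, features):
    """Return (ig, j, t) with the largest IG(Y|X_j:t) > 0, else (0.0, None, None)."""
    best, h, n = (0.0, None, None), entropy(Y), len(Y)
    for j in features:
        for t in sorted({row[j] for row in X})[1:]:  # both sides stay non-empty
            lo = [y for row, y in zip(X, Y) if row[j] < t]
            hi = [y for row, y in zip(X, Y) if row[j] >= t]
            ig = h - (len(lo) * entropy(lo) + len(hi) * entropy(hi)) / n
            if ig > best[0]:
                best = (ig, j, t)
    return best

def learn_tree(X, Y, n_features, rng, depth=0, max_depth=10):
    """LearnTree(X, Y): a leaf label, or a node dict {j, t, lo, hi}."""
    majority = Counter(Y).most_common(1)[0][0]
    if len(set(Y)) == 1 or depth >= max_depth:
        return majority
    # Random-forest step: scan only a random subset of the attributes.
    features = rng.sample(range(len(X[0])), n_features)
    _, j, t = best_split(X, Y, features)
    if j is None:  # no candidate split improves IG
        return majority
    lo = [(row, y) for row, y in zip(X, Y) if row[j] < t]
    hi = [(row, y) for row, y in zip(X, Y) if row[j] >= t]
    grow = lambda part: learn_tree([r for r, _ in part], [y for _, y in part],
                                   n_features, rng, depth + 1, max_depth)
    return {"j": j, "t": t, "lo": grow(lo), "hi": grow(hi)}

def classify(tree, row):
    """Walk down the tree: X_j < t goes to 'lo', X_j >= t goes to 'hi'."""
    while isinstance(tree, dict):
        tree = tree["lo"] if row[tree["j"]] < tree["t"] else tree["hi"]
    return tree

def learn_forest(X, Y, n_trees, n_features, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(learn_tree([X[i] for i in idx], [Y[i] for i in idx], n_features, rng))
    return forest

def classify_forest(forest, row):
    """Majority vote over the individual trees."""
    return Counter(classify(tree, row) for tree in forest).most_common(1)[0][0]

tree = learn_tree([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]],
                  ["a", "a", "b", "b"], n_features=2, rng=random.Random(0))
print(tree)  # {'j': 0, 't': 3.0, 'lo': 'a', 'hi': 'b'}
```

Because every candidate threshold leaves both sides non-empty, each recursive call works on strictly fewer rows, so the recursion always terminates.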

For more information, go here!

I hope this answers your question.
