Question

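I'm new to RFC (Random Forest Classifier) and I'm trying to get a fairly simple example working.

I have a training dataset with 3 columns, as shown below: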

Utm_term  | DayOfWeek  | Customers

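I have set up the RandomForest model and trained it on the Query data (which is text) using a bag-of-words approach.

However, I can't figure out how to add DayOfWeek (an integer from 0 to 6) as an additional feature in the model. I have tried a few approaches, shown below, but still can't get it to work: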

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

# Initialize the "CountVectorizer" object, which is scikit-learn's
# bag of words tool.

vectorizer = CountVectorizer(analyzer="word",
                             tokenizer=None,
                             preprocessor=None,
                             stop_words=None,
                             max_features=5000)

# The column I want to add as additional feature:
train_day = train["DayOfWeek"]

# The current feature
train_data_features = vectorizer.fit_transform(clean_train_reviews)
print(train_data_features)

# Convert the result to an array
train_data_features = train_data_features.toarray()

# Initialize a Random Forest classifier with 100 trees
forest = RandomForestClassifier(n_estimators = 100) 

# Fit the forest to the training set, using the bag of words as 
# features and the customers as the response variable

forest = forest.fit(train_data_features, train["customers"] )

I have tried using hstack to join the arrays, but that doesn't seem to be the right approach.

Any help would be greatly appreciated!
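For reference, here is a minimal sketch of how stacking the bag-of-words matrix and the DayOfWeek column with hstack might look. The toy DataFrame, its values, and the use of the Utm_term column as the text input are assumptions made only for illustration; they are not taken from the real dataset. The sketch uses scipy.sparse.hstack so the text features can stay sparse, and reshapes the day column to (n_samples, 1) first, since a 1-D column cannot be stacked next to a 2-D matrix as is.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the real training set.
train = pd.DataFrame({
    "Utm_term": ["cheap red shoes", "blue running shoes",
                 "red dress sale", "running shorts"],
    "DayOfWeek": [0, 3, 5, 6],
    "Customers": [1, 0, 1, 0],
})

# Bag-of-words features: a sparse matrix of shape (n_samples, n_words).
vectorizer = CountVectorizer(analyzer="word", max_features=5000)
text_features = vectorizer.fit_transform(train["Utm_term"])

# Turn the 1-D day column into a sparse (n_samples, 1) column.
day_features = csr_matrix(train["DayOfWeek"].to_numpy().reshape(-1, 1))

# Stack column-wise: the day of week becomes the last feature column.
combined = hstack([text_features, day_features]).tocsr()

# Fit the forest on the combined features (random_state fixed for repeatability).
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(combined, train["Customers"])
predictions = forest.predict(combined)
```

At prediction time the same steps would apply to new data, with vectorizer.transform in place of fit_transform so that the columns line up with those seen during training.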

Adding features to a Random Forest classifier --- SciKit / Python

0 answers: