Adding features to a Random Forest classifier --- SciKit / Python

Time: 2017-07-24 16:19:58

Tags: python machine-learning scikit-learn random-forest

I am new to RFC (Random Forest Classifier) and am trying to work through a fairly simple example.

I have a training dataset with 3 columns, as follows:

Utm_term  | DayOfWeek  | Customers

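I have already set up a RandomForest model that is trained on the Query data (which is text) using a bag-of-words approach.

However, I cannot figure out how to add DayOfWeek (an integer from 0 to 6) to the model as an additional feature. The code below shows what I have so far; I have tried a few solutions but still cannot get there: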

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Initialize the "CountVectorizer" object, which is scikit-learn's
# bag of words tool.
vectorizer = CountVectorizer(analyzer="word",
                             tokenizer=None,
                             preprocessor=None,
                             stop_words=None,
                             max_features=5000)

# The column I want to add as additional feature:
train_day = train["DayOfWeek"]

# The current feature
train_data_features = vectorizer.fit_transform(clean_train_reviews)
print(train_data_features)

# Convert the result to an array
train_data_features = train_data_features.toarray()

# Initialize a Random Forest classifier with 100 trees
forest = RandomForestClassifier(n_estimators = 100) 

# Fit the forest to the training set, using the bag of words as 
# features and the customers as the response variable

forest = forest.fit(train_data_features, train["customers"] )

I have tried using hstack to join the arrays, but that does not seem to be the right approach.
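For reference, here is one way the hstack idea could be sketched. It is a minimal example under stated assumptions, not a confirmed solution: the toy rows are made up, the names `text_features`, `day_features` and `train_features` are illustrative, and the column names follow the table header above. It uses `scipy.sparse.hstack` rather than `numpy.hstack`, because `CountVectorizer` returns a sparse matrix.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the real training set.
train = pd.DataFrame({
    "Utm_term": ["cheap red shoes", "blue running shoes", "red dress sale",
                 "running jacket", "cheap dress", "blue jacket sale"],
    "DayOfWeek": [0, 3, 6, 1, 5, 2],
    "Customers": [1, 0, 1, 0, 1, 0],
})

# Bag-of-words features: sparse matrix of shape (n_rows, n_words).
vectorizer = CountVectorizer(analyzer="word", max_features=5000)
text_features = vectorizer.fit_transform(train["Utm_term"])

# Reshape the day column to (n_rows, 1) so it can sit beside the text matrix.
day_features = csr_matrix(train["DayOfWeek"].to_numpy().reshape(-1, 1))

# scipy.sparse.hstack appends the column and keeps the result sparse.
train_features = hstack([text_features, day_features]).tocsr()

# Fit on the combined matrix; the day of week is the last column.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(train_features, train["Customers"])
```

At prediction time the same steps would apply, with `vectorizer.transform` instead of `fit_transform`, so that the columns line up. Passing DayOfWeek as a single integer column is an assumption made for brevity; one-hot encoding it into seven columns is a common alternative.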

Any help would be greatly appreciated!

0 answers:

No answers