I'm new to RFC (random forest classifiers) and I'm trying to work through a fairly simple example.
I have a training dataset with 3 columns, as follows:
Utm_term | DayOfWeek | Customers
I've set up a RandomForest model that uses a bag-of-words approach to train on the Query data (which is text).
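However, I can't figure out how to add DayOfWeek (an integer from 0 to 6) as an additional feature in the model. I've tried a few solutions, shown below, but still can't get it to work: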
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Initialize the "CountVectorizer" object, which is scikit-learn's
# bag of words tool.
vectorizer = CountVectorizer(analyzer="word",
                             tokenizer=None,
                             preprocessor=None,
                             stop_words=None,
                             max_features=5000)
# The column I want to add as an additional feature:
train_day = train["DayOfWeek"]
# The current feature: bag of words built from the cleaned query text
# (clean_train_reviews is prepared elsewhere, not shown here)
train_data_features = vectorizer.fit_transform(clean_train_reviews)
print(train_data_features)
# Convert the sparse result to a dense NumPy array
train_data_features = train_data_features.toarray()
# Initialize a Random Forest classifier with 100 trees
forest = RandomForestClassifier(n_estimators=100)
# Fit the forest to the training set, using the bag of words as
# features and the Customers column as the response variable
forest = forest.fit(train_data_features, train["Customers"])
I tried using hstack to join the arrays, but that doesn't seem to be the right approach.
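For reference, here is a minimal sketch of one common way to combine the two feature sets. It assumes scipy is available and uses a small made-up DataFrame with the same three column names. The sketch keeps the bag-of-words matrix sparse and appends DayOfWeek as one extra column with `scipy.sparse.hstack`. A frequent pitfall with hstack, though not necessarily the one hit here, is passing the day values as a 1-D Series instead of a 2-D `(n_rows, 1)` column.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data with the same three columns as the question
train = pd.DataFrame({
    "Utm_term": ["cheap flights", "cheap hotels", "flights to paris", "paris hotels"],
    "DayOfWeek": [0, 3, 5, 6],
    "Customers": [1, 0, 1, 0],
})

vectorizer = CountVectorizer(analyzer="word", max_features=5000)
text_features = vectorizer.fit_transform(train["Utm_term"])  # sparse, (n_rows, n_words)

# DayOfWeek must be a 2-D column of shape (n_rows, 1) before stacking side by side
day_feature = csr_matrix(train["DayOfWeek"].to_numpy().reshape(-1, 1))

# Column-wise concatenation: the last column of X is DayOfWeek
X = hstack([text_features, day_feature]).tocsr()

# Dense equivalent, matching a .toarray() workflow: column_stack treats a 1-D array as a column
X_dense = np.column_stack((text_features.toarray(), train["DayOfWeek"].to_numpy()))

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, train["Customers"])

# New rows must be built the same way: transform (not fit_transform) the text,
# then append the day column in the same position
test = pd.DataFrame({"Utm_term": ["cheap paris flights"], "DayOfWeek": [2]})
X_test = hstack([
    vectorizer.transform(test["Utm_term"]),
    csr_matrix(test["DayOfWeek"].to_numpy().reshape(-1, 1)),
]).tocsr()
predictions = forest.predict(X_test)
```

Treating DayOfWeek as a single integer column lets the trees split on thresholds such as `DayOfWeek <= 4`. One-hot encoding the seven days into separate columns is an alternative if the days should be treated as unordered categories.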
Any help would be greatly appreciated!