Memory allocation error in sklearn random forest classification in Python

Posted: 2018-11-28 19:02:31

Tags: python scikit-learn random-forest

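I am trying to run sklearn random forest classification on 279,900 instances with 5 attributes and 1 class. However, I get a memory allocation error on the fit line: it cannot even train the classifier. Any suggestions on how to solve this?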

The data columns are:

x, y, day, weekday, accuracy

x and y are coordinates, day is the day of the month (1-30), weekday is the day of the week (1-7), and accuracy is an integer.

Code:

import csv
from sklearn.ensemble import RandomForestClassifier

# Read the CSV once so that features and labels stay aligned row by row.
features = []
labels = []
with open("time_data.csv", newline="") as infile:
    reader = csv.reader(infile)
    # next(reader, None)  # uncomment to skip a header row
    for row in reader:
        if not row:  # skip empty lines
            continue
        # columns 1-5 hold the five attributes, column 8 holds the class label
        features.append([float(v) for v in row[1:6]])
        labels.append(row[8])

# First 251,900 rows for training, the following rows (up to 279,953) for testing.
train, trainclass = features[:251900], labels[:251900]
test, testclass = features[251900:279953], labels[251900:279953]

print("Done splitting data into test and train data")

# 500 trees with no max_depth: only min_samples_split/min_samples_leaf limit their growth.
clf = RandomForestClassifier(n_estimators=500, max_features="log2",
                             min_samples_split=3, min_samples_leaf=2)
clf.fit(train, trainclass)
print("Done training")

score = clf.score(test, testclass)
print("Done testing")
print(score)

Error:

line 366, in fit
    builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)
  File "sklearn/tree/_tree.pyx", line 145, in sklearn.tree._tree.DepthFirstTreeBuilder.build
  File "sklearn/tree/_tree.pyx", line 244, in sklearn.tree._tree.DepthFirstTreeBuilder.build
  File "sklearn/tree/_tree.pyx", line 735, in sklearn.tree._tree.Tree._add_node
  File "sklearn/tree/_tree.pyx", line 707, in sklearn.tree._tree.Tree._resize_c
  File "sklearn/tree/_utils.pyx", line 39, in sklearn.tree._utils.safe_realloc
MemoryError: could not allocate 10206838784 bytes
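The traceback ends in `Tree._resize_c` and `safe_realloc`, i.e. the failure happens while a single tree's internal arrays are being grown. For a classifier, every node also stores one value per class (a fitted tree's `tree_.value` array has shape `(node_count, n_outputs, n_classes)`), so memory grows with both the number of nodes and the number of distinct labels. A minimal sketch on hypothetical random data (not the question's file; `tree_memory` is an illustrative helper) that compares one tree's size for 2 versus 200 classes:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic stand-in for the real data: 5 numeric features.
rng = np.random.default_rng(0)
X = rng.random((5_000, 5))

def tree_memory(n_classes, **params):
    """Fit one tree on random labels and report its node count and value-array size."""
    y = rng.integers(0, n_classes, size=len(X))
    tree = DecisionTreeClassifier(random_state=0, **params).fit(X, y).tree_
    # tree_.value has shape (node_count, n_outputs, n_classes), stored as float64
    return tree.node_count, tree.value.shape, tree.value.nbytes

for n_classes in (2, 200):
    nodes, shape, nbytes = tree_memory(n_classes, min_samples_leaf=2)
    print(f"{n_classes} classes: {nodes} nodes, value array {shape}, {nbytes / 1e6:.1f} MB")
```

Random labels are a worst case (trees grow until leaves are tiny), so real data will usually give smaller trees, but the per-node cost still scales linearly with the number of classes.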

3 answers:

Answer 0 (score: 1)

Try Google Colaboratory. You can connect to either a local runtime or a hosted runtime. It worked for me with n_estimators = 10000.

Answer 1 (score: 0)

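From the scikit-learn documentation: "The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values."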

I would therefore try tuning these parameters. You could also use a memory profiler or, if your machine has too little RAM, try running it on Google Colaboratory.
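Following the documentation's advice, the effect of the size-controlling parameters can be checked directly by counting nodes. A minimal sketch on hypothetical random data (5 features, 50 labels, not the question's dataset) that fits a small 10-tree forest under several settings and sums `tree_.node_count` over `estimators_`. The values `max_depth=12`, `min_samples_leaf=50` and `max_leaf_nodes=256` are arbitrary examples, not tuned recommendations, and `total_nodes` is an illustrative helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical synthetic data: 5 features, 50 noisy class labels.
rng = np.random.default_rng(0)
X = rng.random((5_000, 5))
y = rng.integers(0, 50, size=len(X))

def total_nodes(**params):
    """Fit a small forest and return the total number of nodes across all its trees."""
    forest = RandomForestClassifier(n_estimators=10, max_features="log2",
                                    random_state=0, **params)
    forest.fit(X, y)
    return sum(est.tree_.node_count for est in forest.estimators_)

settings = {
    "as in the question": dict(min_samples_split=3, min_samples_leaf=2),
    "max_depth=12": dict(max_depth=12),
    "min_samples_leaf=50": dict(min_samples_leaf=50),
    "max_leaf_nodes=256": dict(max_leaf_nodes=256),
}
for name, params in settings.items():
    print(f"{name:>20}: {total_nodes(**params)} nodes")
```

Because every node also carries one value per class, fewer nodes translate directly into less memory. As a rule of thumb, a binary tree with at most `L` leaves has at most `2L - 1` nodes, so `max_leaf_nodes` gives a hard upper bound per tree.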

Answer 2 (score: 0)

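I recently ran into the same MemoryError, but I solved it by reducing the size of the training data rather than changing the model parameters. My OOB score was 0.98, which suggests the model is unlikely to be overfitting.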
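Training on a random subset of the rows and then checking the out-of-bag score could look like the following minimal sketch. `fit_on_subsample` is a hypothetical helper, the data is synthetic (not the question's file), and the 30% fraction is arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_on_subsample(X, y, fraction, seed=0, **params):
    """Fit a forest on a random fraction of the rows; return the forest and its OOB score."""
    rng = np.random.default_rng(seed)
    n_keep = int(len(X) * fraction)
    idx = rng.choice(len(X), size=n_keep, replace=False)  # sample rows without replacement
    forest = RandomForestClassifier(oob_score=True, random_state=seed, **params)
    forest.fit(X[idx], y[idx])
    return forest, forest.oob_score_

# Hypothetical data standing in for the question's features and labels.
rng = np.random.default_rng(1)
X = rng.random((10_000, 5))
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # simple, learnable labels

forest, oob = fit_on_subsample(X, y, fraction=0.3, n_estimators=100)
print(f"OOB score on a 30% subsample: {oob:.3f}")
```

The OOB score is computed only from trees that did not see a given row in their bootstrap sample, so it acts as a built-in held-out estimate. Newer scikit-learn releases (0.22 and later) also offer a `max_samples` parameter on `RandomForestClassifier`, which limits how many rows each tree is grown on without discarding data up front.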