Question

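I am trying to train a TensorFlow-based random forest regression on numerical, continuous data.

When I try to fit my estimator, it starts by logging the messages below: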

INFO:tensorflow:Constructing forest with params = 
INFO:tensorflow:{'num_trees': 10, 'max_nodes': 1000, 'bagging_fraction': 1.0, 'feature_bagging_fraction': 1.0, 'num_splits_to_consider': 10, 'max_fertile_nodes': 0, 'split_after_samples': 250, 'valid_leaf_threshold': 1, 'dominate_method': 'bootstrap', 'dominate_fraction': 0.99, 'model_name': 'all_dense', 'split_finish_name': 'basic', 'split_pruning_name': 'none', 'collate_examples': False, 'checkpoint_stats': False, 'use_running_stats_method': False, 'initialize_average_splits': False, 'inference_tree_paths': False, 'param_file': None, 'split_name': 'less_or_equal', 'early_finish_check_every_samples': 0, 'prune_every_samples': 0, 'feature_columns': [_NumericColumn(key='Average_Score', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), _NumericColumn(key='lat', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), _NumericColumn(key='lng', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)], 'num_classes': 1, 'num_features': 2, 'regression': True, 'bagged_num_features': 2, 'bagged_features': None, 'num_outputs': 1, 'num_output_columns': 2, 'base_random_seed': 0, 'leaf_model_type': 2, 'stats_model_type': 2, 'finish_type': 0, 'pruning_type': 0, 'split_type': 0}

Then the process fails and I get a ValueError:

ValueError: Shape must be at least rank 2 but is rank 1 for 'concat' (op: 'ConcatV2') with input shapes: [?], [?], [?], [] and with computed input tensors: input[3] = <1>.

This is the code I am using:

import tensorflow as tf
from tensorflow.contrib.tensor_forest.python import tensor_forest
from tensorflow.python.ops import resources
import pandas as pd
from tensorflow.contrib.tensor_forest.client import random_forest
from tensorflow.python.estimator.inputs import numpy_io
import numpy as np

def getFeatures():
    Average_Score = tf.feature_column.numeric_column('Average_Score')
    lat = tf.feature_column.numeric_column('lat')
    lng = tf.feature_column.numeric_column('lng')
    return [Average_Score,lat ,lng]

# Import hotel data
Hotel_Reviews=pd.read_csv("./DataMining/Hotel_Reviews.csv")

Hotel_Reviews_Filtered=Hotel_Reviews[(Hotel_Reviews.lat.notnull() | 
    Hotel_Reviews.lng.notnull())]

Hotel_Reviews_Filtered_Target = Hotel_Reviews_Filtered[["Reviewer_Score"]]
Hotel_Reviews_Filtered_Features = Hotel_Reviews_Filtered[["Average_Score","lat","lng"]]

#Preprocess the data
x=Hotel_Reviews_Filtered_Features.to_dict('list')
for key in x:
    x[key] = np.array(x[key])
y=Hotel_Reviews_Filtered_Target.values

#specify params
params = tf.contrib.tensor_forest.python.tensor_forest.ForestHParams(
  feature_columns=getFeatures(),
  num_classes=1, 
  num_features=2, 
  regression=True, 
  num_trees=10, 
  max_nodes=1000)

#build the graph
graph_builder_class = tensor_forest.RandomForestGraphs

est=random_forest.TensorForestEstimator(
  params, graph_builder_class=graph_builder_class)

#define input function
train_input_fn = numpy_io.numpy_input_fn(
  x=x,
  y=y,
  batch_size=1000,
  num_epochs=1,
  shuffle=True)

est.fit(input_fn=train_input_fn, steps=500)

The variable x is a dictionary of numpy arrays, each of shape (512470,):

{'Average_Score': array([ 7.7,  7.7,  7.7, ...,  8.1,  8.1,  8.1]),
 'lat': array([ 52.3605759,  52.3605759,  52.3605759, ...,  48.2037451,
     48.2037451,  48.2037451]),
 'lng': array([  4.9159683,   4.9159683,   4.9159683, ...,  16.3356767,
     16.3356767,  16.3356767])}

The variable y is a numpy array of shape (512470, 1):

array([[ 2.9],
   [ 7.5],
   [ 7.1],
   ..., 
   [ 2.5],
   [ 8.8],
   [ 8.3]])

Answer 1

Use ndmin=2 to force each array in x to be 2-dimensional. The shapes should then match, and concat should be able to operate.
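
A minimal sketch of what that looks like in numpy, using three of the lat values from the question as stand-in data. One thing to watch: ndmin=2 prepends the new axis, so a length-N array becomes shape (1, N), not (N, 1). Since y has shape (512470, 1), I am assuming the estimator wants one row per example, so the sketch transposes the result. The helper name to_columns is made up for this illustration, and I have not run this against TensorForestEstimator itself.

```python
import numpy as np

# Stand-in for one column of x; values copied from the question's printout.
lat = [52.3605759, 52.3605759, 48.2037451]

flat = np.array(lat)            # shape (3,): rank 1, as in the question
row = np.array(lat, ndmin=2)    # shape (1, 3): ndmin prepends the new axis
column = row.T                  # shape (3, 1): one row per example

# Equivalent way to get the column layout from the rank-1 array.
column_alt = flat.reshape(-1, 1)


def to_columns(features):
    """Hypothetical helper: turn every array in a feature dict into shape (N, 1)."""
    return {key: np.array(values, ndmin=2).T for key, values in features.items()}


x = to_columns({"lat": lat, "lng": [4.9159683, 4.9159683, 16.3356767]})
print(flat.shape, row.shape, column.shape, x["lng"].shape)
# (3,) (1, 3) (3, 1) (3, 1)
```

In the question's code this would replace the `x[key] = np.array(x[key])` loop. If the (1, N) layout turns out to be what is needed instead, drop the `.T`.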
