The following code:
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
import pandas as pd
# DATA PREPARE
df = pd.read_csv('housing.csv')
df = df.dropna()
print(df.head())
print(df.describe())
print(df.info())
# NORMALIZATION
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
num_cols = ['housing_median_age', 'total_rooms', 'total_bedrooms', 'population',
            'households', 'median_income', 'median_house_value']
scaler.fit(df[num_cols])
df_scaled_cols = pd.DataFrame(data=scaler.transform(df[num_cols]), columns=num_cols)
df = pd.concat([df_scaled_cols, df['ocean_proximity']], axis=1)
# DATAFRAME INTO X AND Y -> TRAIN TEST SPLIT
x_data = df[['housing_median_age', 'total_rooms', 'total_bedrooms', 'population', 'households', 'median_income',
'ocean_proximity']]
y_label = df['median_house_value']
X_train, X_test, y_train, y_test = train_test_split(x_data, y_label, test_size=0.3)
# FEATURE COLUMNS FROM DATA
m_age = tf.feature_column.numeric_column('housing_median_age')
rooms = tf.feature_column.numeric_column('total_rooms')
bedrooms = tf.feature_column.numeric_column('total_bedrooms')
population = tf.feature_column.numeric_column('population')
households = tf.feature_column.numeric_column('households')
income = tf.feature_column.numeric_column('median_income')
ocean = tf.feature_column.categorical_column_with_hash_bucket('ocean_proximity', hash_bucket_size=10)
embedded_ocean = tf.feature_column.embedding_column(ocean, dimension=4)
feat_cols = [m_age, rooms, bedrooms, population, households, income, embedded_ocean]
# 3 INPUT FUNCTIONS
train_input_func = tf.estimator.inputs.pandas_input_fn(x=X_train, y=y_train, batch_size=10, num_epochs=1000,
shuffle=True)
test_input_func = tf.estimator.inputs.pandas_input_fn(x=X_test, y=y_test, batch_size=10, num_epochs=1, shuffle=False)
predict_input_func = tf.estimator.inputs.pandas_input_fn(x=X_test, batch_size=10, num_epochs=1, shuffle=False)
# DNN_Reg MODEL
dnn_model = tf.estimator.DNNRegressor(hidden_units=[10,10,10], feature_columns=feat_cols)
dnn_model.train(input_fn=train_input_func, steps=1000)
raises the following error:
Traceback (most recent call last):
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1278, in _do_call
    return fn(*args)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Unable to get element as bytes.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Admin/Documents/PycharmProjects/TF_Regression_Project/project.py", line 69, in <module>
    dnn_model.train(input_fn=train_input_func, steps=1000)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 376, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1145, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1173, in _train_model_default
    saving_listeners)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1451, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 695, in __exit__
    self._close_internal(exception_type)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 732, in _close_internal
    self._sess.close()
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 980, in close
    self._sess.close()
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1124, in close
    ignore_live_threads=True)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\inputs\queues\feeding_queue_runner.py", line 94, in _run
    sess.run(enqueue_op, feed_dict=feed_dict)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 877, in run
    run_metadata_ptr)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1272, in _do_run
    run_metadata)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Unable to get element as bytes.
What is wrong here?
Answer (score: 1)
The problem is the normalization step, or more precisely the index misalignment it introduces: `scaler.transform` returns a plain NumPy array, so the rebuilt `df_scaled_cols` gets a fresh 0-based index, while `df['ocean_proximity']` (after `dropna()`) keeps the original index with gaps. `pd.concat(..., axis=1)` aligns on the index, so the mismatch produces rows full of NaN, and feeding those NaN rows to `pandas_input_fn` triggers the `InternalError: Unable to get element as bytes`.
Instead of the sklearn approach, I did the following:
cols = ['housing_median_age', 'total_rooms', 'total_bedrooms', 'population',
        'households', 'median_income', 'median_house_value']
df[cols] = df[cols].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
In summary, this does the same min-max scaling as sklearn, just manually with a lambda; because the result is assigned back into the existing columns of `df`, the original row index is preserved and the later `pd.concat` no longer produces NaN rows.
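For completeness, the sklearn scaler can also be kept if the transformed array is assigned back into the same columns, which preserves the row index. A minimal sketch of both the failing and the working pattern, using a tiny made-up frame (the column names and values are illustrative, not from housing.csv):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy frame with a gap in the index, like df after dropna().
df = pd.DataFrame(
    {"total_rooms": [880.0, 7099.0, 1467.0],
     "median_income": [8.3, 8.3, 7.2],
     "ocean_proximity": ["NEAR BAY", "INLAND", "INLAND"]},
    index=[0, 2, 3],  # row 1 was dropped
)
num_cols = ["total_rooms", "median_income"]
scaler = MinMaxScaler()

# Failing pattern: the rebuilt DataFrame gets a fresh 0-based index,
# so concat aligns on mismatched indices and fills the gaps with NaN.
scaled = pd.DataFrame(scaler.fit_transform(df[num_cols]), columns=num_cols)
broken = pd.concat([scaled, df["ocean_proximity"]], axis=1)
print(broken.isna().any().any())   # True: NaN rows appeared

# Working pattern: assign the transformed array back into the same
# columns, keeping the original index, so no NaN is introduced.
df[num_cols] = scaler.fit_transform(df[num_cols])
fixed = pd.concat([df[num_cols], df["ocean_proximity"]], axis=1)
print(fixed.isna().any().any())    # False
```

Calling `df_scaled_cols.index = df.index` (or `reset_index(drop=True)` on both sides) before the concat would fix the original code the same way.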