I am running into trouble using tf.data together with a Keras model that has multiple inputs.
I am reading data from a PostgreSQL table with a Python generator, which returns three arrays:
class PairGenerator(object):
    # ... (initialization and SQL reading omitted)
    def __iter__(self):
        # ...
        yield {'numerical_inputs': features, 'cat_input': idx}, response
I create a dataset object with .from_generator:
training_generator = PairGenerator(sql_query=sql_query,
                                   config_file='config.json',
                                   column_dtypes=ColsDtypes,
                                   n_steps=n_steps,
                                   num_obs=1000,
                                   batch_size=batch_size)

train_dataset = tf.data.Dataset.from_generator(
    lambda: training_generator,
    output_types=({'numerical_inputs': tf.float32, 'cat_input': tf.string}, tf.int32),
    output_shapes=({'numerical_inputs': tf.TensorShape([None, 10, 36]),
                    'cat_input': tf.TensorShape([None, 10])},
                   tf.TensorShape([None, 10, 1]))).prefetch(1)
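For reference, here is a minimal self-contained sketch of this same from_generator pattern with synthetic in-memory data instead of my PostgreSQL generator (the function name make_batches and all the toy values are illustrative, not from my real code):

```python
import numpy as np
import tensorflow as tf

def make_batches():
    # Three toy batches of 4 samples, matching my real shapes:
    # features: (batch, 10, 36), idx: (batch, 10), response: (batch, 10, 1)
    for _ in range(3):
        features = np.random.rand(4, 10, 36).astype(np.float32)
        idx = np.full((4, 10), b'cat_a')          # string category ids
        response = np.zeros((4, 10, 1), dtype=np.int32)
        yield {'numerical_inputs': features, 'cat_input': idx}, response

dataset = tf.data.Dataset.from_generator(
    make_batches,
    output_types=({'numerical_inputs': tf.float32, 'cat_input': tf.string}, tf.int32),
    output_shapes=({'numerical_inputs': tf.TensorShape([None, 10, 36]),
                    'cat_input': tf.TensorShape([None, 10])},
                   tf.TensorShape([None, 10, 1])))

for inputs, labels in dataset:
    print(inputs['numerical_inputs'].shape, labels.shape)
```

Because make_batches raises StopIteration after three yields, iterating this dataset terminates on its own.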
#<DatasetV1Adapter shapes: ({numerical_inputs: (None, 10, 36), cat_input: (None, 10)}, (None, 10, 1)), types: ({numerical_inputs: tf.float32, cat_input: tf.string}, tf.int32)>
This works fine when I print a few examples:
for epoch in range(3):
    for example_batch, label_batch in train_dataset:
        print(len(example_batch))
        print(label_batch.shape)
    print("End of epoch: ", epoch)
So I define the model with Keras in TensorFlow 2.0:
batch_size = 32
num_obs = 1000
num_cats = 1 # number of categorical features
n_steps = 10 # number of timesteps in each sample
n_numerical_feats = 36 # number of numerical features in each sample
cat_size = 32465 # number of unique categories in each categorical feature
embedding_size = 1 # embedding dimension for each categorical feature
numerical_inputs = keras.layers.Input(shape=(n_steps, n_numerical_feats), name='numerical_inputs')
#<tf.Tensor 'numerical_inputs:0' shape=(?, 10, 36) dtype=float32>
cat_input = keras.layers.Input(shape=(n_steps,), name='cat_input')
#<tf.Tensor 'cat_input:0' shape=(None, 10) dtype=float32>
cat_embedded = keras.layers.Embedding(cat_size, embedding_size, embeddings_initializer='uniform')(cat_input)
#<tf.Tensor 'embedding_1/Identity:0' shape=(None, 10, 1) dtype=float32>
merged = keras.layers.concatenate([numerical_inputs, cat_embedded])
#<tf.Tensor 'concatenate_1/Identity:0' shape=(None, 10, 37) dtype=float32>
lstm_out = keras.layers.LSTM(64, return_sequences=True)(merged)
#<tf.Tensor 'lstm_2/Identity:0' shape=(None, 10, 64) dtype=float32>
Dense_layer1 = keras.layers.Dense(32, activation='relu', use_bias=True)(lstm_out)
#<tf.Tensor 'dense_4/Identity:0' shape=(None, 10, 32) dtype=float32>
Dense_layer2 = keras.layers.Dense(1, activation='linear', use_bias=True)(Dense_layer1)
#<tf.Tensor 'dense_5/Identity:0' shape=(None, 10, 1) dtype=float32>
model = keras.models.Model(inputs=[numerical_inputs, cat_input], outputs=Dense_layer2)
Then I compile the model:
#compile model
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='mse',
              optimizer=optimizer,
              metrics=['mae', 'mse'])
EPOCHS = 5
Now I fit the model. Since TensorFlow 1.9, a tf.data.Dataset object can be passed directly to keras.Model.fit():
#fit the model
history = model.fit(train_dataset,
                    epochs=EPOCHS,
                    verbose=1,
                    initial_epoch=0)
However, at this step nothing happens. The Jupyter kernel starts and appears to be running, but no output is ever shown!
If I skip the tf.data.Dataset object and instead feed the data directly from numpy arrays, it works like a charm:
#fit the model
#you can use input layer names instead
history = model.fit({'numerical_inputs': X_numeric,
                     'cat_input': X_cat1.reshape(-1, n_steps)},
                    y=target,
                    batch_size=batch_size,
                    epochs=EPOCHS,
                    verbose=1,
                    initial_epoch=0)
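As a sanity check, a stripped-down version of the same two-input pattern does train from a Dataset on my machine. Everything below is a toy sketch (tiny shapes, random data, integer instead of string categories); note that because I call .repeat(), the dataset is infinite and fit() needs steps_per_epoch to know where an epoch ends:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Toy stand-ins for my real arrays (shapes are illustrative)
x = {'numerical_inputs': np.random.rand(8, 10, 36).astype(np.float32),
     'cat_input': np.random.randint(0, 100, size=(8, 10))}
y = np.random.rand(8, 10, 1).astype(np.float32)

# An infinite dataset of (dict-of-inputs, labels) batches
ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(4).repeat()

# Same architecture as above, scaled down
num_in = keras.layers.Input(shape=(10, 36), name='numerical_inputs')
cat_in = keras.layers.Input(shape=(10,), name='cat_input')
emb = keras.layers.Embedding(100, 1)(cat_in)
merged = keras.layers.concatenate([num_in, emb])
out = keras.layers.Dense(1)(merged)
model = keras.models.Model([num_in, cat_in], out)
model.compile(loss='mse', optimizer='adam')

# steps_per_epoch bounds each epoch of the infinite dataset
history = model.fit(ds, epochs=1, steps_per_epoch=2, verbose=0)
```

If the underlying generator never raises StopIteration, fit() has no way to infer an epoch boundary without steps_per_epoch, so I wonder whether something like that is biting me here.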
There is no solution on GitHub either: https://github.com/tensorflow/tensorflow/issues/20698
I really don't know what to do. Can anyone help me?