Question

Below are two very similar code snippets that use a very simple input to illustrate my question. I think an explanation of the following observation would answer it. Thanks!

When I run the following code, the model can be trained quickly and can predict good results.

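import numpy as np
import tensorflow as tf
from tensorflow import keras

# xs and ys are not defined in this snippet as posted;
# the same data as in the second snippet is assumed here.
xs = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
ys = np.array([100, 150, 200, 250, 300, 350, 400, 450, 500, 550], dtype=float)

# A single Dense unit, i.e. a plain linear model y = w * x + b
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd', loss='mean_squared_error')

model.fit(xs, ys, epochs=1000)
print(model.predict([7.0]))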

However, when I run the following code, which is very similar to the one above, the model trains very slowly, often does not train well, and gives bad predictions (the loss easily drops below 1 with the code above but stays at around 20000 with the code below).


# Uses the imports (tf, np, keras) from the first snippet
model = keras.Sequential()
# Hidden layer with 2 ReLU units, followed by a single linear output unit
model.add(keras.layers.Dense(2, activation='relu', input_shape=(1,)))
model.add(keras.layers.Dense(1))

# model.compile(optimizer=tf.train.AdamOptimizer(0.1),
#               loss='mean_squared_error')

# tf.train.AdamOptimizer is the TensorFlow 1.x API; its argument is the learning rate (1 here)
model.compile(optimizer=tf.train.AdamOptimizer(1), loss='mean_squared_error')

xs = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
ys = np.array([100, 150, 200, 250, 300, 350, 400, 450, 500, 550], dtype=float)
model.fit(xs, ys, epochs=1000)
print(model.predict([7.0]))

One more note: with the second snippet the model does train well occasionally, but in roughly 8 out of 10 runs it does not, and the loss remains above 10000 after 1000 epochs.

Answer 1

I don't think there is any direct way to choose the best deep architecture other than running multiple experiments that vary the hyper-parameters and the architecture. Compare the performance of each experiment and choose the best one. A few articles that may be helpful are listed here: link-1, link-2, link-3
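To make the "run several experiments and compare" idea concrete, here is a minimal sketch. It is only an illustration under stated assumptions, not the code from the question: it swaps Keras for a plain-Python linear model (y = w * x + b) trained with ordinary full-batch gradient descent so that it runs without TensorFlow, it varies only the learning rate, and the names train_linear and run_experiments are hypothetical helpers made up for this example. The data is the same xs and ys as in the question.

```python
import math

# Same data as in the question: ys = 100, 150, ..., 550
xs = [float(x) for x in range(1, 11)]
ys = [50.0 * x + 50.0 for x in xs]


def mse(w, b):
    """Mean squared error of the linear model y = w * x + b on (xs, ys)."""
    return sum((w * x + b - y) ** 2 if abs(w * x + b - y) < 1e150 else math.inf
               for x, y in zip(xs, ys)) / len(xs)


def train_linear(learning_rate, epochs=1000):
    """Hypothetical helper: fit w, b by full-batch gradient descent
    and return the final MSE (math.inf if training diverged)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Stop early once the loss has blown up or turned into NaN
        if not math.isfinite(mse(w, b)):
            return math.inf
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2.0 * sum(errs) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    final = mse(w, b)
    return final if math.isfinite(final) else math.inf


def run_experiments(learning_rates):
    """Hypothetical helper: one training run per setting, keyed by setting."""
    return {lr: train_linear(lr) for lr in learning_rates}


results = run_experiments([0.001, 0.01, 0.1])
best_lr = min(results, key=results.get)

for lr, loss in sorted(results.items()):
    print(f"learning_rate={lr}: final loss={loss}")
print("best learning rate:", best_lr)
```

With a real Keras model, train_linear would instead build, compile, and fit the model for the given setting and return the last loss value, and the list of settings could also include layer sizes or optimizers. Because results such as those in the question vary from run to run, it is probably worth repeating each setting several times before picking a winner.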

How do I know if my tensorflow structure is good for my problem?
