Question

I have a simple LSTM network that looks like this:

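import tensorflow as tf  # TensorFlow 1.x API

LSTMCell = tf.nn.rnn_cell.LSTMCell
MultiRNNCell = tf.nn.rnn_cell.MultiRNNCell

lstm_activation = tf.nn.relu

# Two stacked LSTM layers (100 and 10 units) sharing the same activation
cells_fw = [LSTMCell(num_units=100, activation=lstm_activation),
            LSTMCell(num_units=10, activation=lstm_activation)]

stacked_cells_fw = MultiRNNCell(cells_fw)

# embedding_layer ([batch, time, depth]) and features['length'] are defined elsewhere in the model
_, states = tf.nn.dynamic_rnn(cell=stacked_cells_fw,
                              inputs=embedding_layer,
                              sequence_length=features['length'],
                              dtype=tf.float32)

# states holds one LSTMStateTuple (c, h) per layer; keep each layer's final h
output_states = [s.h for s in states]
states = tf.concat(output_states, 1)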

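My problem is this: when I use no activation (activation = None) or use tanh, everything works fine, but as soon as I switch to relu I always get "NaN loss during training". Why? It is 100% reproducible.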

Answer 1

When you use the relu activation function inside the LSTM cell, it is guaranteed that all outputs of the cell, as well as the cell state, are always >= 0. Because relu is also unbounded above, nothing negative ever offsets what accumulates in the cell state, so over a sequence the state, and with it the activations and gradients, can become extremely large and explode. For example, if you run such a cell on any input sequence, you will see that its outputs are never < 0.
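
To illustrate the point, here is a framework-free sketch (not TensorFlow's implementation) of a single-unit LSTM cell, assuming the standard gate equations without peepholes. As in TF's `LSTMCell`, the same activation is applied both to the candidate value and to the cell state before the output gate. The `WEIGHTS` values are hypothetical, hand-picked so the effect is visible, and the helper names (`lstm_step`, `run`) are illustrative only.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def relu(z):
    return max(0.0, z)

# Hypothetical hand-picked weights for a 1-unit LSTM cell (no peepholes):
# each gate maps to (input weight, recurrent weight, bias).
WEIGHTS = {
    "i": (0.0, 0.0, 10.0),  # input gate, close to 1
    "f": (0.0, 0.0, 10.0),  # forget gate, close to 1
    "o": (0.0, 0.0, 10.0),  # output gate, close to 1
    "g": (1.0, 1.0, 0.0),   # candidate: act(x + h)
}

def lstm_step(x, h, c, act, w=WEIGHTS):
    pre = {k: wx * x + wh * h + b for k, (wx, wh, b) in w.items()}
    i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
    g = act(pre["g"])        # candidate value uses the cell activation
    c_new = f * c + i * g    # cell state update
    h_new = o * act(c_new)   # output also passes the state through act
    return h_new, c_new

def run(act, steps=50):
    h = c = 0.0
    hs, cs = [], []
    for t in range(steps):
        x = math.sin(t)      # inputs of both signs
        h, c = lstm_step(x, h, c, act)
        hs.append(h)
        cs.append(c)
    return hs, cs

relu_h, relu_c = run(relu)
tanh_h, tanh_c = run(math.tanh)

print("relu: min h =", min(relu_h), " last h =", relu_h[-1])
print("tanh: min h =", min(tanh_h), " max |h| =", max(abs(v) for v in tanh_h))
```

With relu, every `h` and `c` stays >= 0 even though many inputs are negative, and for these weights the state roughly doubles each step. With tanh, `|h|` stays below 1 and `|c|` can grow by at most 1 per step.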

Why do I get NaN after adding relu activation to an LSTM?
