InvalidArgumentError when training the model

Time: 2019-06-01 17:15:14

Tags: python tensorflow machine-learning recurrent-neural-network gradient-descent

I am working on an RNN that is supposed to generate handwriting, and I have been trying to train the model for weeks now. The model does work, but the results are only mediocre because training always stops very early.

Training always raises the following exception:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm.
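
As far as we can tell, this message is raised inside tf.clip_by_global_norm, which verifies that the computed global norm is finite. A minimal sketch that reproduces the same error with the TensorFlow version we use (TF 1.x, graph mode):

import numpy as np
import tensorflow as tf

# A single NaN (or Inf) in any gradient makes the global norm NaN,
# which trips the finiteness check inside clip_by_global_norm.
bad_grad = tf.constant([np.nan, 1.0])
clipped, _ = tf.clip_by_global_norm([bad_grad], clip_norm=10.0)

with tf.Session() as sess:
    sess.run(clipped)  # InvalidArgumentError: Found Inf or NaN global norm.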

My partner's and my first guess was that we have an exploding gradient problem on our hands. So far we have tried:

  • Reducing the batch size
  • Switching from the Adam optimizer to RMSProp
  • Scaling the data down further
  • Drastically lowering the learning rate (with this, training actually finished)
  • Decreasing and increasing the gradient clipping threshold

Only one of these approaches prevented the error, but the downside is that the model's loss still stays very high and it performs poorly.
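
To see when the norm actually explodes, the unclipped global norm can be fetched alongside the train op each step (a sketch; model, feed and step are placeholder names, not our exact training loop):

# Expose the pre-clipping global norm as an extra tensor (sketch, TF 1.x).
unclipped_norm = tf.global_norm(tf.gradients(self.cost, tvars))

# In the training loop (placeholder names):
# _, loss, norm = sess.run([model.train_op, model.cost, unclipped_norm], feed_dict=feed)
# print(step, loss, norm)  # the norm typically shoots up shortly before the NaN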

The error occurs here:

grads = tf.gradients(self.cost, tvars)  # tf.gradients(ys, xs) calculates the gradients of ys w.r.t. xs
grads, _ = tf.clip_by_global_norm(grads, params.grad_clip)  # clip gradients by their global norm to params.grad_clip
self.train_op = self.optimizer.apply_gradients(zip(grads, tvars))  # training operation (learning)
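
To find out which variable's gradient actually becomes non-finite, each gradient could be wrapped in tf.check_numerics before clipping (a sketch that only reuses the names from the snippet above):

grads = tf.gradients(self.cost, tvars)
# check_numerics raises an InvalidArgumentError naming the offending variable
grads = [tf.check_numerics(g, "NaN/Inf in gradient of " + v.name)
         if g is not None else g
         for g, v in zip(grads, tvars)]
grads, _ = tf.clip_by_global_norm(grads, params.grad_clip)
self.train_op = self.optimizer.apply_gradients(zip(grads, tvars))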

These are our parameters:

self.batch_size=32 if self.train else 1 #only for training
self.tsteps=200 if self.train else 1 #only for backprop in LSTM cell (time steps to complete the sentence)
self.data_scale = 100 #amount to scale data down before training
self.limit = 500
self.tsteps_per_ascii=25 #estimation for one char at gaussian conv./char window
self.alphabet=" abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
self.data_dir="./data/"
self.len_threshold=1
#model
self.rnn_size = 100
self.dropout = 0.85 #probability of keeping neuron during dropout; architecture learns more general
self.kmixtures = 1 #number of gaussian mixtures for character window
self.nmixtures = 8 #number of gaussian mixtures 
self.learning_rate = 0.00001 #learning rate
self.grad_clip = 10. # clip gradients to this magnitude (avoid exploding gradients)
self.optimizer = 'rms' # adam or rms
self.lr_decay = 1.0
self.decay = 0.95
self.momentum = 0.9
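
For completeness, the 'rms' / 'adam' switch maps onto the TF 1.x optimizers roughly like this (a simplified sketch, not our full model code; decay and momentum are only used by RMSProp):

if params.optimizer == 'adam':
    self.optimizer = tf.train.AdamOptimizer(params.learning_rate)
elif params.optimizer == 'rms':
    self.optimizer = tf.train.RMSPropOptimizer(params.learning_rate,
                                               decay=params.decay,
                                               momentum=params.momentum)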

0 Answers:

No answers yet.