CUDA_ERROR_OUT_OF_MEMORY only during the evaluation phase

Date: 2019-05-08 01:14:16

Tags: tensorflow google-cloud-platform google-cloud-ml

I am training a tf.estimator.Estimator model with TensorFlow's tf.estimator.train_and_evaluate, running on Google's Cloud AI job system.

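Recently, when training the model, I started hitting CUDA_ERROR_OUT_OF_MEMORY errors, but I noticed that they occur only during the evaluation phase. That is, I can train for any number of steps, but as soon as the training phase ends, I see the error.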

I have copied and pasted the exact error below (several of these errors appear in a row):

failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
could not allocate pinned host memory of size: 8589934592
failed to alloc 7730940928 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
could not allocate pinned host memory of size: 7730940928
failed to alloc 6957846528 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
could not allocate pinned host memory of size: 6957846528
failed to alloc 6262061568 bytes on host: CUDA_ERROR_INVALID_VALUE: invalid argument
could not allocate pinned host memory of size: 6262061568
failed to alloc 5635855360 bytes on host: CUDA_ERROR_INVALID_VALUE: invalid argument
could not allocate pinned host memory of size: 5635855360
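
For what it's worth, the sizes in the log above can be checked with simple arithmetic. The sketch below (plain Python, with the byte counts copied from the log; the helper names to_gib and backoff_ratios are made up for illustration) converts each failed request to GiB and computes the ratio between consecutive requests. The first request comes out to exactly 8 GiB, and each later one is roughly 0.9 times the previous one. That looks like an allocator retrying with a smaller request each time, though this is an inference from the numbers rather than something the log states. Note also that the messages refer to pinned host memory, not GPU device memory.

```python
# Sizes (in bytes) copied from the "failed to alloc ... bytes on host" lines above.
failed_sizes = [8589934592, 7730940928, 6957846528, 6262061568, 5635855360]

GIB = 2 ** 30  # bytes per GiB


def to_gib(num_bytes):
    """Convert a byte count to GiB (hypothetical helper, for illustration)."""
    return num_bytes / GIB


def backoff_ratios(sizes):
    """Ratio of each request to the one before it (hypothetical helper)."""
    return [later / earlier for earlier, later in zip(sizes, sizes[1:])]


sizes_gib = [to_gib(s) for s in failed_sizes]
ratios = backoff_ratios(failed_sizes)

for size, gib in zip(failed_sizes, sizes_gib):
    print(f"{size} bytes = {gib:.2f} GiB")
print("ratios between consecutive requests:", [round(r, 4) for r in ratios])
```

If that reading is right, the process is asking for several GiB of page-locked host RAM when evaluation starts, so the host memory of the machine type may be worth checking alongside GPU memory.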

0 answers:

No answers yet.