tensorflow - TensorFlow runs out of memory after a few iterations

I am running a deep learning project on a large dataset using a GPU. Each image is about 200 * 200 * 200 voxels.
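As a rough sense of scale, here is a minimal sketch of how much memory a single feature map of this size would take. It assumes float32 activations and a batch size of 1; the channel counts and the helper name `feature_map_mib` are hypothetical and are not taken from the actual network.

```python
# Rough activation-memory estimate for one 200 x 200 x 200 volume.
# Assumptions (not from the original post): float32 activations,
# batch size 1, and the hypothetical channel counts listed below.

BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2


def feature_map_mib(depth, height, width, channels, batch_size=1):
    """Size in MiB of one float32 tensor shaped
    [batch_size, depth, height, width, channels]."""
    n_elements = batch_size * depth * height * width * channels
    return n_elements * BYTES_PER_FLOAT32 / MIB


if __name__ == "__main__":
    for channels in (1, 16, 32):  # hypothetical channel counts
        size = feature_map_mib(200, 200, 200, channels)
        print(f"{channels:>2} channels: {size:.1f} MiB")
```

Under these assumptions a single full-resolution feature map already costs tens to hundreds of MiB, and training keeps many such tensors alive at once for the backward pass, so the margin on the GPU may be small.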

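During training, I get warnings and OutOfMemory errors at different iterations. Sometimes my program terminates on the very first iteration because of an OutOfMemory error, but sometimes it runs for hundreds of iterations before terminating for the same reason.

So I am wondering: if it can train and has already run for some iterations, why does the OutOfMemory error still occur? I am not running any other programs on the GPU, and the batch size is fixed. Can anyone help me solve this or offer some ideas on how to deal with it?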

Some details: TensorFlow always emits warnings like the following:

I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 13278 get requests, put_count=13270 evicted_count=1000 eviction_rate=0.075358 and unsatisfied allocation rate=0.0834463
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110

And after some iterations, the program is stopped by an out-of-memory error (part of the output):

...
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x33288a7e00 of size 17408
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x33288ac200 of size 17408
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x33288b0600 of size 6912
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x33288b2100 of size 6912
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x33288b3c00 of size 6912
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x33288b5700 of size 6912
... ...
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 648.00MiB. See logs for memory state.

Some of the operations I use: tf.nn.conv3d / tf.nn.conv3d_transpose / tf.nn.batch_normalization


0 answers: