Question

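I'm writing a script that sometimes leaks tensors. This can happen in several situations, for example when I'm training a neural network and the training crashes. In that case, training is interrupted and the tensors are not disposed of properly. This results in a memory leak, which I'm trying to clean up by disposing of the unused tensors.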

Example

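In the snippet below, I train two (very simple) models. The first run works and leaks no tensors (number of tensors before training = number of tensors after training). The second time, I use an invalid reshape layer to force a crash during training. An error is thrown, and the tensors from the dataset (I guess?) are not disposed of properly. The code is an example of how tensors can leak.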

async function train(shouldCrash) {
  console.log(`Training, shouldCrash=${shouldCrash}`);
  const dataset = tf.data.zip({ // setup data
    xs: tf.data.array([[1],[1]]),
    ys: tf.data.array([1]),
  }).batch(1);

  const model = tf.sequential({ // setup model
    layers: [
      tf.layers.dense({units: 1, inputShape: [1]}),
      tf.layers.reshape({targetShape: [(shouldCrash ? 2 : 1)]}), // use invalid shape when crashing
    ],
  });
  model.compile({ optimizer: 'sgd', loss: 'meanSquaredError' });
  console.log('  Tensors before:', tf.memory().numTensors);
  try {
    const history = await model.fitDataset(dataset, { epochs: 1 });
  } catch (err) {
    console.log(`    Error: ${err.message}`);
  }
  console.log('  Tensors after:', tf.memory().numTensors);
}

(async () => {
  await train(false); // normal training
  await train(true); // training with error
})();

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.1.2/dist/tf.min.js"></script>

Question

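There is tf.tidy, which helps me dispose of unused tensors in some cases, but it can only be used with synchronous function calls. Therefore, it cannot be used when calling await model.fitDataset(...).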

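Is there a way to dispose of any unused tensors? Alternatively, is there a way to dispose of all existing tensors on the page (without reloading it)?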

Answer 1

The way to clean up any unused tensors in async code is to wrap the code that creates them between a startScope() and an endScope() call.

tf.engine().startScope()
// do your thing
tf.engine().endScope()
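If the wrapped code throws, endScope() in the snippet above is never reached. A minimal sketch of one way to guard against that, assuming a hypothetical helper named withScope (not part of TensorFlow.js) that takes the loaded tf object as a parameter:

```javascript
// Hypothetical helper (not part of TensorFlow.js): runs an async function
// inside an engine scope and always closes the scope, even if fn throws.
// `tf` is passed in so the helper works with whichever tfjs build is loaded.
async function withScope(tf, fn) {
  tf.engine().startScope();
  try {
    return await fn();
  } finally {
    // Disposes the tensors tracked since startScope().
    tf.engine().endScope();
  }
}

// Usage sketch, reusing `model` and `dataset` from the question:
// await withScope(tf, () => model.fitDataset(dataset, { epochs: 1 }));
```

Two caveats apply. The scope stack is global, so other tensor-creating code that runs concurrently while the scope is open may have its tensors disposed as well. Also, as with tf.tidy, variables such as model weights are not cleaned up by a scope and still need to be disposed of separately.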

Answer 2

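According to the documentation, the function passed to tf.tidy "must not return a Promise". Internally, the tf backend disposes of all the tensors it uses when fitting a model. That is why model.fit should not be placed inside tf.tidy. To dispose of a crashed model, you can call tf.dispose on the model.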
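As a rough sketch of disposing a model as soon as training fails, the hypothetical helper below (fitOrDispose is an illustrative name, not a TensorFlow.js function) calls the model's own dispose() method in the catch branch and rethrows the error. It assumes `model` is a compiled tf.LayersModel and `dataset` is a tf.data dataset; it frees the model's weights but may not free intermediate tensors leaked inside the failed training step.

```javascript
// Hypothetical helper: trains a model and, if training throws,
// disposes of the model's weights before rethrowing the error.
async function fitOrDispose(model, dataset, fitArgs) {
  try {
    return await model.fitDataset(dataset, fitArgs);
  } catch (err) {
    model.dispose(); // release the weights of the failed model
    throw err;
  }
}

// Usage sketch, reusing `model` and `dataset` from the question:
// const history = await fitOrDispose(model, dataset, { epochs: 1 });
```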

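It is true that there currently seems to be a memory leak, but a model that crashes because of how it is defined is a poor implementation. This should not happen in a proper scenario, because you can check whether the given parameters match what the layers expect as input. For example, an invalid reshape like the one in the example (a 1-element output reshaped to 2 elements) can be caught before the model is constructed, which prevents the memory leak.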
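One possible form of such a check, as a minimal sketch: compare the element counts of the two shapes before building the model. The helper names elementCount and canReshape are hypothetical, and the sketch assumes fully specified shapes (no null or -1 dimensions) with the batch dimension excluded.

```javascript
// Hypothetical helper: number of elements described by a shape such as [2, 3].
function elementCount(shape) {
  return shape.reduce((total, dim) => total * dim, 1);
}

// Hypothetical pre-check: a reshape is only valid when both shapes
// (batch dimension excluded) hold the same number of elements.
function canReshape(inputShape, targetShape) {
  return elementCount(inputShape) === elementCount(targetShape);
}

console.log(canReshape([1], [1])); // true
console.log(canReshape([1], [2])); // false
```

With a check like this, the invalid targetShape can be rejected before tf.sequential is called, so no model or training tensors are created for the failing case.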

async function train(shouldCrash) {
  console.log(`Training, shouldCrash=${shouldCrash}`);
  const dataset = tf.data.zip({ // setup data
    xs: tf.data.array([[1],[1]]),
    ys: tf.data.array([1]),
  }).batch(1);

  const model = tf.sequential({ // setup model
    layers: [
      tf.layers.dense({units: 1, inputShape: [1]}),
      tf.layers.reshape({targetShape: [(shouldCrash ? 2 : 1)]}), // use invalid shape when crashing
    ],
  });
  model.compile({ optimizer: 'sgd', loss: 'meanSquaredError' });
  console.log('  Tensors before:', tf.memory().numTensors);
  try {
    const history = await model.fitDataset(dataset, { epochs: 1 });
  } catch (err) {
    console.log(`    Error: ${err.message}`);
  }
  console.log('  Tensors after:', tf.memory().numTensors);
  return model; // returned so the caller can dispose of it
}

(async () => {
  const m1 = await train(false); // normal training
  tf.dispose(m1);
  const m2 = await train(true); // training with error
  tf.dispose(m2);
  tf.disposeVariables(); // dispose of any variables that are still registered
  console.log('Tensors after:', tf.memory().numTensors);
})();

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.1.2/dist/tf.min.js"></script>

Memory leak in Tensorflow.js: How to clean up unused tensors?