TensorFlow unit tests crash or take a very long time

Time: 2019-01-16 14:41:02

Tags: azure unit-testing tensorflow

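I am running TensorFlow's unit tests according to the contributing guidelines. On my local macOS machine, the Docker image fails to build because of an issue with apt-key. I set up a Docker on Ubuntu Server virtual machine on Azure (Standard A1: 1 core, 1.75 GiB memory) and ran the following commands: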

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow/
tensorflow/tools/ci_build/ci_build.sh CPU bazel test //tensorflow/...

My first attempt crashed, ending with these lines:

Analyzing: 7629 targets (561 packages loaded, 37218 targets configured)
java.lang.OutOfMemoryError: GC overhead limit exceeded
Dumping heap to /home/mmorin/tensorflow/bazel-ci_build-cache/.cache/bazel/_bazel_mmorin/eab0d61a99b6696edb3d2aff87b585e8/java_pid25131.hprof ...
Heap dump file created [617886369 bytes in 9.053 secs]
Internal error thrown during build. Printing stack trace: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.google.devtools.build.lib.analysis.configuredtargets.RuleConfiguredTarget.<init>(RuleConfiguredTarget.java:115)
    at com.google.devtools.build.lib.analysis.configuredtargets.RuleConfiguredTarget.<init>(RuleConfiguredTarget.java:142)
    at com.google.devtools.build.lib.analysis.RuleConfiguredTargetBuilder.build(RuleConfiguredTargetBuilder.java:174)
    at com.google.devtools.build.lib.rules.python.PyBinary.create(PyBinary.java:53)
    at com.google.devtools.build.lib.rules.python.PyBinary.create(PyBinary.java:36)
    at com.google.devtools.build.lib.analysis.ConfiguredTargetFactory.createRule(ConfiguredTargetFactory.java:323)
    at com.google.devtools.build.lib.analysis.ConfiguredTargetFactory.createConfiguredTarget(ConfiguredTargetFactory.java:207)
    at com.google.devtools.build.lib.skyframe.SkyframeBuildView.createConfiguredTarget(SkyframeBuildView.java:636)
    at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.createConfiguredTarget(ConfiguredTargetFunction.java:783)
    at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.compute(ConfiguredTargetFunction.java:326)
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:422)
    at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:368)
    at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.localPopAndExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)

INFO: Elapsed time: 873.635s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (561 packages loaded, 37254 targets configured)
Internal error thrown during build. Printing stack trace: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.google.devtools.build.lib.analysis.configuredtargets.RuleConfiguredTarget.<init>(RuleConfiguredTarget.java:115)
    at com.google.devtools.build.lib.analysis.configuredtargets.RuleConfiguredTarget.<init>(RuleConfiguredTarget.java:142)
    at com.google.devtools.build.lib.analysis.RuleConfiguredTargetBuilder.build(RuleConfiguredTargetBuilder.java:174)
    at com.google.devtools.build.lib.rules.python.PyBinary.create(PyBinary.java:53)
    at com.google.devtools.build.lib.rules.python.PyBinary.create(PyBinary.java:36)
    at com.google.devtools.build.lib.analysis.ConfiguredTargetFactory.createRule(ConfiguredTargetFactory.java:323)
    at com.google.devtools.build.lib.analysis.ConfiguredTargetFactory.createConfiguredTarget(ConfiguredTargetFactory.java:207)
    at com.google.devtools.build.lib.skyframe.SkyframeBuildView.createConfiguredTarget(SkyframeBuildView.java:636)
    at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.createConfiguredTarget(ConfiguredTargetFunction.java:783)
    at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.compute(ConfiguredTargetFunction.java:326)
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:422)
    at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:368)
    at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.localPopAndExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.google.devtools.build.lib.analysis.configuredtargets.RuleConfiguredTarget.<init>(RuleConfiguredTarget.java:115)
    at com.google.devtools.build.lib.analysis.configuredtargets.RuleConfiguredTarget.<init>(RuleConfiguredTarget.java:142)
    at com.google.devtools.build.lib.analysis.RuleConfiguredTargetBuilder.build(RuleConfiguredTargetBuilder.java:174)
    at com.google.devtools.build.lib.rules.python.PyBinary.create(PyBinary.java:53)
    at com.google.devtools.build.lib.rules.python.PyBinary.create(PyBinary.java:36)
    at com.google.devtools.build.lib.analysis.ConfiguredTargetFactory.createRule(ConfiguredTargetFactory.java:323)
    at com.google.devtools.build.lib.analysis.ConfiguredTargetFactory.createConfiguredTarget(ConfiguredTargetFactory.java:207)
    at com.google.devtools.build.lib.skyframe.SkyframeBuildView.createConfiguredTarget(SkyframeBuildView.java:636)
    at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.createConfiguredTarget(ConfiguredTargetFunction.java:783)
    at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.compute(ConfiguredTargetFunction.java:326)
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:422)
    at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:368)
    at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.localPopAndExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
GC overhead limit exceeded

ERROR: bazel ran out of memory and crashed.
FAILED: Build did NOT complete successfully (561 packages loaded, 37254 targets configured)
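
The trace above is Bazel's own JVM running out of heap during analysis, on a VM with 1.75 GiB of RAM. The following is a minimal sketch, not a confirmed fix: it reads total memory on Linux and prints a Bazel invocation with an explicit heap cap. `--host_jvm_args` is a Bazel startup option; the helper names `total_mem_mib` and `heap_cap_mib` and the "half of RAM" rule are assumptions made for illustration, and on a machine this small the cap may still be too low for all 7629 targets.

```shell
#!/bin/sh
# Read total RAM in MiB from a meminfo-style file (defaults to /proc/meminfo).
total_mem_mib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Hypothetical rule of thumb (an assumption, not a Bazel recommendation):
# give Bazel's JVM half of the machine's RAM.
heap_cap_mib() {
    echo $(( $1 / 2 ))
}

# Only report when /proc/meminfo is readable (i.e. on Linux).
if [ -r /proc/meminfo ]; then
    mem=$(total_mem_mib)
    heap=$(heap_cap_mib "$mem")
    echo "Total RAM: ${mem} MiB; proposed Bazel heap cap: ${heap} MiB"
    # Print the command instead of running it, so it can be reviewed first.
    echo "bazel --host_jvm_args=-Xmx${heap}m test //tensorflow/..."
fi
```

The script only prints the command. Whether the printed command can be passed through `ci_build.sh` unchanged has not been verified here.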

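My second attempt reached more configured targets, but after running for an hour it seems to have stalled, since the configured-target count is now increasing only one at a time: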

Analyzing: 7629 targets (561 packages loaded, 35150 targets configured)
Analyzing: 7629 targets (561 packages loaded, 35279 targets configured)
Analyzing: 7629 targets (561 packages loaded, 35326 targets configured)
Analyzing: 7629 targets (561 packages loaded, 35340 targets configured)
Analyzing: 7629 targets (561 packages loaded, 35345 targets configured)
Analyzing: 7629 targets (561 packages loaded, 35346 targets configured)
Analyzing: 7629 targets (561 packages loaded, 35347 targets configured)
Analyzing: 7629 targets (561 packages loaded, 35347 targets configured)
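
To make the slowdown in the lines above easier to see, here is a small sketch that prints how many targets were configured between successive `Analyzing:` lines. The function name `configured_deltas` is made up for this example, and it assumes the progress lines keep exactly the format shown above.

```shell
#!/bin/sh
# Print the increase in "targets configured" between consecutive
# "Analyzing:" progress lines read from standard input.
configured_deltas() {
    awk '/^Analyzing:/ {
        # The count is the field right before "targets configured)".
        for (i = 2; i <= NF; i++)
            if ($i == "targets" && $(i + 1) == "configured)") n = $(i - 1)
        if (seen) print n - prev
        prev = n
        seen = 1
    }'
}

# Usage with the first four progress lines from the log above.
configured_deltas <<'EOF'
Analyzing: 7629 targets (561 packages loaded, 35150 targets configured)
Analyzing: 7629 targets (561 packages loaded, 35279 targets configured)
Analyzing: 7629 targets (561 packages loaded, 35326 targets configured)
Analyzing: 7629 targets (561 packages loaded, 35340 targets configured)
EOF
```

For those four lines the increments are 129, 47, and 14. Feeding in the remaining lines continues the sequence with 5, 1, 1, and 0, which is the stall described above.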

How long should the TensorFlow unit tests take to run? Has anyone successfully run them on Azure? If so, which image and machine did you use?

0 answers:

No answers