Trying to allocate a memory-mapped INDArray

Date: 2019-07-14 21:23:34

Tags: linux mmap dl4j nd4j

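I am trying to allocate a large memory-mapped 2D array that is bigger than RAM, and the allocation always fails with an out-of-memory error. I am using Java 8, linux-amd64, and nd4j 1.0.0-beta4. Based on the documentation (https://deeplearning4j.org/docs/latest/deeplearning4j-config-memory), my understanding is that I should be able to allocate an array much larger than RAM, because it will be backed by a temporary file and the operating system will page it in on demand (e.g. using mmap).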

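Update: after a reboot I have had some intermittent success. I wonder whether the housekeeping done by the large-array allocation routine needs some baseline amount of RAM? Zeroing, perhaps? I will report back...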

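I have tried a few different policy options, made sure there is enough free disk space where the temporary file lives, and done a little debugging to see what the memory allocation code is doing, all to no avail. It always seems to fail complaining about insufficient physical memory. That complaint is accurate, since there is not enough RAM to hold the array, but that is the whole point.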

long cols = 3000;
long rows = 1000000;

long expectedSize = 4 * cols * rows;
this.nd4jWorkspaceManager = Nd4j.getWorkspaceManager();
this.workspaceConfig  = WorkspaceConfiguration.builder()
    .initialSize(expectedSize)
    .policyLocation(LocationPolicy.MMAP)
    .policyAllocation(AllocationPolicy.OVERALLOCATE)
    .policySpill(SpillPolicy.EXTERNAL)
    .tempFilePath(System.getProperty("user.home") + "/.nd4jtmp")
    .build();

System.out.format("Attempting to create workspace of size %s%n", formatBytes(expectedSize));
this.memoryWorkspace = Nd4j.getWorkspaceManager().getAndActivateWorkspace(workspaceConfig, "M2");
System.out.println("... Done");

System.out.format("Attempting to create array of size %s%n", formatBytes(expectedSize));
INDArray matrix = Nd4j.create(DataType.FLOAT, rows, cols);
System.out.println("... Done");

System.out.format("Populating array with random numbers...%n");

for (int i = 0; i < rows; i++) {
  for (int j = 0; j < cols; j++) {
    matrix.put(i, j, (float) Math.random());
  } 
}

System.out.println("... Done");

Here is the output of free:

$ free
              total        used        free      shared  buff/cache   available
Mem:        7852420     2950656      120860      311200     4780904     4067768
Swap:       7811068      190632     7620436
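
For reference, the size mismatch can be checked with a few lines of arithmetic. This is only a sketch: the class and method names are made up for illustration, and it assumes free is reporting KiB, which is its default unit.

```java
public class ArraySizeCheck {
    /** Bytes needed for a dense rows x cols array with the given element width. */
    static long requiredBytes(long rows, long cols, int bytesPerElement) {
        // multiplyExact throws instead of silently overflowing a long
        return Math.multiplyExact(Math.multiplyExact(rows, cols), (long) bytesPerElement);
    }

    public static void main(String[] args) {
        // Same dimensions as the question: 1,000,000 rows x 3,000 columns of 4-byte floats
        long needed = requiredBytes(1_000_000L, 3_000L, Float.BYTES);

        // "total" column of the Mem: row above, assumed to be in KiB
        long totalRamBytes = 7_852_420L * 1024L;

        double gib = 1L << 30;
        System.out.printf("array: %.2f GiB, RAM: %.2f GiB%n", needed / gib, totalRamBytes / gib);
    }
}
```

The array needs 12,000,000,000 bytes (about 11.18 GiB), while the machine has roughly 7.49 GiB of RAM in total, so the data can only fit if it is paged from the backing file.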

I run the main method, but it fails to allocate the array with the following:

09:14:36,476 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
09:14:36,477 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
09:14:36,477 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [file:/home/nickg/src/riskscape/riskscape/cli/bin/main/logback.xml]
09:14:36,478 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath.
09:14:36,478 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [file:/home/nickg/src/riskscape/riskscape/test-shared/bin/main/logback.xml]
09:14:36,478 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [file:/home/nickg/src/riskscape/riskscape/cli/bin/main/logback.xml]
09:14:36,786 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
09:14:36,790 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
09:14:36,797 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDERR]
09:14:36,909 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [nz.org.riskscape] to WARN
09:14:36,909 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDERR] to Logger[nz.org.riskscape]
09:14:36,909 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to ERROR
09:14:36,910 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDERR] to Logger[ROOT]
09:14:36,910 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
09:14:36,912 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@6950e31 - Registering current configuration as safe fallback point

Attempting to create workspace of size 11.18gb
... Done
Attempting to create array of size 11.18gb
Exception in thread "main" java.lang.OutOfMemoryError: Cannot allocate new LongPointer(8): totalBytes = 1, physicalBytes = 3779M
    at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:76)
    at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:41)
    at org.nd4j.linalg.api.buffer.BaseDataBuffer.<init>(BaseDataBuffer.java:407)
    at org.nd4j.linalg.api.buffer.LongBuffer.<init>(LongBuffer.java:81)
    at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.createLong(DefaultDataBufferFactory.java:478)
    at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.createLong(DefaultDataBufferFactory.java:473)
    at org.nd4j.linalg.factory.Nd4j.createBufferDetached(Nd4j.java:1449)
    at org.nd4j.linalg.api.shape.Shape.createShapeInformation(Shape.java:3241)
    at org.nd4j.linalg.api.ndarray.BaseShapeInfoProvider.createShapeInformation(BaseShapeInfoProvider.java:76)
    at org.nd4j.linalg.cpu.nativecpu.DirectShapeInfoProvider.createShapeInformation(DirectShapeInfoProvider.java:65)
    at org.nd4j.linalg.cpu.nativecpu.DirectShapeInfoProvider.createShapeInformation(DirectShapeInfoProvider.java:49)
    at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:232)
    at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:343)
    at org.nd4j.linalg.cpu.nativecpu.NDArray.<init>(NDArray.java:185)
    at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.create(CpuNDArrayFactory.java:189)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4651)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4129)
    at NDArrayAllocationTest.run(NDArrayAllocationTest.java:40)
    at NDArrayAllocationTest.main(NDArrayAllocationTest.java:14)
Caused by: java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (3779M) > maxPhysicalBytes (3410M)
    at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:585)
    at org.bytedeco.javacpp.Pointer.init(Pointer.java:125)
    at org.bytedeco.javacpp.LongPointer.allocateArray(Native Method)
    at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:68)
    ... 18 more

1 Answer:

Answer 0 (score: 0)

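It looks like this was just a housekeeping issue. I confirmed that the mmapped file is being used for my NDArray; the OOM occurred while allocating some buffers for the array's shape information. After setting org.bytedeco.javacpp.maxphysicalbytes to a large enough value, the NDArray is constructed successfully.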
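
A minimal sketch of one way to apply that setting from code. The usual route is to pass -Dorg.bytedeco.javacpp.maxphysicalbytes=... on the java command line. The helper class below is hypothetical, the value "16G" is only an illustrative placeholder, and the sketch assumes JavaCPP reads the property once when its classes are first loaded, so it has to be set before any ND4J or JavaCPP class is touched.

```java
public class JavaCppLimits {
    // Property named in the answer above
    static final String MAX_PHYSICAL_BYTES = "org.bytedeco.javacpp.maxphysicalbytes";

    /**
     * Sets the limit unless one was already supplied (e.g. with -D on the
     * command line) and returns the value that is in effect afterwards.
     */
    static String applyLimit(String value) {
        if (System.getProperty(MAX_PHYSICAL_BYTES) == null) {
            System.setProperty(MAX_PHYSICAL_BYTES, value);
        }
        return System.getProperty(MAX_PHYSICAL_BYTES);
    }

    public static void main(String[] args) {
        // Assumption: this runs before any ND4J/JavaCPP class has been loaded.
        // "16G" is a placeholder; pick a value above the expected resident size.
        String effective = applyLimit("16G");
        System.out.println(MAX_PHYSICAL_BYTES + "=" + effective);

        // Only after this point would the program call Nd4j.create(...) and friends.
    }
}
```

Letting an existing value win keeps a command-line -D flag authoritative, so the in-code call acts only as a fallback default.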

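I am not sure why this works or why it is necessary, but there we go. The long buffer it failed to allocate was only about 8 longs in length... Perhaps the mmap'd file skews the memory size that the process reports?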
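
One way to test that guess is to watch the process's resident set size (RSS) while touching pages of a memory-mapped file. The sketch below uses only the JDK and Linux's /proc/self/status; it is not JavaCPP's own accounting code, and the class name, the 256 MiB file size, and the 4096-byte page size are all assumptions chosen for illustration. If RSS grows by roughly the mapped size, resident mmap pages are counted as physical memory, which would be consistent with a tiny allocation tripping a physical-memory limit.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class MmapRssDemo {
    /** Extracts the VmRSS value in KiB from the lines of /proc/self/status, or -1 if absent. */
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Typical form: "VmRSS:     123456 kB"
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    /** Current RSS in KiB, or -1 when /proc is not available (non-Linux systems). */
    static long currentRssKb() throws IOException {
        Path status = Paths.get("/proc/self/status");
        return Files.exists(status) ? parseVmRssKb(Files.readAllLines(status)) : -1;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-rss-demo", ".bin");
        int size = 256 * 1024 * 1024; // 256 MiB, small enough to run anywhere
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE beyond the current length grows the file to 'size'
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            long before = currentRssKb();
            for (int i = 0; i < size; i += 4096) {
                buf.put(i, (byte) 1); // touch one byte per (assumed 4 KiB) page
            }
            long after = currentRssKb();
            System.out.printf("RSS before: %d KiB, after: %d KiB%n", before, after);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```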

If anyone knows more about ND4J's memory management, please comment.