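Why does a COW mmap fail with ENOMEM on (sparse) files larger than 4GB?

Asked: 2010-08-31 22:19:00

Tags: linux-kernel mmap copy-on-write

This happens on a 2.6.26-2-amd64 Linux kernel when trying to map a 5GB file with copy-on-write semantics (PROT_READ | PROT_WRITE and MAP_PRIVATE). Mapping files smaller than 4GB, or mapping with PROT_READ only, works fine. This is not the soft resource limit problem reported in this question; the virtual memory size limit is set to unlimited.

Here is the code that reproduces the problem (the actual code is part of Boost.Interprocess).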

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        struct stat b;
        void *base;
        int fd = open("foo.bin", O_RDWR);

        if (fd < 0 || fstat(fd, &b) < 0) {
                perror("foo.bin");
                return 1;
        }
        /* Private (copy-on-write) writable mapping of the whole file. */
        base = mmap(0, b.st_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (base == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        return 0;
}

Here is what happens:

dd if=/dev/zero of=foo.bin bs=1M seek=5000 count=1
./test-mmap
mmap: Cannot allocate memory
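
For readers who prefer to build the test file from C, the dd invocation above can be approximated as follows. This is only a sketch: `make_sparse_file` and the `MIB` macro are hypothetical names introduced here, and the helper assumes a filesystem that supports holes.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t even on 32-bit builds */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MIB (1024LL * 1024LL)

/* Hypothetical helper: mimic `dd bs=1M seek=<seek_mib> count=1` by seeking
 * past a hole and writing one MiB of zeros.
 * Returns the resulting file size in bytes, or -1 on error. */
long long make_sparse_file(const char *path, long long seek_mib)
{
    struct stat st;
    long long size = -1;
    char *zeros = calloc(1, MIB);            /* one zero-filled MiB block */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (zeros != NULL && fd >= 0 &&
        lseek(fd, (off_t)(seek_mib * MIB), SEEK_SET) != (off_t)-1 &&
        write(fd, zeros, MIB) == (ssize_t)MIB &&
        fstat(fd, &st) == 0)
        size = (long long)st.st_size;

    if (fd >= 0)
        close(fd);
    free(zeros);
    return size;
}
```

Calling `make_sparse_file("foo.bin", 5000)` should yield a file of 5001 × 1048576 = 5243928576 bytes, which is the st_size that appears in the strace output, while only the final MiB occupies disk blocks.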

Here is the relevant strace (freshly compiled 4.5.20) output, as requested by nos.

open("foo.bin", O_RDWR)                 = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=5243928576, ...}) = 0
mmap(NULL, 5243928576, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = -1 ENOMEM (Cannot allocate memory)
dup(2)                                  = 4
[...]
write(4, "mmap: Cannot allocate memory\n", 29mmap: Cannot allocate memory
) = 29

2 Answers:

Answer 0 (score: 5)

Try passing MAP_NORESERVE in the flags field, like this:

mmap(NULL, b.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE, fd, 0);

The combination of your swap and physical memory is probably less than the 5GB requested.
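
As a slightly fuller sketch of the same idea, the call can be wrapped in a small helper. The name `map_cow_noreserve` is hypothetical, and the helper opens the file read-only on the assumption that nothing else needs a writable descriptor (a MAP_PRIVATE mapping only requires read access to the file).

```c
#define _GNU_SOURCE            /* exposes MAP_NORESERVE under strict -std modes */
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file copy-on-write without reserving
 * swap for it. Returns MAP_FAILED on error; on success stores the mapped
 * length in *len. */
void *map_cow_noreserve(const char *path, size_t *len)
{
    struct stat st;
    void *base = MAP_FAILED;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return MAP_FAILED;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        *len = (size_t)st.st_size;
        base = mmap(NULL, *len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_NORESERVE, fd, 0);
    }
    close(fd);                 /* the mapping stays valid after close */
    return base;
}
```

Writes through the returned pointer stay private to the process and never reach the file. The trade-off is the one the man page describes: because no swap is reserved, a write to a not-yet-dirtied page can fail at that moment if memory has run out.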

Alternatively, you can do the following as a test (as root); if the mapping then succeeds, you can make the code change above:

# echo 1 > /proc/sys/vm/overcommit_memory

Here are the relevant excerpts from the man pages.

MMAP(2):

   MAP_NORESERVE
          Do  not reserve swap space for this mapping.  When swap space is
          reserved, one has the guarantee that it is  possible  to  modify
          the  mapping.   When  swap  space  is not reserved one might get
          SIGSEGV upon a write if no physical memory  is  available.   See
          also  the  discussion of the file /proc/sys/vm/overcommit_memory
          in proc(5).  In kernels before 2.6, this flag  only  had  effect
          for private writable mappings.

PROC(5):

   /proc/sys/vm/overcommit_memory
          This file contains the kernel virtual  memory  accounting  mode.
          Values are:

                 0: heuristic overcommit (this is the default)
                 1: always overcommit, never check
                 2: always check, never overcommit

          In  mode 0, calls of mmap(2) with MAP_NORESERVE are not checked,
          and the default check is very weak, leading to the risk of  get‐
          ting a process "OOM-killed".  Under Linux 2.4 any non-zero value
          implies mode 1.  In mode 2  (available  since  Linux  2.6),  the
          total  virtual  address  space on the system is limited to (SS +
          RAM*(r/100)), where SS is the size of the swap space, and RAM is
          the  size  of  the physical memory, and r is the contents of the
          file /proc/sys/vm/overcommit_ratio.
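
To see which accounting mode a machine is actually in, the two files named in the excerpt can be read directly. The following is a minimal sketch; `read_proc_long` and `overcommit_mode_name` are hypothetical helper names, and the mode descriptions are copied from the proc(5) text quoted above.

```c
#include <stdio.h>

/* Read a single integer from a /proc file; returns -1 if it cannot be read. */
long read_proc_long(const char *path)
{
    long value = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Map the mode number to the wording used in proc(5). */
const char *overcommit_mode_name(long mode)
{
    switch (mode) {
    case 0:  return "heuristic overcommit";
    case 1:  return "always overcommit, never check";
    case 2:  return "always check, never overcommit";
    default: return "unknown";
    }
}
```

A caller would pass `read_proc_long("/proc/sys/vm/overcommit_memory")` to `overcommit_mode_name`, and read `/proc/sys/vm/overcommit_ratio` the same way when the mode is 2.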

Answer 1 (score: 2)

Quoting the memory, swap size and overcommit settings from the comments:

MemTotal: 4063428 kB
SwapTotal: 514072 kB
$ cat /proc/sys/vm/overcommit_memory
0
$ cat /proc/sys/vm/overcommit_ratio 
50

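With overcommit_memory set to 0 ("heuristic overcommit"), you cannot create a private writable mapping that is larger than the current total of free memory and swap. Since you only have about 4.5GB of memory plus swap in total, a 5GB mapping can never satisfy that condition.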
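
The arithmetic behind this can be checked with the figures quoted above. The sketch below is not the kernel's actual heuristic, which looks at currently free and reclaimable memory. It only compares the request with total RAM plus swap, which is an upper bound on what the heuristic could ever allow. All three function names are hypothetical.

```c
#include <stdint.h>

/* Total RAM plus swap in bytes, from /proc/meminfo values given in kB. */
uint64_t ram_plus_swap_bytes(uint64_t mem_total_kb, uint64_t swap_total_kb)
{
    return (mem_total_kb + swap_total_kb) * 1024u;
}

/* Mode 2 limit as given in proc(5): SS + RAM * (r / 100), in kB. */
uint64_t commit_limit_kb(uint64_t mem_total_kb, uint64_t swap_total_kb,
                         unsigned ratio_percent)
{
    return swap_total_kb + mem_total_kb * ratio_percent / 100u;
}

/* Simplified stand-in for the mode 0 check: a single private writable
 * mapping larger than all RAM plus swap can never be backed in full. */
int obviously_overcommits(uint64_t request_bytes, uint64_t mem_total_kb,
                          uint64_t swap_total_kb)
{
    return request_bytes > ram_plus_swap_bytes(mem_total_kb, swap_total_kb);
}
```

With MemTotal 4063428 kB and SwapTotal 514072 kB, RAM plus swap comes to 4687360000 bytes, which is less than the 5243928576 bytes requested, so `obviously_overcommits` returns 1. For comparison, the mode 2 limit with overcommit_ratio 50 would be 2545786 kB.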

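Your options are either to use MAP_NORESERVE (as Matt Joiner suggested), provided you are sure you will never dirty (write to) more pages in the mapping than you have free memory and swap for, or to significantly increase the size of your swap space.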