Question

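I want to use mmap to persist certain portions of program state in a C program running under Linux, by associating a fixed-size struct with a well-known file name using mmap() with the MAP_SHARED flag set. For performance reasons, I would prefer not to call msync() at all, and no other program will access this file. When my program terminates and is restarted, it will map the same file again and do some processing on it to recover the state it was in before termination. My question is this: if I never call msync() on the mapping, will the kernel guarantee that all updates to the memory are written to disk and are subsequently recoverable, even if my process is terminated with SIGKILL? Also, will there be general system overhead from the kernel periodically writing the pages to disk, even if my program never calls msync()?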

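EDIT: I have settled the question of whether the data is written, but I am still not sure whether this approach will cause some unexpected system load compared with handling the problem with open()/write()/fsync() and accepting the risk that some data might be lost if the process is hit by KILL/SEGV/ABRT/etc. I added a 'linux-kernel' tag in the hope that someone knowledgeable might chime in.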
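For comparison, the open()/write()/fsync() alternative mentioned in the edit could look roughly like the sketch below. The helper names save_state and load_state, the 0600 mode, and the choice to rewrite the whole struct at offset 0 are assumptions made for illustration, not part of the original question.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes at offset 0 of path and force them
 * to storage with fsync(). Returns 0 on success, -1 on error. */
int save_state(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted, retry */
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* Without this, the data sits in the page cache just as with mmap. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Hypothetical helper: read exactly len bytes from offset 0 of path.
 * Returns 0 on success, -1 on error or short file. */
int load_state(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, p + done, len - done, (off_t)done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {              /* error or unexpected end of file */
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    return close(fd);
}
```

The trade-off is that every save_state call pays for a system call and a synchronous flush, whereas a store into a MAP_SHARED mapping is an ordinary memory write.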

Answer 1

I found a comment from Linus Torvalds that answers this question: http://www.realworldtech.com/forum/?threadid=113923&curpostid=114068

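The mapped pages are part of the filesystem cache. This means that even if the user process that modified a page dies, the page is still managed by the kernel, and because all concurrent access to the file goes through the kernel, other processes are served from that cache. Some old Linux kernels behaved differently, which is why some kernel documents still tell you to force an msync.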
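A minimal sketch of that behaviour, assuming a Linux system and a regular file on a normal filesystem: a child stores into a MAP_SHARED mapping and is killed with SIGKILL, and the parent then reads the same bytes back through the ordinary file API. The function name survives_sigkill and the message contents are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if bytes stored through a MAP_SHARED
 * mapping by a child that is then SIGKILLed are visible to a later pread(),
 * 0 if they are not, and -1 on setup errors. */
int survives_sigkill(const char *path)
{
    static const char msg[] = "persist";

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)sizeof msg) < 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: store into the mapping, then die without msync/munmap. */
        char *p = mmap(NULL, sizeof msg, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            _exit(1);
        memcpy(p, msg, sizeof msg);
        kill(getpid(), SIGKILL);
        _exit(2);                  /* not reached */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFSIGNALED(status)) {
        close(fd);
        return -1;
    }

    /* Parent: read through the file descriptor, not through a mapping. */
    char buf[sizeof msg];
    ssize_t n = pread(fd, buf, sizeof buf, 0);
    close(fd);
    if (n != (ssize_t)sizeof buf)
        return -1;
    return memcmp(buf, msg, sizeof msg) == 0;
}
```

Note that this only shows the data reaching the kernel's cache; it says nothing about what survives a kernel crash or power loss.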

EDIT: Thanks to RobH for correcting the link.

Answer 2

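I decided to be less lazy and settle the question of whether the data is written to disk by writing some code. The answer is that it will be written.

Here is a program that kills itself abruptly after writing some data to an mmap'd file: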

#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <signal.h>   /* kill(), SIGKILL */
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>

typedef struct {
  char data[100];
  uint16_t count;
} state_data;

const char *test_data = "test";

int main(int argc, const char *argv[]) {
  int fd = open("test.mm", O_RDWR|O_CREAT|O_TRUNC, (mode_t)0700);
  if (fd < 0) {
    perror("Unable to open file 'test.mm'");
    exit(1);
  }
  /* Grow the file to the size of the struct before mapping it. */
  size_t data_length = sizeof(state_data);
  if (ftruncate(fd, data_length) < 0) {
    perror("Unable to truncate file 'test.mm'");
    exit(1);
  }
  state_data *data = (state_data *)mmap(NULL, data_length, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, fd, 0);
  if (MAP_FAILED == data) {
    perror("Unable to mmap file 'test.mm'");
    close(fd);
    exit(1);
  }
  memset(data, 0, data_length);
  /* Copy "test" plus its terminating NUL (5 bytes); count ends up as 5. */
  for (data->count = 0; data->count < 5; ++data->count) {
    data->data[data->count] = test_data[data->count];
  }
  /* Die immediately, without calling msync() or munmap(). */
  kill(getpid(), SIGKILL);
}

Here is a program that validates the resulting file after the previous program is dead:

#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <assert.h>

typedef struct {
  char data[100];
  uint16_t count;
} state_data;

const char *test_data = "test";

int main(int argc, const char *argv[]) {
  int fd = open("test.mm", O_RDONLY);
  if (fd < 0) {
    perror("Unable to open file 'test.mm'");
    exit(1);
  }
  size_t data_length = sizeof(state_data);
  state_data *data = (state_data *)mmap(NULL, data_length, PROT_READ, MAP_SHARED|MAP_POPULATE, fd, 0);
  if (MAP_FAILED == data) {
    perror("Unable to mmap file 'test.mm'");
    close(fd);
    exit(1);
  }
  assert(5 == data->count);
  unsigned index;
  for (index = 0; index < 4; ++index) {
    assert(test_data[index] == data->data[index]);
  }
  printf("Validated\n");
}

Answer 3

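I found something that adds to my confusion:

munmap does not affect the object that was mapped, that is, the call to munmap does not cause the contents of the mapped region to be written to the disk file. The updating of the disk file for a MAP_SHARED region happens automatically by the kernel's virtual memory algorithm as we store into the memory-mapped region.

This is excerpted from Advanced Programming in the UNIX® Environment.

And from the Linux manpage:

MAP_SHARED Share this mapping with all other processes that map this object. Storing to the region is equivalent to writing to the file. The file may not actually be updated until msync(2) or munmap(2) are called.

The two seem contradictory. Is APUE wrong?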

Answer 4

I didn't find a very precise answer to your question, so I decided to add one more:

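First, about losing data: both write() and the mmap/memcpy mechanism write to the page cache, which the OS syncs to the underlying storage in the background according to its page replacement settings/algorithm. For example, Linux has vm.dirty_writeback_centisecs, which sets how often the kernel flusher threads wake up, and vm.dirty_expire_centisecs, which determines when a dirty page is considered "old" enough to be flushed to disk. So even if your process dies after the write call has succeeded, the data is not lost, because it is already present in kernel pages that will eventually be written to storage. The only case in which you would lose data is if the OS itself crashes (kernel panic, power loss, etc.). The way to make absolutely sure that your data has reached storage is to call fsync, or msync for mmapped regions, as the case may be.

As for the system load question: yes, calling msync/fsync for every request will reduce your throughput drastically, so do it only if you have to. Remember that you are really only guarding against data loss on OS crashes, which I would assume are rare and probably a risk most applications can live with. One general optimization is to issue the sync at periodic intervals, say every second, to get a good balance.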
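The periodic sync suggested above could be sketched as follows. The helper names map_state_file and sync_if_due, the one-second granularity of time(), and the use of MS_SYNC are assumptions chosen for illustration; a real program might prefer a monotonic clock or a dedicated flusher thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: create/open path, size it to len bytes, and map it
 * MAP_SHARED. Returns the mapping, or NULL on failure. */
void *map_state_file(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                     /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: flush the mapping with msync() only when at least
 * interval_sec seconds have passed since *last_sync.
 * Returns 1 if a flush was issued, 0 if it was skipped, -1 if msync failed. */
int sync_if_due(void *addr, size_t len, time_t *last_sync, int interval_sec)
{
    time_t now = time(NULL);
    if (now - *last_sync < interval_sec)
        return 0;
    if (msync(addr, len, MS_SYNC) < 0)
        return -1;
    *last_sync = now;
    return 1;
}
```

A caller would invoke sync_if_due(state, len, &last, 1) after each batch of updates, so that at most about one second of changes is exposed to an OS crash.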

Answer 5

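Either the Linux manpage information is incorrect, or Linux is horribly non-conformant. msync is not supposed to have anything to do with whether changes are committed to the logical state of the file, or whether other processes using mmap or read to access the file see the changes; it is purely an analogue of fsync and should be treated as a no-op except for the purpose of ensuring data integrity in the event of power failure or other hardware-level failure.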

Answer 6

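According to the manpage,

The file may not actually be updated until msync(2) or munmap() is called.

So you will need to make sure you call munmap() prior to exiting, at the very least.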

mmap, msync, and Linux process termination
