Question

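I am testing the fastest way to pass data between two processes. I have two processes: one writes data and the other receives it. My script shows that writing to and reading from a file is faster than using a pipe. How can that be? Isn't memory faster than disk?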

Writing to and reading from a file:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
from mutiprocesscomunicate import gen_data

data_size = 128 * 1024  # KB


def send_data_task(file_name):
    with open(file_name, 'wb+') as fd:
        for i in range(data_size):
            fd.write(gen_data(1))
            fd.write('\n'.encode('ascii'))
        # Write the end-of-stream marker after all data lines.
        fd.write('EOF'.encode('ascii'))
    print('send done.')


def get_data_task(file_name):
    offset = 0
    fd = open(file_name, 'r+')
    i = 0
    while True:
        data = fd.read(1024)
        offset += len(data)
        if 'EOF' in data:
            fd.truncate()
            break
        if not data:
            fd.close()
            fd = None
            fd = open(file_name, 'r+')
            fd.seek(offset)
            continue
    print("recv done.")


if __name__ == '__main__':
    import multiprocessing

    pipe_out = pipe_in = 'throught_file'
    p = multiprocessing.Process(target=send_data_task, args=(pipe_out,), kwargs=())
    p1 = multiprocessing.Process(target=get_data_task, args=(pipe_in,), kwargs=())

    p.daemon = True
    p1.daemon = True
    import time

    start_time = time.time()
    p1.start()
    # Give the reader a head start before starting the writer.
    time.sleep(0.5)
    p.start()
    p.join()
    p1.join()
    import os
    os.sync()
    print('through file', data_size / (time.time() - start_time), 'KB/s')
    open(pipe_in, 'w+').truncate()

Using a pipe:

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import multiprocessing
from mutiprocesscomunicate import gen_data

data_size = 128 * 1024  # KB


def send_data_task(pipe_out):
    for i in range(data_size):
        pipe_out.send(gen_data(1))
    # An empty message marks the end of the stream.
    pipe_out.send("")
    print('send done.')


def get_data_task(pipe_in):
    while True:
        data = pipe_in.recv()
        if not data:
            break
    print("recv done.")


if __name__ == '__main__':
    pipe_out, pipe_in = multiprocessing.Pipe()
    p = multiprocessing.Process(target=send_data_task, args=(pipe_out,), kwargs=())
    p1 = multiprocessing.Process(target=get_data_task, args=(pipe_in,), kwargs=())

    p.daemon = True
    p1.daemon = True
    import time

    start_time = time.time()
    p1.start()
    p.start()
    p.join()
    p1.join()
    print('through pipe', data_size / (time.time() - start_time), 'KB/s')

The function that generates the data:

def gen_data(size):
    onekb = "a" * 1024
    return (onekb * size).encode('ascii')

Results:

Through file: 110403.02025891568 KB/s

Through pipe: 75354.71358973449 KB/s

I am using macOS with Python 3.

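Update

If the data is only 1 KB, the pipe is about 100 times faster than the file. But if the data is large, such as 128 MB, the result is as shown above.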

Answer 1

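A pipe has a limited capacity, so that the producer's and consumer's speeds are matched (via backpressure flow control) instead of consuming an unlimited amount of memory. According to {{3}}, the particular limit on OS X is 16 KiB. When you write 128 KiB, that means at least 8 times as many system calls (and context switches). When working with a file, the size is limited only by disk space or quota, and without fdatasync or similar the data does not need to reach the disk at all; it can be read back directly from the cache. On the other hand, when your data is small, the time spent finding a place to put the file dominates, leaving the pipe considerably faster.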
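To see the capacity limit on your own machine, here is a minimal sketch that fills a fresh pipe without ever reading from it and counts how many bytes the kernel accepts before a write would block. The function name `pipe_capacity` and the 1 KiB chunk size are my own choices for illustration. Note that this measures a plain `os.pipe()`; as far as I know, `multiprocessing.Pipe()` with the default `duplex=True` is backed by a socket pair on Unix, so its buffer limit may differ from what this prints.

```python
import os


def pipe_capacity(chunk_size=1024):
    """Fill a fresh pipe without reading from it; return the bytes accepted."""
    read_fd, write_fd = os.pipe()
    # Non-blocking mode makes a full pipe raise instead of hanging forever.
    os.set_blocking(write_fd, False)
    total = 0
    try:
        while True:
            try:
                total += os.write(write_fd, b"a" * chunk_size)
            except BlockingIOError:
                # The kernel buffer is full: this is the backpressure point.
                break
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == "__main__":
    print("pipe capacity:", pipe_capacity(), "bytes")
```

Once the buffer is full, a blocking writer has to wait until the reader drains it, which is where the extra system calls and context switches come from.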

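If you do use fdatasync, or simply exceed the memory available for disk caching, writing to disk will also slow down to match the actual disk transfer speed.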
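A rough way to check this yourself is to time the same write with and without forcing it to disk. This is only a sketch: `timed_write` is a hypothetical helper, the 16 MiB payload is arbitrary, and I use `os.fsync` in place of fdatasync because, as far as I know, `os.fdatasync` is not available on macOS. The actual numbers depend heavily on the disk and the filesystem.

```python
import os
import tempfile
import time


def timed_write(payload, sync):
    """Write payload to a temporary file; return the elapsed seconds."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            if sync:
                f.flush()             # Python's buffer -> kernel page cache
                os.fsync(f.fileno())  # page cache -> storage device
        return time.perf_counter() - start
    finally:
        os.remove(path)


if __name__ == "__main__":
    payload = b"a" * (16 * 1024 * 1024)  # 16 MiB, an arbitrary test size
    print("no sync:", timed_write(payload, sync=False), "s")
    print("fsync:  ", timed_write(payload, sync=True), "s")
```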

Answer 2

Because the operating system kernel first writes file data to the page cache (in RAM).
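A small sketch of what that means in practice: data handed to the kernel is visible to a second reader straight away, without any explicit sync to disk. The helper name `write_then_read` is made up for illustration, and the code only shows that the read succeeds before any fsync; it does not prove where the bytes physically live.

```python
import os
import tempfile


def write_then_read(payload):
    """Write without fsync, then read the data back through a second handle."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as writer:
            writer.write(payload)
            writer.flush()  # hand the data to the kernel; no fsync is issued
            with open(path, "rb") as reader:
                return reader.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    data = b"a" * 1024
    print(write_then_read(data) == data)
```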

Why is writing to a file faster than multiprocessing.Pipe?
