Why is imap with write faster than apply_async with write?

Date: 2015-10-26 14:34:51

Tags: python python-3.4 python-multiprocessing

imap version:

import os
import multiprocessing as mp
import timeit
import string
import random


PROCESSES = 5
FILE = 'test_imap.txt'



def remove_file():
    try:
        os.remove(FILE)
    except FileNotFoundError:
        pass


def produce(i):
    # i is the task index from range(5) and is unused; builds 100000 random 32-character strings
    return [''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(32))
            for _ in range(100000)]


def imap_version():
    with mp.Pool(PROCESSES) as p:
        with open(FILE, 'a') as fp:
            for lines in p.imap_unordered(produce, range(5)):
                for line in lines:
                    fp.write(line + '\n')


if __name__ == '__main__':
    remove_file()
    imap_version_result = timeit.repeat("imap_version()", setup="from __main__ import imap_version", repeat=5, number=5)
    print('imap result:', imap_version_result)

apply_async version:

import os
import multiprocessing as mp
import timeit
import string
import random


PROCESSES = 5
FILE = 'test_apply.txt'



def remove_file():
    try:
        os.remove(FILE)
    except FileNotFoundError:
        pass


def produce():
    # builds 100000 random 32-character strings
    return [''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(32))
            for _ in range(100000)]


def worker():
    lines = produce()
    with open(FILE, 'a') as fp:
        for line in lines:
            fp.write(line + '\n')


def apply_version():
    with mp.Pool(PROCESSES) as p:
        processes = []
        for i in range(5):
            processes.append(p.apply_async(worker))

        # Busy-wait: poll until every AsyncResult reports that its task has finished
        while True:
            if all(r.ready() for r in processes):
                break


if __name__ == '__main__':
    remove_file()
    apply_version_result = timeit.repeat("apply_version()", setup="from __main__ import apply_version", repeat=5, number=5)
    print('apply result', apply_version_result)

Results:

imap result: [62.71130559899029, 62.65627204600605, 62.534730065002805, 62.67373917000077, 62.74415319500258]
apply result [72.03727042900573, 72.17959955699916, 72.2304800950078, 72.02653418600676, 72.11620796499483]
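Each figure is the total wall-clock time of `number=5` back-to-back calls, so dividing by 5 gives the cost of one call. A short sketch of that arithmetic over the numbers above (the helper name `per_call` is just for illustration):

```python
# Raw timeit.repeat(..., repeat=5, number=5) output copied from the results above
imap_runs = [62.71130559899029, 62.65627204600605, 62.534730065002805,
             62.67373917000077, 62.74415319500258]
apply_runs = [72.03727042900573, 72.17959955699916, 72.2304800950078,
              72.02653418600676, 72.11620796499483]
NUMBER = 5  # each entry is the total for NUMBER consecutive calls


def per_call(runs, number=NUMBER):
    # Best (minimum) and mean time for a single call
    return min(runs) / number, sum(runs) / len(runs) / number


imap_best, imap_mean = per_call(imap_runs)
apply_best, apply_mean = per_call(apply_runs)
print('imap:  %.2f s per call (best), %.2f s (mean)' % (imap_best, imap_mean))
print('apply: %.2f s per call (best), %.2f s (mean)' % (apply_best, apply_mean))
print('apply/imap ratio: %.2f' % (apply_mean / imap_mean))
```

So each apply_async run is roughly 15% slower than the corresponding imap run.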

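I expected imap to be slower, because the child processes have to pickle their results and send them to the main process, which then writes them to the file, whereas with apply_async each child process writes its results to the file directly. Instead, apply_async is slower than imap.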
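To get a rough sense of the pickling overhead the question expects, this sketch pickles and unpickles one task's worth of output (100000 random 32-character strings, built like the question's produce()). It measures serialization only, in a single process, and not the pipe transfer that the Pool also performs, so treat it as a rough estimate of one part of imap's extra cost per task.

```python
import pickle
import random
import string
import time

ALPHABET = string.ascii_uppercase + string.digits


def produce(n=100000):
    # Same shape as the question's produce(): n random 32-character strings
    return [''.join(random.choice(ALPHABET) for _ in range(32)) for _ in range(n)]


lines = produce()

start = time.perf_counter()
# Serialize and deserialize one task's result, roughly what the Pool does for imap
payload = pickle.dumps(lines, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)
elapsed = time.perf_counter() - start

print('pickled size: %d bytes, pickle round trip: %.3f s' % (len(payload), elapsed))
```

Multiplying the round-trip time by the 5 tasks per call gives a figure to compare against the per-call times implied by the results above (each result covers 5 calls).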

Why is that?

NB: this was done with Python 3.4.3 on Mac OS X 10.11.

2 answers:

Answer 0 (score: 1)

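A quick look at your source shows that imap_version() opens the output file once per run, whereas apply_version() opens it once per worker task, because the open() call sits inside worker(), which is submitted once for each iteration of the range(5) loop.

open() is called 125 times in the async version and 25 times in the imap version.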
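The 125 vs. 25 figures follow from the timeit parameters: repeat=5 and number=5 run each benchmark function 25 times, and each apply_version() call submits 5 worker() tasks that each open the file once. A quick check of that arithmetic (the names below are illustrative, not from the question):

```python
REPEAT, NUMBER = 5, 5   # timeit.repeat(..., repeat=5, number=5) in both scripts
TASKS_PER_CALL = 5      # range(5) tasks submitted per benchmark call


def total_opens(opens_per_call, repeat=REPEAT, number=NUMBER):
    # timeit.repeat executes the statement repeat * number times in total
    return repeat * number * opens_per_call


imap_opens = total_opens(1)                 # imap_version() opens FILE once per call
apply_opens = total_opens(TASKS_PER_CALL)   # each worker() task opens FILE once
print(imap_opens, apply_opens)
```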

Answer 1 (score: 0)

My guess is that the busy loop is the culprit (besides being an anti-pattern in itself).

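By checking the status yourself, you are doing redundant work: multiprocessing's machinery does almost exactly the same thing behind the scenes with its work queue (in multiprocessing.pool.Pool._handle_workers(), which runs in a separate thread). IMapIterator.next, on the other hand, uses threading.Condition(threading.Lock()) to suspend the main thread until an item is ready, so _handle_workers runs unimpeded (remember that, because of the GIL, only one thread at a time can run Python code).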
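The blocking alternative this answer describes can be sketched as follows: instead of polling ready() in a loop, the main thread calls get() on each AsyncResult, which waits until that task is done and re-raises any exception the task raised (the busy loop silently ignores those). To keep the sketch self-contained it uses multiprocessing.pool.ThreadPool, which exposes the same apply_async()/AsyncResult interface as multiprocessing.Pool; produce and apply_version_blocking here are hypothetical simplified stand-ins, and for the real benchmark you would use mp.Pool with the original worker.

```python
from multiprocessing.pool import ThreadPool


def produce(n):
    # Hypothetical stand-in for the question's produce(): n short strings
    return ['line-%d' % i for i in range(n)]


def apply_version_blocking(tasks=5, n=1000, processes=5):
    # ThreadPool shares Pool's apply_async()/AsyncResult API;
    # swap in multiprocessing.Pool for the real benchmark.
    with ThreadPool(processes) as pool:
        pending = [pool.apply_async(produce, (n,)) for _ in range(tasks)]
        # get() blocks until each task has finished instead of spinning on
        # ready(), and re-raises the task's exception if it failed.
        return [r.get() for r in pending]
```

Because get() sleeps on a synchronization primitive rather than looping, the main thread no longer competes with the pool's helper threads for the GIL while it waits.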

Anyway, this is just another guess. The only conclusive evidence would be profiling results.
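As a starting point for such profiling, here is a minimal sketch using the standard library's cProfile and pstats. profile_call and example are hypothetical helper names; pass imap_version or apply_version instead of example to profile the real benchmarks. cProfile only profiles the current process, so the work done inside the pool's worker processes will not appear in the report.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort_by='cumulative', limit=10, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(sort_by).print_stats(limit)
    return result, stream.getvalue()


def example(n):
    # Hypothetical stand-in; profile imap_version or apply_version instead
    return sum(i * i for i in range(n))


if __name__ == '__main__':
    result, report = profile_call(example, 100000)
    print(result)
    print(report)
```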