How can I run code when I call process.join?

Date: 2017-08-27 16:10:57

Tags: python python-3.x multiprocessing

I have some processes that run in a while loop. Basically, they collect data, and before they stop I want them to save that data to a csv or json file. What I have now overrides the join method of the multiprocessing.Process class, calling the parent's join via super:

class Processor(multiprocessing.Process):
    def __init__(self, arguments):
        multiprocessing.Process.__init__(self)

    def run(self):
        self.main_function()

    def main_function(self):
        while True:
            #do things to incoming data

    def function_on_join(self):
        #do one last thing before the process ends

    def join(self, timeout=None):
        self.function_on_join()
        super(Processor, self).join(timeout=timeout)

Is there a better / correct / more pythonic way to do this?

1 answer:

Answer 0 (score: 1)

I suggest you take a look at the concurrent.futures module.

It is a good fit when you can describe your work as a list of tasks to be completed by a pool of workers.

Task-based multiprocessing

When you have a sequence of jobs (e.g. a list of filenames) and want them processed in parallel, you can do it as follows:

from concurrent.futures import ProcessPoolExecutor
import requests

def get_url(url):
    resp = requests.get(url)
    print(f'{url} - {resp.status_code}')
    return url

jobs = ['http://google.com', 'http://python.org', 'http://facebook.com']

# create a process pool of 3 workers
with ProcessPoolExecutor(max_workers=3) as pool:
    # run in parallel each job and gather the returned values
    return_values = list(pool.map(get_url, jobs))

print(return_values)

Output (line order may vary):

http://google.com - 200
http://python.org - 200
http://facebook.com - 200
['http://google.com', 'http://python.org', 'http://facebook.com']

Non-task-based multiprocessing

When you just want to run several subprocesses that do not consume a list of jobs as in the first case, you may want to use multiprocessing.Process.

You can use it both procedurally and in an OOP style, much like threading.Thread.

Example in the procedural fashion (IMHO more pythonic):

import os
from multiprocessing import Process

def func():
    print(f'hello from: {os.getpid()}')

processes = [Process(target=func) for _ in range(4)]  # creates 4 processes

for process in processes:
    process.daemon = True  # daemon processes are terminated when the main program exits
    process.start()  # start the process

Output (pids and order will vary):

hello from: 31821
hello from: 31822
hello from: 31823
hello from: 31824

Waiting for the processes to finish

If you want to wait for the processes to finish, use Process.join() (more info on process.join() & process.daemon in this SO answer):

import os
import time
from multiprocessing import Process

def func():
    time.sleep(3)
    print(f'hello from: {os.getpid()}')

processes = [Process(target=func) for _ in range(4)]  # creates 4 processes

for process in processes:
    process.start()  # start the process

for process in processes:
    process.join()  # wait for the process to finish

print('all processes are done!')

This outputs (pids and order will vary):

hello from: 31980
hello from: 31983
hello from: 31981
hello from: 31982
all processes are done!