Question

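Say I have scrapper_1.py, scrapper_2.py and scrapper_3.py.

The way I run them now is to run/execute each one separately from PyCharm, so that I can see 3 python.exe processes executing in Task Manager.

Now I am trying to write a master script, say scrapper_runner.py, that imports these scrapers as modules and runs them all in parallel, not sequentially.

I tried examples with subprocess, multiprocessing and even os.system from various SO posts, but without any luck. From the logs they all run in sequence, and in Task Manager I only see one python.exe executing.

Is this the right pattern for this kind of process?

EDIT 1: (trying concurrent.futures with ProcessPoolExecutor)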

from concurrent.futures import ProcessPoolExecutor

import scrapers.scraper_1 as scraper_1
import scrapers.scraper_2 as scraper_2
import scrapers.scraper_3 as scraper_3

## Calling method runner on each scrapper_x to kick off processes
runners_list = [scraper_1.runner(), scraper_1.runner(), scraper_3.runner()]



if __name__ == "__main__":


    with ProcessPoolExecutor(max_workers=10) as executor:
        for runner in runners_list:
            future = executor.submit(runner)
            print(future.result())

Answer 1

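Depending on your operating system and task manager, child processes started from Python may or may not show up as separate processes. For example, on Linux, htop displays child processes under the parent process in tree view.

I recommend working through this in-depth tutorial on Python's multiprocessing module: https://pymotw.com/2/multiprocessing/basics.html

However, if Python's built-in multiprocessing/threading methods don't work or don't make sense to you, you can achieve the desired result by using bash to call your Python scripts. The following bash script produced the attached screenshot.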

#!/bin/sh
./py1.py &
./py2.py &
./py3.py &

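Explanation: the & at the end of each call tells the shell to run that call as a background process, so the script does not wait for one Python script to finish before starting the next.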
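The same idea can be sketched from Python itself, which may be handy on Windows where a bash script is not always available. This is only a minimal sketch: the helper name run_scripts_in_parallel is made up for illustration, and it assumes each scraper can be started as a standalone script.

```python
import subprocess
import sys


def run_scripts_in_parallel(script_paths):
    """Start every script as its own Python process, then wait for all of them.

    Hypothetical helper for illustration; returns the exit codes in input order.
    """
    # Popen returns immediately, so every process is started before we wait on any.
    processes = [subprocess.Popen([sys.executable, path]) for path in script_paths]
    # wait() blocks until that child exits and returns its exit code.
    return [process.wait() for process in processes]


# Example call, using the file names from the question:
# run_scripts_in_parallel(["scrapper_1.py", "scrapper_2.py", "scrapper_3.py"])
```

Because each script runs in its own interpreter started through sys.executable, each one should appear as a separate Python process in the task manager, much like the & version above.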

Answer 2

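Your problem is in how you set up the processes. You are not running them in parallel, even though you think you are. You actually run them when you add them to runners_list, and then you submit the result of each runner, not the runner itself, to the process pool.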

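What you want to do is add the functions to runners_list without executing them, and then have them executed in the multiprocessing pool. The way to achieve this is to add the function references, i.e. the names of the functions. To do this, you should not include the parentheses, since that is the syntax for calling a function, not just naming it.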
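To make the difference between calling a function and referencing it concrete, here is a small sketch. The runner function below is a hypothetical stand-in, not the actual scraper code.

```python
def runner():
    """Hypothetical stand-in for scraper_x.runner."""
    return "done"


# With parentheses: runner executes right here, and the list stores its return value.
called = [runner()]

# Without parentheses: nothing runs yet, and the list stores the function object.
referenced = [runner]

print(callable(called[0]))      # False, it is just the string returned by runner
print(callable(referenced[0]))  # True, it can still be handed to executor.submit
```

Only the second list contains something an executor can run later in another process.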

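Additionally, to have the futures execute asynchronously, you cannot call future.result() right after submitting each one. That call blocks until the future has finished, so the next runner is not submitted until the previous one is done, which forces the code to execute in sequence and makes the results available in the same order as the functions were called.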

This means that a solution to your problem is

from concurrent.futures import ProcessPoolExecutor

import scrapers.scraper_1 as scraper_1
import scrapers.scraper_2 as scraper_2
import scrapers.scraper_3 as scraper_3

## NOT calling method runner on each scrapper_x to kick off processes
## Instead add them to the list of functions to be run in the pool
runners_list = [scraper_1.runner, scraper_2.runner, scraper_3.runner]

# Adding callback function to call when future is done.
# If result is not printed in callback, the future.result call will
# serialize the call sequence to ensure results in order
def print_result(future):
    print(future.result())

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=10) as executor:
        for runner in runners_list:
            future = executor.submit(runner)
            future.add_done_callback(print_result)

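As you can see, the runners are not invoked when the list is created here, but later, when each runner is submitted to the executor. And when a result is ready, the callback is called to print that result to the screen.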
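If you would rather collect the results in the main code than in a callback, one possible variation is to submit everything first and then iterate with concurrent.futures.as_completed. The sketch below is only an illustration under some assumptions: fake_scraper is a hypothetical stand-in for a scraper's runner, and run_all with its executor_cls parameter is a made-up helper, not part of the original code.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor, as_completed


def fake_scraper(name, delay):
    """Hypothetical stand-in for a scraper's runner(): sleeps, then reports its PID."""
    time.sleep(delay)
    return name, os.getpid()


def run_all(jobs, executor_cls=ProcessPoolExecutor):
    """Submit every (function, args) job first, then collect results as they finish."""
    with executor_cls(max_workers=len(jobs)) as executor:
        # submit() does not block, so all jobs are queued before we wait on any result.
        futures = [executor.submit(func, *args) for func, args in jobs]
        # as_completed yields each future once it is done, regardless of submit order.
        return [future.result() for future in as_completed(futures)]


if __name__ == "__main__":
    jobs = [
        (fake_scraper, ("scraper_1", 0.3)),
        (fake_scraper, ("scraper_2", 0.2)),
        (fake_scraper, ("scraper_3", 0.1)),
    ]
    for name, pid in run_all(jobs):
        print(f"{name} finished in process {pid} (parent is {os.getpid()})")
```

Printing os.getpid() from inside each job is also a simple way to confirm that the work really runs in separate processes, without relying on what the task manager happens to display.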

Running .py scripts in parallel
