I am trying to complete 100 model runs on my 8-processor, 64-bit Windows 7 machine. I'd like to run 7 instances of the model concurrently to cut down my total run time (approx. 9.5 minutes per model run). I have looked at several threads pertaining to the Multiprocessing module of Python, but am still missing something.
Using the multiprocessing module
How to spawn parallel child processes on a multi-processor system?
My process:
I have 100 different parameter sets I'd like to run through SEAWAT/MODFLOW to compare the results. I have pre-built the model input files for each model run and stored them in their own directories. What I would like to be able to do is run 7 models at a time until all realizations have been completed. There needn't be communication between processes or display of results. So far I have only been able to spawn the models sequentially:
import os, subprocess
import multiprocessing as mp

ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
files = []
for f in os.listdir(ws + r'\fieldgen\reals'):
    if f.endswith('.npy'):
        files.append(f)

## def work(cmd):
##     return subprocess.call(cmd, shell=False)

def run(f, def_param=ws):
    real = f.split('_')[2].split('.')[0]
    print 'Realization %s' % real

    mf2k = r'c:\modflow\mf2k.1_19\bin\mf2k.exe '
    mf2k5 = r'c:\modflow\MF2005_1_8\bin\mf2005.exe '
    seawatV4 = r'c:\modflow\swt_v4_00_04\exe\swt_v4.exe '
    seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '

    exe = seawatV4x64
    swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real

    os.system(exe + swt_nam)

if __name__ == '__main__':
    p = mp.Pool(processes=mp.cpu_count() - 1)  # leave 1 processor available for system and other processes
    tasks = range(len(files))
    results = []
    for f in files:
        r = p.map_async(run(f), tasks, callback=results.append)
I changed the if __name__ == 'main': block to the following in hopes it would fix the lack of parallelism I feel is being imparted on the above script by the for loop. However, the model fails to even run (no Python errors):
if __name__ == '__main__':
    p = mp.Pool(processes=mp.cpu_count() - 1)  # leave 1 processor available for system and other processes
    p.map_async(run, ((files[f],) for f in range(len(files))))
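For reference, a likely reason neither version runs in parallel: in the first attempt, run(f) is called immediately and its return value (not the function) is handed to map_async; in both attempts the script falls off the end of the if block before the pool has done any work, because the AsyncResult is never waited on. A minimal sketch, using a toy run() rather than the SEAWAT call, of passing the function itself and blocking until completion:

```python
import multiprocessing as mp

def run(f):
    # stand-in for launching one model run on input file `f`
    return f.upper()

if __name__ == '__main__':
    pool = mp.Pool(processes=2)                       # a couple of workers for the sketch
    result = pool.map_async(run, ['a.npy', 'b.npy'])  # pass the function itself, not run(f)
    pool.close()
    result.wait()        # block until all runs finish instead of exiting immediately
    print(result.get())  # ['A.NPY', 'B.NPY']
    pool.join()
```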
Any and all help is greatly appreciated!
EDIT 3/26/2012 13:31 EST
Using the "Manual Pool" method in @J.F. Sebastian's answer below, I get parallel execution of my external .exe. Model realizations are called up in batches of 8 at a time, but it doesn't wait for those 8 runs to complete before calling up the next batch, and so on:
from __future__ import print_function
import os, subprocess, sys
import multiprocessing as mp
from Queue import Queue
from threading import Thread

def run(f, ws):
    real = f.split('_')[-1].split('.')[0]
    print('Realization %s' % real)
    seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '
    swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real
    subprocess.check_call([seawatV4x64, swt_nam])

def worker(queue):
    """Process files from the queue."""
    for args in iter(queue.get, None):
        try:
            run(*args)
        except Exception as e:  # catch exceptions to avoid exiting the
                                # thread prematurely
            print('%r failed: %s' % (args, e), file=sys.stderr)

def main():
    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    wdir = os.path.join(ws, r'fieldgen\reals')
    q = Queue()
    for f in os.listdir(wdir):
        if f.endswith('.npy'):
            q.put_nowait((os.path.join(wdir, f), ws))

    # start threads
    threads = [Thread(target=worker, args=(q,)) for _ in range(8)]
    for t in threads:
        t.daemon = True  # threads die if the program dies
        t.start()
    for _ in threads: q.put_nowait(None)  # signal no more files
    for t in threads: t.join()            # wait for completion

if __name__ == '__main__':
    mp.freeze_support()  # optional if the program is not frozen
    main()
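The queue-plus-sentinel worker pattern above can be exercised on its own with a toy run() (no external executable, Python 3 spelling of the Queue import) to confirm the threading logic itself is sound:

```python
from queue import Queue
from threading import Thread

results = []

def run(f, ws):
    # stand-in for the external model run
    results.append((f, ws))

def worker(q):
    for args in iter(q.get, None):  # exit when the None sentinel arrives
        run(*args)

q = Queue()
for f in ['a.npy', 'b.npy', 'c.npy']:
    q.put((f, 'ws'))

threads = [Thread(target=worker, args=(q,)) for _ in range(2)]
for t in threads:
    t.start()
for _ in threads:
    q.put(None)  # one sentinel per worker
for t in threads:
    t.join()

print(sorted(results))  # all three files processed, in some order
```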
No error traceback is available. The run() function performs its duty when called upon a single model realization file, just as with multiple files. The only difference is that with multiple files it is called len(files) times, yet each of the instances immediately closes and only one model run is allowed to finish, at which time the script exits gracefully (exit code 0).
Adding some print statements to main() reveals some information about active thread counts as well as thread status (note that this is a test on only 8 of the realization files to make the screenshot more manageable; theoretically all 8 files should be run concurrently, however the behavior continues where they are spawned and immediately die, except one):
def main():
    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    wdir = os.path.join(ws, r'fieldgen\test')
    q = Queue()
    for f in os.listdir(wdir):
        if f.endswith('.npy'):
            q.put_nowait((os.path.join(wdir, f), ws))

    # start threads
    threads = [Thread(target=worker, args=(q,)) for _ in range(mp.cpu_count())]
    for t in threads:
        t.daemon = True  # threads die if the program dies
        t.start()
    print('Active Count a', threading.activeCount())
    for _ in threads:
        print(_)
        q.put_nowait(None)  # signal no more files
    for t in threads:
        print(t)
        t.join()  # wait for completion
        print('Active Count b', threading.activeCount())
**The line which reads "D:\\Data\\Users... is the error information thrown when I manually stopped the model run from completing. Once I stop the model running, the remaining thread-status lines are reported and the script exits.
EDIT 3/26/2012 16:24 EST
SEAWAT does allow concurrent execution, as I've done this in the past, spawning instances manually using iPython and launching from each model file folder. This time around, I'm launching all model runs from a single location, namely the directory where my script resides. It looks like the culprit may be in the way SEAWAT saves some of its output. When SEAWAT runs, it immediately creates files pertaining to the model run. One of these files is not being saved to the directory in which the model realization is located, but to the top directory where the script is located. This prevents any subsequent threads from saving the same file name in the same location (which they all want to do, since these file names are generic and non-specific to each realization). The SEAWAT windows were not staying open long enough for me to read or even see that there was an error message; I only realized this when I went back and tried running the code with iPython, which directly displays the printout from SEAWAT instead of opening a new window to run the program.
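The working-directory behavior described here can be illustrated with subprocess's cwd argument; a self-contained sketch using a throwaway child process (in place of SEAWAT) that writes a file via a relative path:

```python
import os
import subprocess
import sys
import tempfile

def launch(cmd, workdir):
    """Run `cmd` with `workdir` as its working directory so any files the
    child writes via relative paths land there, not in the script's folder."""
    return subprocess.check_call(cmd, cwd=workdir)

# A throwaway child that writes 'out.txt' relative to its working directory.
d = tempfile.mkdtemp()
launch([sys.executable, '-c', "open('out.txt', 'w').write('ok')"], d)
print(os.path.exists(os.path.join(d, 'out.txt')))  # True
```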
I'm accepting @J.F. Sebastian's answer, as it's likely that once I resolve this model-executable issue, the threading code he has provided will get me where I need to be.
FINAL CODE
Added the cwd kwarg in subprocess.check_call to start each instance of SEAWAT in its own directory. Very key.
from __future__ import print_function
import os, subprocess, sys
import multiprocessing as mp
from Queue import Queue
from threading import Thread
import threading

def run(f, ws):
    real = f.split('_')[-1].split('.')[0]
    print('Realization %s' % real)
    seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '
    cwd = ws + r'\reals\real%s\ss' % real
    swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real
    subprocess.check_call([seawatV4x64, swt_nam], cwd=cwd)

def worker(queue):
    """Process files from the queue."""
    for args in iter(queue.get, None):
        try:
            run(*args)
        except Exception as e:  # catch exceptions to avoid exiting the
                                # thread prematurely
            print('%r failed: %s' % (args, e), file=sys.stderr)

def main():
    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    wdir = os.path.join(ws, r'fieldgen\reals')
    q = Queue()
    for f in os.listdir(wdir):
        if f.endswith('.npy'):
            q.put_nowait((os.path.join(wdir, f), ws))

    # start threads
    threads = [Thread(target=worker, args=(q,)) for _ in range(mp.cpu_count() - 1)]
    for t in threads:
        t.daemon = True  # threads die if the program dies
        t.start()
    for _ in threads: q.put_nowait(None)  # signal no more files
    for t in threads: t.join()            # wait for completion

if __name__ == '__main__':
    mp.freeze_support()  # optional if the program is not frozen
    main()
Answer 0 (score: 15)
I don't see any computations in the Python code. If you just need to execute several external programs in parallel, it is sufficient to use subprocess to run the programs and the threading module to keep a constant number of processes running, but the simplest code uses multiprocessing.Pool:
#!/usr/bin/env python
import os
import multiprocessing as mp

def run(filename_def_param):
    filename, def_param = filename_def_param  # unpack arguments
    ...  # call external program on `filename`

def safe_run(*args, **kwargs):
    """Call run(), catch exceptions."""
    try: run(*args, **kwargs)
    except Exception as e:
        print("error: %s run(*%r, **%r)" % (e, args, kwargs))

def main():
    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    workdir = os.path.join(ws, r'fieldgen\reals')
    files = ((os.path.join(workdir, f), ws)
             for f in os.listdir(workdir) if f.endswith('.npy'))

    # start processes
    pool = mp.Pool()  # use all available CPUs
    pool.map(safe_run, files)

if __name__ == "__main__":
    mp.freeze_support()  # optional if the program is not frozen
    main()
If there are many files, then pool.map() could be replaced by for _ in pool.imap_unordered(safe_run, files): pass.
There is also multiprocessing.dummy.Pool, which provides the same interface as multiprocessing.Pool but uses threads instead of processes, and might be more appropriate in this case.
You don't need to keep some CPUs free. Just use a command that starts your executables with a low priority (on Linux it is the nice program).
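Since the question targets Windows, here is a hedged sketch of one way to launch a child at reduced priority there: subprocess.BELOW_NORMAL_PRIORITY_CLASS is a Windows-only constant available since Python 3.7, with a fallback to the nice program elsewhere (this is an addition for illustration, not part of the original answer):

```python
import subprocess
import sys

def run_low_priority(cmd):
    # On Windows, pass creationflags so the child gets below-normal priority;
    # on other platforms, prefix the command with `nice`.
    if sys.platform == 'win32':
        return subprocess.call(cmd, creationflags=subprocess.BELOW_NORMAL_PRIORITY_CLASS)
    return subprocess.call(['nice'] + cmd)

rc = run_low_priority([sys.executable, '-c', 'print("model run")'])
print(rc)  # 0 on success
```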
ThreadPoolExecutor example
concurrent.futures.ThreadPoolExecutor would be both simple and sufficient, but it requires a 3rd-party dependency on Python 2.x (it has been in the stdlib since Python 3.2).
#!/usr/bin/env python
import os
import concurrent.futures

def run(filename, def_param):
    ...  # call external program on `filename`

# populate files
ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
wdir = os.path.join(ws, r'fieldgen\reals')
files = (os.path.join(wdir, f) for f in os.listdir(wdir) if f.endswith('.npy'))

# start threads
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    future_to_file = dict((executor.submit(run, f, ws), f) for f in files)

    for future in concurrent.futures.as_completed(future_to_file):
        f = future_to_file[future]
        if future.exception() is not None:
            print('%r generated an exception: %s' % (f, future.exception()))
        # run() doesn't return anything so `future.result()` is always `None`
Or if we ignore exceptions raised by run():
from itertools import repeat
...  # the same

# start threads
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    executor.map(run, files, repeat(ws))
    # run() doesn't return anything so `map()` results can be ignored
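The executor.map(run, files, repeat(ws)) call pairs each file with the same second argument, in input order. A self-contained toy illustration with a stand-in run() instead of the external program:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat

def run(filename, def_param):
    # stand-in for the external-program call
    return '%s:%s' % (def_param, filename)

files = ['real1.npy', 'real2.npy']
with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(run, files, repeat('ws')))
print(results)  # ['ws:real1.npy', 'ws:real2.npy']
```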
subprocess + threading (manual pool) solution
Answer 1 (score: 1)
This is my way of maintaining a minimum number x of threads in memory. It is a combination of the threading and multiprocessing modules. It may be unusual compared to the techniques other respected fellow members have explained, but it may be worth it. For the sake of explanation, I take the scenario of crawling a minimum of 5 websites at a time.
So here it is:
# importing dependencies.
from multiprocessing import Process
from threading import Thread
import threading

# Crawler function
def crawler(domain):
    # define crawler technique here.
    output.write(scrapeddata + "\n")
    pass
Next is the threadController function. This function controls the flow of threads to main memory. It keeps activating threads to maintain the threadNum "minimum" limit, i.e. 5. It also won't exit until all active threads (activeCount) have finished.
It maintains a minimum of threadNum (5) startProcess-function threads (these threads eventually start the Processes from processList while joining them with a timeout of 60 seconds). After threadController is started, there are 2 threads which are not included in the limit of 5 above: the main thread and the threadController thread itself. That's why threading.activeCount() != 2 is used.
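The thread bookkeeping this relies on can be checked in isolation; a small sketch (Python 3 spelling, threading.active_count()) showing the count rise by one while a worker thread is alive and fall back after join():

```python
import threading
import time

def worker():
    time.sleep(0.2)  # simulate some work

base = threading.active_count()           # main thread plus any runtime helpers
t = threading.Thread(target=worker)
t.start()
extra = threading.active_count() - base   # 1 while the worker is alive
t.join()                                  # after join, the count drops back to base
print(extra, threading.active_count() == base)
```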
def threadController():
    print "Thread count before child thread starts is:-", threading.activeCount(), len(processList)
    # starting first thread. This will make the activeCount=3
    Thread(target = startProcess).start()

    # loop while thread List is not empty OR active threads have not finished up.
    while len(processList) != 0 or threading.activeCount() != 2:
        if (threading.activeCount() < (threadNum + 2) and  # if count of active threads is less than the Minimum AND
            len(processList) != 0):                        # processList is not empty
                Thread(target = startProcess).start()      # This line starts the startProcess function as a separate thread **
The startProcess function, as a separate thread, starts Processes from the process list. The purpose of this function (**started as a different thread) is that it becomes a parent thread for the Processes. So when it joins them with a timeout of 60 seconds, this stops the startProcess thread from moving ahead, but it doesn't stop threadController from performing. This way, threadController works as required.
def startProcess():
    pr = processList.pop(0)
    pr.start()
    pr.join(60.00)  # joining the thread with a timeout of 60 seconds as a float.

if __name__ == '__main__':
    # a file holding a list of domains
    domains = open("Domains.txt", "r").read().split("\n")
    output = open("test.txt", "a")

    processList = []  # process list
    threadNum = 5     # number of thread-initiated processes to be run at one time

    # making the process list
    for r in range(0, len(domains), 1):
        domain = domains[r].strip()
        p = Process(target = crawler, args = (domain,))
        processList.append(p)  # making a list of performer processes.

    # starting the threadController as a separate thread.
    mt = Thread(target = threadController)
    mt.start()
    mt.join()  # won't let go next until the threadController thread finishes.

    output.close()
    print "Done"
Besides maintaining a minimum thread count in memory, my aim was also to avoid stuck threads or processes in memory. I did this using the timeout feature. My apologies for any typing mistakes.
I hope this construction helps anyone in this world. Regards, Vikas Gautam