"懒" asyncio.gather的版本?

Date: 2018-01-02 18:46:18

Tags: python python-asyncio

I am using Python's asyncio module with async/await to process a sequence of characters in chunks concurrently and collect the results in a list. For that I use a chunker function (split) and a chunk-processing function (process_chunk). Both come from a third-party library, and I would prefer not to change them.

Chunking is slow, and the number of chunks is not known up front, which is why I don't want to exhaust the whole chunk generator at once. Ideally, the code should advance the generator in sync with process_chunk's semaphore, i.e. each time that function returns.

My code:

import asyncio

def split(sequence):
    for x in sequence:
        print('Getting the next chunk:', x)
        yield x
    print('Finished chunking')

async def process_chunk(chunk, *, semaphore=asyncio.Semaphore(2)):
    async with semaphore:
        print('Processing chunk:', chunk)
        await asyncio.sleep(3)
        return 'OK'

async def process_in_chunks(sequence):
    gen = split(sequence)
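    # the list comprehension below pulls every chunk from gen before gather runs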
    coro = [process_chunk(chunk) for chunk in gen]
    results = await asyncio.gather(*coro)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(process_in_chunks('ABC'))

This sort of works and prints:

Getting the next chunk: A
Getting the next chunk: B
Getting the next chunk: C
Finished chunking
Processing chunk: C
Processing chunk: B
Processing chunk: A

However, this means that the gen generator is exhausted before processing even starts. I know why this happens, but how do I change it?

2 answers:

Answer 0 (score: 4)

If you don't mind having an external dependency, you can use aiostream.stream.map:

from aiostream import stream, pipe

async def process_in_chunks(sequence):
    # Asynchronous sequence of chunks
    xs = stream.iterate(split(sequence))
    # Asynchronous sequence of results
    ys = xs | pipe.map(process_chunk, task_limit=2)
    # Aggregation of the results into a list
    zs = ys | pipe.list()
    # Run the stream
    results = await zs
    print(results)

The chunks are generated lazily and fed to the process_chunk coroutine. The number of coroutines running concurrently is controlled by task_limit, which means the semaphore in process_chunk is no longer necessary.
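For reference, here is a minimal sketch of how one might run it, assuming the split and process_chunk definitions from the question are in scope (the runner itself is not part of the original answer):

import asyncio

if __name__ == '__main__':
    # drive the stream on the default event loop, as in the question
    loop = asyncio.get_event_loop()
    loop.run_until_complete(process_in_chunks('ABC'))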

Output:

Getting the next chunk: A
Processing chunk: A
Getting the next chunk: B
Processing chunk: B
# Pause 3 seconds
Getting the next chunk: C
Processing chunk: C
Finished chunking
# Pause 3 seconds
['OK', 'OK', 'OK']

See this demonstration and the documentation for more examples.

Answer 1 (score: 2)

  • Iterate gen manually using next
  • Acquire the semaphore before getting and processing a chunk
  • Release the semaphore after the chunk has been processed

import asyncio


# third-party:
def split(sequence):
    for x in sequence:
        print('Getting the next chunk:', x)
        yield x
    print('Finished chunking')


async def process_chunk(chunk, *, semaphore=asyncio.Semaphore(2)):
    async with semaphore:
        print('Processing chunk:', chunk)
        await asyncio.sleep(3)
        return 'OK'


# our code:
sem = asyncio.Semaphore(2)  # let's use our semaphore


async def process_in_chunks(sequence):
    tasks = []
    gen = split(sequence)
    while True:
        await sem.acquire()
        try:
            chunk = next(gen)
        except StopIteration:
            break
        else:
            task = asyncio.ensure_future(process_chunk(chunk))  # schedule the chunk to run concurrently
            task.add_done_callback(lambda *_: sem.release())  # allow the next chunk to be fetched
            tasks.append(task)
    await asyncio.gather(*tasks, return_exceptions=True)  # wait for all tasks; don't abort on the first exception
    results = [task.result() for task in tasks]
    return results


if __name__ ==  '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(process_in_chunks('ABCDE'))
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()

Output:

Getting the next chunk: A
Getting the next chunk: B
Processing chunk: A
Processing chunk: B
Getting the next chunk: C
Getting the next chunk: D
Processing chunk: C
Processing chunk: D
Getting the next chunk: E
Finished chunking
Processing chunk: E