Question

How can I set a maximum number of requests per second (that is, throttle them) on the client side using aiohttp?

Answer 1

Since v2.0, when using a ClientSession, aiohttp automatically limits the number of simultaneous connections to 100.

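You can change this limit by creating your own TCPConnector and passing it into the ClientSession. For example, to create a client limited to 50 simultaneous requests:

import aiohttp
connector = aiohttp.TCPConnector(limit=50)
client = aiohttp.ClientSession(connector=connector)

If it suits your use case better, there is also a limit_per_host parameter (off by default) that you can pass to limit the number of simultaneous connections to the same "endpoint". According to the docs: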

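limit_per_host (int) - limit for simultaneous connections to the same endpoint. Endpoints are the same if they have an equal (host, port, is_ssl) triple.

Example usage:

import aiohttp
connector = aiohttp.TCPConnector(limit_per_host=50)
client = aiohttp.ClientSession(connector=connector)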

Answer 2

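I found one possible solution here: http://compiletoi.net/fast-scraping-in-python-with-asyncio.html

Doing 3 requests at the same time is cool; doing 5000, however, is not so nice. If you try to do too many requests at the same time, connections might start to get closed, or you might even get banned from the website.

To avoid this, you can use a semaphore. It is a synchronization tool that can be used to limit the number of coroutines that do something at any given time. We create the semaphore before creating the loop, passing as an argument the number of simultaneous requests we want to allow: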

sem = asyncio.Semaphore(5)

Then, we just need to replace:

page = yield from get(url, compress=True)

with the same thing, but protected by the semaphore:

with (yield from sem):
    page = yield from get(url, compress=True)

This ensures that at most 5 requests are in flight at the same time.
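
The quoted snippet uses the older generator-based `yield from` syntax. As a rough sketch of the same idea with async/await, the example below assumes a stand-in `fake_get` coroutine (hypothetical, it only simulates a request with asyncio.sleep) in place of the blog's `get` helper, and records the peak number of coroutines inside the semaphore:

```python
import asyncio

MAX_CONCURRENT = 5  # same limit as the quoted example


async def fake_get(url, sem, state):
    # Hypothetical stand-in for a real HTTP call; not part of the original post.
    async with sem:  # at most MAX_CONCURRENT coroutines run this block at once
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # simulate network latency
        state["active"] -= 1
    return url


async def main():
    # Create the semaphore inside the running loop so it is bound to it.
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    state = {"active": 0, "peak": 0}
    urls = [f"https://example.com/{i}" for i in range(20)]
    pages = await asyncio.gather(*(fake_get(u, sem, state) for u in urls))
    return pages, state["peak"]


pages, peak = asyncio.run(main())
print(len(pages), peak)
```

Note that a semaphore caps how many requests run concurrently; it does not by itself fix a number of requests per second.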

Answer 3

Here is an example without aiohttp, but you can wrap any async method or aiohttp.request with the Limit decorator:

import asyncio
import time


class Limit(object):
    def __init__(self, calls=5, period=1):
        self.calls = calls
        self.period = period
        self.clock = time.monotonic
        self.last_reset = 0
        self.num_calls = 0

    def __call__(self, func):
        async def wrapper(*args, **kwargs):
            if self.num_calls >= self.calls:
                await asyncio.sleep(self.__period_remaining())

            period_remaining = self.__period_remaining()

            if period_remaining <= 0:
                self.num_calls = 0
                self.last_reset = self.clock()

            self.num_calls += 1

            return await func(*args, **kwargs)

        return wrapper

    def __period_remaining(self):
        elapsed = self.clock() - self.last_reset
        return self.period - elapsed


@Limit(calls=5, period=2)
async def test_call(x):
    print(x)


async def worker():
    for x in range(100):
        await test_call(x + 1)


asyncio.run(worker())
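
The example above drives the decorator from a single sequential worker. If several coroutines call the wrapped function concurrently, they may all finish sleeping and proceed together. One possible variation, sketched here under that assumption, is a simpler technique than the counter-and-reset logic above: each caller reserves the next free time slot, so calls are spaced `period / calls` seconds apart. The names `IntervalLimit` and `stamp` are hypothetical.

```python
import asyncio
import time


class IntervalLimit:
    """Hypothetical variant: start at most one call every period / calls seconds."""

    def __init__(self, calls=5, period=1.0):
        self.interval = period / calls
        self.next_slot = 0.0  # earliest monotonic time the next call may start

    def __call__(self, func):
        async def wrapper(*args, **kwargs):
            # No await occurs between reading and updating next_slot, so the
            # reservation is atomic with respect to other coroutines.
            now = time.monotonic()
            start = max(now, self.next_slot)
            self.next_slot = start + self.interval
            await asyncio.sleep(start - now)  # wait for the reserved slot
            return await func(*args, **kwargs)

        return wrapper


@IntervalLimit(calls=5, period=0.1)
async def stamp(i):
    # Return the time at which this call actually ran.
    return time.monotonic()


async def main():
    # Fire all ten calls at once; the limiter spreads them out.
    return await asyncio.gather(*(stamp(i) for i in range(10)))


stamps = asyncio.run(main())
elapsed = max(stamps) - min(stamps)
print(f"10 concurrent calls spread over {elapsed:.2f}s")
```

With 5 calls per 0.1 seconds, ten concurrent calls should be spread over roughly 0.18 seconds instead of starting at the same instant.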

Answer 4

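You can either set a delay per request, or group the URLs into batches and throttle the batches to meet the desired frequency.

1. Delay per request

Force the script to wait between requests using asyncio.sleep: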

import asyncio
import aiohttp

delay_per_request = 0.5
urls = [
   # put some URLs here...
]

async def app():
    tasks = []
    for url in urls:
        tasks.append(asyncio.ensure_future(make_request(url)))
        await asyncio.sleep(delay_per_request)

    results = await asyncio.gather(*tasks)
    return results

async def make_request(url):
    print('$$$ making request')
    async with aiohttp.ClientSession() as sess:
        async with sess.get(url) as resp:
            status = resp.status
            text = await resp.text()
            print('### got page data')
            return url, status, text

This can be run with, for example, results = asyncio.run(app()).

2. Batch throttle

Using make_request from above, you can request and throttle batches of URLs like this:

import asyncio
import aiohttp
import time

max_requests_per_second = 0.5
urls = [[
   # put a few URLs here...
],[
   # put a few more URLs here...
]]

async def app():
    results = []
    for i, batch in enumerate(urls):
        t_0 = time.time()
        print(f'batch {i}')
        tasks = [asyncio.ensure_future(make_request(url)) for url in batch]
        for t in tasks:
            d = await t
            results.append(d)
        t_1 = time.time()

        # Throttle requests
        batch_time = (t_1 - t_0)
        batch_size = len(batch)
        wait_time = (batch_size / max_requests_per_second) - batch_time
        if wait_time > 0:
            print(f'Too fast! Waiting {wait_time} seconds')
            await asyncio.sleep(wait_time)  # non-blocking pause; time.sleep would stall the event loop

    return results

Again, this can be run with asyncio.run(app()).
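
To make the throttling arithmetic concrete, here is a small sketch of the wait-time calculation on its own. The helper name `throttle_wait` is hypothetical; it mirrors the `wait_time` expression in the batch loop, clamped at zero.

```python
def throttle_wait(batch_size, max_requests_per_second, batch_time):
    """Seconds to pause so the batch averages at most the target rate."""
    # A batch of N requests must occupy at least N / rate seconds in total.
    min_duration = batch_size / max_requests_per_second
    return max(0.0, min_duration - batch_time)


# Two URLs at 0.5 requests/second must take at least 4 seconds overall.
print(throttle_wait(2, 0.5, 1.0))  # 3.0 (batch finished in 1 s, wait 3 more)
print(throttle_wait(2, 0.5, 5.0))  # 0.0 (batch was already slow enough)
```

Because the pause is computed per batch, requests inside a batch still start together; only the average rate across batches is bounded.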
