Async request script crashes when given too many URLs

Asked: 2019-02-18 13:36:14

Tags: python python-asyncio aiohttp

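I am trying to check the status codes of a set of URLs and return every URL that responds with a 4xx or 5xx code. There are about 12,500 URLs to check in total, and my script runs fine for up to roughly 7,000 of them. Beyond that, the script crashes with a "ResourceWarning: unclosed transport" error.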
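To make the goal concrete, here is a minimal sketch of the filtering step described above: keeping only 4xx and 5xx results and grouping the URLs by status code. The names is_error_status and group_errors are hypothetical and do not appear in the script; the input is assumed to be a list of (url, status) pairs.

```python
from collections import defaultdict


def is_error_status(status):
    """Return True for client (4xx) and server (5xx) error codes."""
    return 400 <= status < 600


def group_errors(results):
    """Group URLs by status code, keeping only 4xx/5xx entries.

    `results` is assumed to be an iterable of (url, status) pairs.
    """
    grouped = defaultdict(list)
    for url, status in results:
        if is_error_status(status):
            grouped[status].append(url)
    return dict(grouped)
```

Unlike a check against a fixed list of accepted codes, this treats every 2xx and 3xx response as OK, which may or may not be what is wanted for redirects.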

I am using Python 3.6 and aiohttp 3.5.4.

Any idea what could be causing this?

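import asyncio
import warnings
from collections import defaultdict

from aiohttp import ClientSession


async def fetch(url, session):
    # Only the status code is needed; the response body is never read.
    async with session.get(url) as response:
        return url, response.status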


async def bound_fetch(sem, url, session):
    async with sem:
        return await fetch(url, session)


async def check_urls(url_list):
    """Get the status code for every URL and group the failing URLs by status."""

    base_url = <base_url>  # placeholder, left elided in the post
    tasks = []

    sem = asyncio.Semaphore(10)

    async with ClientSession() as session:
        for url in url_list:
            full_url = base_url + url
            task = asyncio.ensure_future(bound_fetch(sem, full_url, session))
            tasks.append(task)

        results = await asyncio.gather(*tasks)
        results_dict = defaultdict(list)
        for url, status in results:
            if status not in (200, 301, 302):
                print(f'ERROR {status} {url}')
                results_dict[status].append(url)

        # Count the failing URLs, not the number of distinct status codes.
        error_count = sum(len(urls) for urls in results_dict.values())
        print(f'URLs checked, found {error_count} errors')


# Main entry point; list_of_urls is not shown in the post.
loop = asyncio.get_event_loop()
loop.set_debug(True)
warnings.simplefilter('always', ResourceWarning)
future = asyncio.ensure_future(check_urls(list_of_urls))
loop.run_until_complete(future)
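
One possible explanation, which the post does not confirm, is that a single failed request (a timeout or a dropped connection, say) raises out of asyncio.gather, the session context exits, and the requests still in flight are torn down with their transports open. Below is a minimal sketch of a variant that records each failure instead of propagating it. It uses only asyncio so it can run without a network; check_all, fetch_status, fake_fetch, and run_demo are hypothetical names, and fake_fetch stands in for a real session.get call.

```python
import asyncio
from collections import defaultdict


async def check_all(urls, fetch_status, limit=10):
    """Call fetch_status(url) for every URL with at most `limit` in flight.

    fetch_status is an assumed coroutine function returning an int status.
    Failures are recorded per URL rather than aborting the whole batch.
    """
    sem = asyncio.Semaphore(limit)

    async def guarded(url):
        async with sem:
            try:
                return url, await fetch_status(url)
            except asyncio.CancelledError:
                raise  # never swallow cancellation
            except Exception as exc:
                return url, exc  # keep the error as this URL's result

    results = await asyncio.gather(*(guarded(u) for u in urls))

    errors = defaultdict(list)
    for url, outcome in results:
        if isinstance(outcome, Exception):
            errors[type(outcome).__name__].append(url)
        elif outcome >= 400:
            errors[outcome].append(url)
    return dict(errors)


async def fake_fetch(url):
    """Stand-in for a real HTTP request (assumption, demonstration only)."""
    await asyncio.sleep(0)
    if url.endswith("/boom"):
        raise ConnectionError("simulated dropped connection")
    return 404 if url.endswith("/missing") else 200


def run_demo():
    # An explicit loop keeps the sketch usable on Python 3.6 as well.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(
            check_all(["/ok", "/missing", "/boom"], fake_fetch, limit=2)
        )
    finally:
        loop.close()
```

Because every task returns a value even when its request fails, gather only completes after all requests have finished, so the session would not be closed while connections are still in use. Whether this removes the warning in the original script is untested.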

0 answers:

No answers yet.