Question

我想知道如何优化我在Python中实现为web服务的链接检查器。我已经将响应缓存到每24小时到期的数据库。每天晚上我都有一个刷新缓存的cron作业，因此缓存实际上永远不会过时。

当然，如果我截断缓存，或者正在检查页面中有很多链接不在缓存中，那么事情就会变慢。我不是计算机科学家，所以我想要一些建议和一些具体的帮助，如何使用线程或进程来优化它。

我想通过请求每个网址作为后台进程（伪编码）进行优化：

    # The part of code that gets response codes not in cache...
    responses = {}

    # To begin, create a dict of url to process in background
    processes = {}
    for url in urls:
        processes[url] = Popen("curl " url + " &")

    # Now loop through again and get the responses
    for url in processes
        response = processes[url].communicate()
        responses[url] = response

    # Now I have responses dict which has responses keyed by url

对于我的大多数用例，这会将脚本的时间减少至少1/6，而不是仅仅循环遍历每个url并等待每个响应，但是我担心脚本运行会使服务器超载上。我考虑过使用一个队列，每次批量大约25个左右。

多线程会是更好的整体解决方案吗？如果是这样，我将如何使用多线程模块执行此操作？

Python多线程或后台进程？

0 个答案: