Question

I'm trying to catch unresolved URLs passed to a urllib request.

import urllib.request

def getSite(url):
    try:
        with urllib.request.urlopen(url, timeout=2) as r:
            print(url, "was resolved!")
    except Exception:
        # Catch any failure (URLError, timeout, bad URL) but not KeyboardInterrupt
        print(url, "wasn't resolved...")

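I expected this to try to connect to the URL and, if there is no response within 2 seconds, raise an error and print that it wasn't resolved. If it resolves within 2 seconds, it should respond accordingly quickly. That is what I want to happen: no request should take longer than the limit I specify.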

Currently, a valid URL responds quickly:

> getSite('http://stackoverflow.com')

> http://stackoverflow.com was resolved!
    real    0m0.449s
    user    0m0.063s
    sys     0m0.063s

However, an invalid URL takes much longer than 2 seconds:

> getSite('http://thisisntarealwebaddress.com')

> http://thisisntarealwebaddress.com wasn't resolved...
    real    0m18.605s
    user    0m0.063s
    sys     0m0.047s

What does the timeout parameter actually do, and how can I get the result I want?

Docs: https://docs.python.org/3.1/library/urllib.request.html

Answer 1

I solved this by taking the run_with_limited_time function from this answer and running my function through it:

run_with_limited_time(getSite, (url, ), {}, 2)

I'd still like to hear what others have to say about why timeout doesn't work the way I expected!

Copied here for sanity:

from multiprocessing import Process

def run_with_limited_time(func, args, kwargs, time):
    """Runs a function with a time limit

    :param func: The function to run
    :param args: The function's args, given as a tuple
    :param kwargs: The function's keyword args, given as a dict
    :param time: The time limit in seconds
    :return: True if the function ended successfully. False if it was terminated.
    """
    p = Process(target=func, args=args, kwargs=kwargs)
    p.start()
    # Wait at most `time` seconds for the child process to finish
    p.join(time)
    if p.is_alive():
        # Still running after the limit: kill the child process
        p.terminate()
        return False

    return True
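
A lighter-weight variant of the same idea, offered only as a sketch and not as the linked answer's method, is to run the call in a daemon thread and stop waiting once the deadline passes. The name call_with_deadline and its parameters are hypothetical. Unlike p.terminate(), this does not stop the work: the worker thread is simply abandoned and keeps running in the background until it finishes or the program exits.

```python
import threading


def call_with_deadline(func, args=(), kwargs=None, seconds=2.0):
    """Hypothetical helper: return (finished, result).

    finished is False (and result is None) if func did not return
    within `seconds`. The worker thread is abandoned, not killed.
    """
    outcome = {}

    def worker():
        try:
            outcome["result"] = func(*args, **(kwargs or {}))
        except Exception as exc:
            # Hand the error back so the caller can re-raise it
            outcome["error"] = exc

    # daemon=True so an abandoned worker cannot keep the program alive
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(seconds)
    if t.is_alive():
        return False, None
    if "error" in outcome:
        raise outcome["error"]
    return True, outcome["result"]
```

With the question's function this would be called as call_with_deadline(getSite, (url,), seconds=2). A process is still the safer choice when the slow call must actually be stopped rather than just ignored.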

Answer 2

Just add a timeout option to the urlopen call (in this example it waits up to 10 seconds):

file = urllib.request.urlopen("http://www.test.com/resume.pdf", timeout=10)
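
As far as I understand it, timeout is applied to the blocking socket operations (connecting, and each read), not to the hostname lookup that happens first. That would explain why an unresolvable name like the one in the question can take far longer than the timeout. The sketch below does not shorten that wait; it only reports which kind of failure occurred. The names describe_failure and get_site are hypothetical.

```python
import socket
import urllib.error
import urllib.request


def describe_failure(exc):
    """Classify an exception raised by urlopen (hypothetical helper)."""
    # urlopen usually wraps low-level errors in URLError; the cause is in .reason
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, socket.timeout):
        return "timed out"
    if isinstance(reason, socket.gaierror):
        return "name not resolved"
    return "failed"


def get_site(url, timeout=2):
    """Return "resolved" on success, otherwise a short failure description."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "resolved"
    except (urllib.error.URLError, OSError) as exc:
        return describe_failure(exc)
```

Seeing "name not resolved" after a long wait would suggest that the time went into the DNS lookup rather than the connection, in which case an outer time limit around the whole call is needed.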

Limiting the time spent on a Python 3 urllib request: timeout doesn't work as I expect
