Question

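I am doing a 3-month internship at a computer science laboratory (LIRIS). My internship supervisor asked me to retrieve some data from meilleurs-agents.com. It is a real estate website, and I want to retrieve the price per square meter for each city. My program is written in Python, and I am trying to send multiple requests to fetch the data. However, it does not work because of a proxy error: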

HTTPConnectionPool(host='XXXXXX', port=XXXX): Max retries exceeded with url: "..." (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000000000B304320>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',)))

A preview of my code:

import requests

headers = requests.utils.default_headers()
headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
})

for city, postal_code in zip(cities, postal_codes):
    url = 'https://www.meilleursagents.com/prix-immobilier/' + city + '-' + postal_code + '/'

    PROXY = {'https': 'XX.XXX.X.XXX:XXXX'}

    try:
        response = requests.get(url, timeout=10, proxies=PROXY)
    except Exception as e:
        print(e)

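If I remove the proxy, my request works, but the HTML contains a message like "You seem to be a robot, so your request has not been completed", so I can't get the prices... but I really need this data.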

I hope my question is clear and that someone can help me :).

Thanks, Nelly

PS: Sorry for my English, I'm a French student :D

Answer 1

Try changing the User-Agent header and the cookies for your requests.
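Note that the snippet in the question builds `headers` but never passes it to `requests.get`, so the default python-requests User-Agent is still being sent. Here is a minimal sketch of one way to set the header and cookies once for all requests, using a `requests.Session`. The helper name `make_session`, the User-Agent string, and the cookie name are only illustrative assumptions, not values the site is known to require:

```python
import requests


def make_session(user_agent, cookies=None):
    """Return a Session that sends the given User-Agent (and cookies) on every request."""
    session = requests.Session()
    # Headers set on the session are merged into each request made through it.
    session.headers.update({"User-Agent": user_agent})
    if cookies:
        # Cookies stored on the session are sent with each request as well.
        session.cookies.update(cookies)
    return session


# Example values only (assumptions): a recent desktop Firefox User-Agent string
# and a made-up cookie name.
EXAMPLE_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) "
    "Gecko/20100101 Firefox/115.0"
)
session = make_session(EXAMPLE_USER_AGENT, cookies={"example_cookie": "value"})
```

You would then call `session.get(url, timeout=10)` inside your loop instead of `requests.get(...)`. Whether this is enough to get past the site's bot detection is something you will have to try.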

Another workaround is to add a delay between requests:

import time

time.sleep(1)  # try different delay values

This will of course slow your script down, but it may help avoid "too many requests" errors.
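To show where the delay would go in a loop like yours, here is a rough sketch. The function name `fetch_with_delay` and its parameters are hypothetical, and the randomized 1 to 3 second pause is just a starting point to experiment with, not a value the site is known to accept:

```python
import random
import time


def fetch_with_delay(urls, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between calls.

    Returns a dict mapping each URL to the fetch result, or to the exception
    raised for that URL.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        try:
            results[url] = fetch(url)
        except Exception as exc:
            # Record the error and keep going with the remaining URLs.
            results[url] = exc
    return results
```

With requests, the `fetch` argument could be something like `lambda url: requests.get(url, timeout=10)`, and `urls` would be the list of city URLs you build from `cities` and `postal_codes`.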
