I can't open a website that exists

Time: 2018-02-02 13:41:32

Tags: python web-scraping

The error I'm getting leads me to believe that my program cannot find a website that I know exists. The website is

https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207

My code looks like this:

from urllib import request as u_r

def strip_webite():
    with u_r.urlopen("https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207") as f:
        pass

if __name__ == "__main__":
  strip_webite()

The error I get is:

  File "database_management.py", line 19, in <module>
    strip_webite()
  File "database_management.py", line 15, in strip_webite
    with u_r.urlopen("https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207") as f:
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

1 answer:

Answer 0: (score: 2)

It looks like Transfermarkt blocks requests from bots that use the default User-Agent string sent by Python's urllib library, although the site does not mention any such restriction.

This seems to imply that they don't mind being scraped, but would prefer that we announce who we are.

To do this with urllib, do the following:

from urllib import request as u_r

def strip_webite():
    request = u_r.Request("https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207")
    request.add_header('User-Agent', 'my-cool-app')
    with u_r.urlopen(request) as f:
        pass

if __name__ == "__main__":
    strip_webite()
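To make the User-Agent point concrete, here is a minimal sketch that first shows what urllib sends by default and then builds a request with a custom value. The helper names (default_user_agent, build_request, fetch) are hypothetical and not part of urllib, and 'my-cool-app' is only a placeholder string. Catching HTTPError lets the caller inspect the status code instead of getting a traceback.

```python
from urllib import error, request


def default_user_agent():
    """Return the User-Agent that urllib sends when none is set."""
    # build_opener() returns an OpenerDirector whose addheaders list
    # holds the default ('User-agent', 'Python-urllib/X.Y') pair.
    return dict(request.build_opener().addheaders)["User-agent"]


def build_request(url, user_agent="my-cool-app"):
    """Create a Request that announces a custom User-Agent (placeholder value)."""
    return request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="my-cool-app"):
    """Return (status, body); on an HTTP error, status is the error code."""
    try:
        with request.urlopen(build_request(url, user_agent)) as response:
            return response.status, response.read()
    except error.HTTPError as exc:
        # The server answered, but with an error status such as 404.
        return exc.code, b""
```

Calling fetch() with the Transfermarkt URL from the question would perform the real request. Whether the site still rejects the default User-Agent is an assumption, since its behavior may have changed since the answer was written.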