I keep getting HTTP Error 400: Bad Request with urllib.request?

Posted: 2018-11-17 20:27:57

Tags: python beautifulsoup urllib

I've been stuck on this for a while now: whenever I run the code below, I keep getting a Bad Request error.

import urllib.request
from bs4 import BeautifulSoup

url = input("Twitter link: ")
print("\n")
html_doc = urllib.request.urlopen(url)
soup = BeautifulSoup(html_doc, 'lxml')

# Scrape the profile fields from the Twitter profile page
name = soup.find('h1').a.text
location = soup.find('span', {'class': 'ProfileHeaderCard-locationText'}).text
locationstrip = location.strip()
created = soup.find('span', {'class': 'ProfileHeaderCard-joinDateText'}).text
birthday = soup.find('span', {'class': 'ProfileHeaderCard-birthdateText'}).text
birthdaystrip = birthday.strip()
posted = soup.find('a', {'class': 'PhotoRail-headingWithCount'}).text
postedstrip = posted.strip()

print("Info")
print("-------- \n")
print(name)
print(locationstrip)
print(created)
print(birthdaystrip)
print(postedstrip)

# Look the display name up on Wikipedia (this is the request that returns 400)
url = "http://www.wikipedia.com/wiki/" + name
formedurl = urllib.request.Request(url, headers={'User-Agent': 'Chrome/70.0.3538.102'})
html_doc = urllib.request.urlopen(formedurl)
soup = BeautifulSoup(html_doc, 'lxml')

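I read that you need to specify a User-Agent, so I set one to make the request look like a legitimate browser request, but I still get this error. Thanks in advance.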

2 answers:

Answer 0 (score: 1)

You need to replace the spaces in name with underscores (_) before building the URL:

name = name.replace(' ', '_')
url = "http://www.wikipedia.com/wiki/" + name
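A slightly more defensive version of the same idea, sketched under the assumption that a display name may also contain non-ASCII characters or symbols such as "?" or "#": replace spaces with underscores, then percent-encode whatever is left with urllib.parse.quote. The helper name wikipedia_url is hypothetical, and the base URL is simply the one from the question.

```python
from urllib.parse import quote

# Base URL taken from the question; another Wikipedia base works the same way.
BASE = "http://www.wikipedia.com/wiki/"


def wikipedia_url(name: str, base: str = BASE) -> str:
    """Hypothetical helper: turn a display name into a Wikipedia article URL."""
    # Wikipedia article titles use underscores where the name has spaces.
    title = name.strip().replace(" ", "_")
    # Percent-encode anything else that is not URL-safe (accents, '?', '#', ...).
    # quote() leaves '/' alone by default, which suits titles like "AC/DC".
    return base + quote(title)


print(wikipedia_url("Jack Dorsey"))  # http://www.wikipedia.com/wiki/Jack_Dorsey
```

Without the quote step, a name with non-ASCII characters such as "Beyoncé" would typically make urlopen fail with a UnicodeEncodeError before any request is sent, because the request line has to be ASCII.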

Answer 1 (score: 0)

Copy the request headers from your browser, then remove them one at a time until you find the smallest set of headers that still works.
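A sketch of that trial-and-error process, assuming you paste the headers from your browser's developer tools into a dict (the values below are placeholders, not values any site is known to require). minimal_headers drops one header at a time and leaves it out only if the request still returns 200; the check parameter exists so the logic can be exercised without network access. Both function names are hypothetical.

```python
import urllib.request
from urllib.error import HTTPError

# Placeholder headers, as if copied from a browser's developer tools.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def status_for(url, headers):
    """Return the HTTP status of a GET to url sent with exactly these headers."""
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; .code holds the status.
        return err.code


def minimal_headers(url, headers, ok=200, check=status_for):
    """Greedily drop headers one at a time while the request still succeeds."""
    needed = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in needed.items() if k != name}
        if check(url, trial) == ok:
            needed = trial  # this header was not required
    return needed


# Example (needs network access):
# print(minimal_headers("https://example.com/failing-page", BROWSER_HEADERS))
```

The greedy loop only guarantees that no single remaining header can be dropped; it assumes the server answers consistently between requests. For the Wikipedia URL in this question, the 400 most likely comes from the space in the path, so no header combination will fix it on its own.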
