I am using Beautiful Soup to try to parse information from a web page:
import requests
url = 'https://www.onthemarket.com/for-sale/2-bed-flats-apartments/shortlands-station/?max-bedrooms=&radius=0.5'
req = requests.get(url)
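The request returns <Response [403]>.
The question "Python requests. 403 Forbidden" suggests this is a user-agent problem, but I cannot see how that applies in my case.
Any suggestions?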
Answer 0: (score: 0)
In this case, send request headers that include a User-Agent:
from bs4 import BeautifulSoup
import requests
url = 'https://www.onthemarket.com/for-sale/2-bed-flats-apartments/shortlands-station/?max-bedrooms=&radius=0.5'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
}
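# Send browser-like headers so the server is less likely to answer 403
html_page = requests.get(url, headers=headers).text
# Parse the returned HTML and print its visible text
soup = BeautifulSoup(html_page, "html.parser")
print(soup.text)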
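
The snippet above prints whatever the server sends back, so a response that is still a 403 would be parsed as if it were the listing page. A minimal sketch of the same approach with an explicit status check is shown below. The helper name fetch_soup, the DEFAULT_HEADERS constant, and the 10-second timeout are assumptions made for illustration and are not part of the original answer. A User-Agent header may also not be enough if the site applies other bot checks.

```python
import requests
from bs4 import BeautifulSoup

# Browser-like headers taken from the answer above.
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36"
    ),
    "accept": (
        "text/html,application/xhtml+xml,application/xml;q=0.9,"
        "image/webp,image/apng,*/*;q=0.8"
    ),
}


def fetch_soup(url, session=None, timeout=10):
    """Hypothetical helper: fetch url with browser-like headers and parse it.

    Raises PermissionError on a 403 instead of silently parsing an error page.
    """
    if session is None:
        session = requests.Session()
    response = session.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    if response.status_code == 403:
        # The headers were sent but the server still refused the request.
        raise PermissionError("403 Forbidden for " + url)
    # Surface any other HTTP error (404, 500, ...) as an exception.
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")
```

Calling fetch_soup(url) with the url from the question and then printing soup.get_text() would reproduce the answer's output, while a refused request would raise an exception instead.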