Question

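I'm new to web scraping, and I'm trying to scrape wind data from this website: https://wx.ikitesurf.com/spot/507. I know I could use Selenium to find elements, but I think I may have found a better approach; please correct me if I'm wrong. In the browser's developer tools, under Network -> JS, I found this getGraph request: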

https://api.weatherflow.com/wxengine/rest/graph/getGraph?callback=jQuery17200020271765600428093_1619158293267&units_wind=mph&units_temp=f&units_distance=mi&fields=wind&format=json&null_ob_min_from_now=60&show_virtual_obs=true&spot_id=507&time_start_offset_hours=-36&time_end_offset_hours=0&type=dataonly&model_ids=-101&wf_token=3a648ec44797cbf12aca8ebc6c538868&_=1619158293881

This URL returns all the data I need, and the data keeps updating. Here is my code:

import time

import requests
from bs4 import BeautifulSoup

url = 'https://api.weatherflow.com/wxengine/rest/graph/getGraph?callback=jQuery17200020271765600428093_1619158293267&units_wind=mph&units_temp=f&units_distance=mi&fields=wind&format=json&null_ob_min_from_now=60&show_virtual_obs=true&spot_id=507&time_start_offset_hours=-36&time_end_offset_hours=0&type=dataonly&model_ids=-101&wf_token=3a648ec44797cbf12aca8ebc6c538868&_=1619158293881'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
time.sleep(3)
# Look for an element named last_ob_wind_desc in the parsed response
wind = soup.find("last_ob_wind_desc")
print(wind)

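I tried scraping it with Beautiful Soup, but I always get None. Does anyone know how I can scrape this page? I'd like to understand what I'm doing wrong. Thanks for your help!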

Answer 1

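Removing callback=jQuery17200020271765600428093_1619158293267& from the API URL makes it return plain JSON. The response is not HTML, so BeautifulSoup's find("last_ob_wind_desc"), which searches for an HTML tag with that name, has nothing to find and returns None. Parse the body as JSON instead: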

import requests

url = 'https://api.weatherflow.com/wxengine/rest/graph/getGraph?units_wind=mph&units_temp=f&units_distance=mi&fields=wind&format=json&null_ob_min_from_now=60&show_virtual_obs=true&spot_id=507&time_start_offset_hours=-36&time_end_offset_hours=0&type=dataonly&model_ids=-101&wf_token=3a648ec44797cbf12aca8ebc6c538868&_=1619158293881'
# .json() parses the JSON body into a Python dict
response = requests.get(url).json()
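If you'd rather not edit the copied URL by hand, you can drop the callback parameter programmatically. A minimal sketch using only the standard library; drop_query_params is a hypothetical helper name, and the example URL is shortened to a few of the original parameters for readability.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_params(url: str, names: set[str]) -> str:
    """Return url with the named query parameters removed (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every (key, value) pair whose key is not being dropped, in original order.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in names]
    return urlunsplit(parts._replace(query=urlencode(kept)))


copied = ("https://api.weatherflow.com/wxengine/rest/graph/getGraph"
          "?callback=jQuery17200020271765600428093_1619158293267"
          "&units_wind=mph&fields=wind&format=json&spot_id=507")
clean = drop_query_params(copied, {"callback"})
print(clean)
# https://api.weatherflow.com/wxengine/rest/graph/getGraph?units_wind=mph&fields=wind&format=json&spot_id=507
```

The cleaned URL can then be passed to requests.get(clean).json() exactly as above.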

response is now a dictionary containing the data. last_ob_wind_desc can be retrieved with response['last_ob_wind_desc'].
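If you keep the callback parameter, the body is JSONP: the JSON object wrapped in a call to the named jQuery function, so json.loads will fail on it directly. A rough sketch that strips such a wrapper before parsing, assuming the body looks like name({...}); (a regex is not a JavaScript parser, so treat this as a convenience, not a robust solution). parse_jsonp is a hypothetical helper, and the sample bodies are made up, not real API output. Using .get() returns None instead of raising KeyError if a field is missing.

```python
import json
import re


def parse_jsonp(text: str) -> dict:
    """Parse a body shaped like callbackName({...}); falling back to plain JSON."""
    # Capture everything between the "(" after the callback name and the final ")".
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", text, re.DOTALL)
    return json.loads(match.group(1) if match else text)


# Made-up bodies for illustration only.
wrapped = 'jQuery172_123({"last_ob_wind_desc": "NW"});'
plain = '{"last_ob_wind_desc": "NW"}'

print(parse_jsonp(wrapped)["last_ob_wind_desc"])      # NW
print(parse_jsonp(plain).get("last_ob_wind_desc"))    # NW
```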

You can also use pandas to save the data as a CSV or another file format:

import pandas as pd

df = pd.json_normalize(response)
df.to_csv('filename.csv')
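If you'd rather not pull in pandas, the standard csv module can write a similar one-row file. A sketch under the assumption that you only want the top-level scalar fields; any nested lists or dicts in the response (whose exact structure I haven't checked) are skipped. write_scalar_fields is a hypothetical helper, and the sample dict, including the "series" key, is made up.

```python
import csv
import io


def write_scalar_fields(data: dict, fh) -> None:
    """Write the top-level scalar fields of data as a one-row CSV (hypothetical helper)."""
    # Skip nested structures; they don't map cleanly onto a single CSV cell.
    row = {k: v for k, v in data.items() if not isinstance(v, (dict, list))}
    writer = csv.DictWriter(fh, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)


# Made-up sample; a real one would come from requests.get(url).json().
sample = {"spot_id": 507, "last_ob_wind_desc": "NW", "series": [1, 2]}
buf = io.StringIO()
write_scalar_fields(sample, buf)
print(buf.getvalue())
```

To write a real file, open it with newline="" as the csv module recommends, e.g. with open('filename.csv', 'w', newline='') as fh: write_scalar_fields(response, fh).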
