Parsing data with BeautifulSoup4

Date: 2016-11-02 21:42:15

Tags: python web-scraping beautifulsoup

import requests
from bs4 import BeautifulSoup

request = requests.get("http://www.lolesports.com/en_US/worlds/world_championship_2016/standings/default")
content = request.content
soup = BeautifulSoup(content, "html.parser")
team_name = soup.findAll('text', {'class': 'team-name'})

print(team_name)

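I am trying to parse data from the URL "http://www.lolesports.com/en_US/worlds/world_championship_2016/standings/default". The individual team names sit in elements such as <text class="team-name">SK Telecom T1</text>. I want to extract the name (SK Telecom T1) and print it to the screen, but I get [], an empty list. What am I doing wrong?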

2 answers:

Answer 0 (score: 2)

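The site relies on JavaScript to load its content. Requests does not execute JavaScript, so the HTML it returns never contains the team-name elements and BeautifulSoup has nothing to find.

For a site like this, Selenium is a better fit. It drives a real browser such as Firefox (or another driver), which renders the whole page, JavaScript included.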
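
A minimal sketch of this explanation, under stated assumptions: the two HTML strings are made-up stand-ins for the page before and after JavaScript runs (they are not copied from the real site), and fetch_rendered_html is a hypothetical helper that assumes Selenium plus a Firefox driver are installed. The same parsing helper finds nothing in the static shell and finds the name once the element is present.

```python
from bs4 import BeautifulSoup


def extract_team_names(html):
    """Return the text of every <text class="team-name"> element in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True)
            for tag in soup.find_all("text", class_="team-name")]


# Hypothetical stand-in for what requests receives: an empty shell plus a script.
STATIC_HTML = (
    '<html><body><div id="app"></div>'
    '<script src="app.js"></script></body></html>'
)

# Hypothetical stand-in for the DOM after the browser has run the script.
RENDERED_HTML = (
    '<html><body><div id="app">'
    '<text class="team-name">SK Telecom T1</text>'
    '</div></body></html>'
)


def fetch_rendered_html(url, timeout=10):
    """Load url in Firefox via Selenium and return the HTML after rendering."""
    # Imported lazily so the parsing helper above works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Block until at least one team-name element exists in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".team-name"))
        )
        return driver.page_source
    finally:
        driver.quit()


print(extract_team_names(STATIC_HTML))    # []
print(extract_team_names(RENDERED_HTML))  # ['SK Telecom T1']
```

Against the live page, the call would be extract_team_names(fetch_rendered_html(url)) with the URL from the question, assuming the page still uses the same markup.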

Answer 1 (score: 2)

You don't need Selenium. All of the dynamic content can be retrieved as JSON with a simple GET request to http://api.lolesports.com/api/v1/leagues:

import requests

data = requests.get("http://api.lolesports.com/api/v1/leagues?slug=worlds").json()

It returns a lot of data; what you want appears to be under data["teams"]. An excerpt:

[{'id': 2, 'slug': 'bangkok-titans', 'name': 'Bangkok Titans', 'teamPhotoUrl': 'http://na.lolesports.com/sites/default/files/BKT_GPL.TMPROFILE_0.png', 'logoUrl': 'http://assets.lolesports.com/team/bangkok-titans-597g0x1v.png', 'acronym': 'BKT', 'homeLeague': 'urn:rg:lolesports:global:league:league:12', 'altLogoUrl': None, 'createdAt': '2014-07-17T18:34:47.000Z', 'updatedAt': '2015-09-29T16:09:36.000Z', 'bios': {'en_US': 'The Bangkok Titans are the undisputed champions of Thailand’s League of Legends esports scene. They achieved six consecutive 1st place finishes in the Thailand Pro League from 2014 to 2015. However, they aren’t content with just domestic domination.

Each team is a dict in that list:

In [1]: import requests

In [2]: data = requests.get("http://api.lolesports.com/api/v1/leagues?slug=worlds").json()

In [3]: for d in data["teams"]:
   ...:     print(d["name"])
   ...:
Bangkok Titans
ahq e-Sports Club
SK Telecom T1
TSM
Fnatic
Cloud9 
Counter Logic Gaming
H2K
Edward Gaming
INTZ e-Sports
paiN Gaming
Origen
LGD Gaming
Invictus Gaming
Royal Never Give Up
Flash Wolves
Splyce
Samsung Galaxy
KT Rolster
ROX Tigers
G2 Esports
I May
Albus NoX Luna
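
The same lookup can be wrapped in two small functions. This is a sketch, not part of the original answer: fetch_league and team_names are hypothetical names, the timeout and the status check are additions, and the sample dict is a trimmed, made-up payload shaped like the excerpt above. The endpoint may also have changed since the answer was written.

```python
def fetch_league(slug, timeout=10):
    """GET the leagues endpoint from the answer and return the decoded JSON."""
    import requests  # imported here so team_names() can be used offline

    response = requests.get(
        "http://api.lolesports.com/api/v1/leagues",
        params={"slug": slug},
        timeout=timeout,
    )
    # Fail loudly on 4xx/5xx instead of trying to decode an error page.
    response.raise_for_status()
    return response.json()


def team_names(data):
    """Return the 'name' of each team dict under data['teams']."""
    return [team["name"] for team in data.get("teams", []) if "name" in team]


# Hypothetical sample shaped like the excerpt above, trimmed to two teams.
sample = {
    "teams": [
        {"id": 2, "slug": "bangkok-titans", "name": "Bangkok Titans", "acronym": "BKT"},
        {"name": "SK Telecom T1"},
    ]
}
print(team_names(sample))  # ['Bangkok Titans', 'SK Telecom T1']
```

With network access, team_names(fetch_league("worlds")) would produce the list shown above, assuming the API still responds as it did in the answer.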