Web scraping - transfermarkt most valuable players

Time: 2018-12-01 20:20:59

Tags: python beautifulsoup python-requests

I'm new to web scraping.

I can't find my mistake in this code:

import requests
import csv
from bs4 import BeautifulSoup
url = "https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop"
response=requests.get(url)
html_icerigi=response.content
soup=BeautifulSoup(html_icerigi,"html.parser")
footballer = soup.find_all("a",{"class":"spielprofil_tooltip tooltipstered"})
footballer_list=[]
for footballer in footballer_list:
    footballer=footballer.text
    footballer=footballer.strip()
    footballer=footballer.replace("\n","")
    footballer_list.append(["Futbolcu:{}".format(footballer)])
print(footballer_list)

3 Answers:

Answer 0 (score: 2)

It can work, but these are the problems:

  1. The site has anti-scraping protection, so you need to set a User-Agent header on the request.

  2. The tooltipstered class is added to the links dynamically, so it is not in the HTML you download; remove it from the class filter.

  3. Give BeautifulSoup the decoded string response.text instead of the raw bytes response.content.

  4. You are iterating over the empty list footballer_list instead of the list of a elements returned by find_all.

  5. The multi-line reassignment of the same variable is unnecessary, and the list structure may be wrong: did you mean to append a dictionary instead of a list?
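Point 2 can be checked offline. A minimal sketch, assuming the HTML sent by the server carries only class="spielprofil_tooltip" on each player link (the sample markup below is hypothetical, modeled on the tags printed later in this thread). BeautifulSoup treats a class string containing a space as an exact match on the whole attribute value, so the question's filter finds nothing, while a single class name or a CSS selector matches:

```python
from bs4 import BeautifulSoup

# Hypothetical markup as the server might send it, before the page's
# JavaScript adds the extra "tooltipstered" class.
RAW_HTML = """
<table>
  <tr><td><a class="spielprofil_tooltip" href="/kylian-mbappe/profil/spieler/342229">Kylian Mbappé</a></td></tr>
  <tr><td><a class="spielprofil_tooltip" href="/neymar/profil/spieler/68290">Neymar</a></td></tr>
</table>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# A multi-class string must equal the whole class attribute, so this finds nothing.
exact = soup.find_all("a", {"class": "spielprofil_tooltip tooltipstered"})

# A single class name matches any <a> whose class list contains it.
single = soup.find_all("a", {"class": "spielprofil_tooltip"})

# The equivalent CSS selector.
css = soup.select("a.spielprofil_tooltip")

print(len(exact), len(single), len(css))  # 0 2 2
```

Filtering on the one stable class also keeps working after the JavaScript has run, because the rendered links still contain spielprofil_tooltip in their class list.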

Fixed code:

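footballer_list = []
for footballer in soup.find_all("a", {"class": "spielprofil_tooltip"}):
    footballer_list.append(["Futbolcu:{}".format(footballer.text.strip())])
print(footballer_list)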

Result:

[['Futbolcu:Kylian Mbappé'], ......, ['Futbolcu:Marlon Freitas']]
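Putting the points above together, a complete script might look like the sketch below. It assumes that a generic browser-like User-Agent is enough to get past the site's protection (the header string is illustrative, not taken from the answer) and that the links still carry the spielprofil_tooltip class; parse_players, fetch_players and HEADERS are hypothetical names. Parsing is kept separate from fetching so it can be tried on a saved copy of the page.

```python
from bs4 import BeautifulSoup

URL = "https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop"
# Assumption: any browser-like User-Agent string; this one is illustrative.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def parse_players(html):
    """Return one ["Futbolcu:<name>"] entry per player link in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    footballer_list = []
    # Iterate over the matched <a> elements, not over the (empty) result list.
    for link in soup.find_all("a", {"class": "spielprofil_tooltip"}):
        name = link.get_text(strip=True)  # strip surrounding whitespace/newlines
        footballer_list.append(["Futbolcu:{}".format(name)])
    return footballer_list


def fetch_players():
    # Imported here so parse_players can be used without requests installed.
    import requests

    response = requests.get(URL, headers=HEADERS, timeout=30)
    response.raise_for_status()  # fail loudly if the request is blocked
    return parse_players(response.text)
```

Calling print(fetch_players()) should print a list in the same shape as the result above, provided the request is not blocked.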

Answer 1 (score: 1)

Install Selenium and load the page through it, as below. Apart from that, your code seems to work.

import bs4
from selenium import webdriver

# Selenium drives a real Chrome instance, so the page's JavaScript runs.
browser = webdriver.Chrome()
browser.get('https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop')

# HTML after the browser has executed the page's scripts.
html_icerigi = browser.page_source

soup = bs4.BeautifulSoup(html_icerigi,"html.parser")

footballer = soup.find_all("a",{"class":"spielprofil_tooltip tooltipstered"})
footballer_list=[]

# Note: this loops over footballer_list, which is still empty.
for footballer in footballer_list:
    footballer=footballer.text
    footballer=footballer.strip()
    footballer=footballer.replace("\n","")
    footballer_list.append(["Futbolcu:{}".format(footballer)])
print(footballer)

browser.close()

Output:

[<a class="spielprofil_tooltip tooltipstered" href="/kylian-mbappe/profil/spieler/342229" id="342229">Kylian Mbappé</a>, <a class="spielprofil_tooltip tooltipstered" href="/neymar/profil/spieler/68290" id="68290">Neymar</a>, <a class="spielprofil_tooltip tooltipstered" href="/lionel-messi/profil/spieler/28003" id="28003">Lionel Messi</a>, <a class="spielprofil_tooltip tooltipstered" href="/mohamed-salah/profil/spieler/148455" id="148455">Mohamed Salah</a>, <a...
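The output is a list of a tags rather than "Futbolcu:" strings because of the loop, not because of Selenium: footballer_list is empty when the for loop starts, so the body never runs and print(footballer) shows the untouched find_all result. A plain-Python sketch of that control flow, with a hypothetical list of strings standing in for the find_all result:

```python
# Hypothetical stand-in for the ResultSet returned by soup.find_all().
footballer = ["<a>Kylian Mbappé</a>", "<a>Neymar</a>"]
footballer_list = []

# As in the answer: the loop iterates over the empty footballer_list,
# so the body never executes and footballer keeps its old value.
for footballer in footballer_list:
    footballer_list.append(["Futbolcu:{}".format(footballer)])

print(footballer)       # the stand-in list, unchanged
print(footballer_list)  # an empty list

# Fixed: iterate over the matched elements (here, plain names) instead.
footballer_all = ["Kylian Mbappé", "Neymar"]
fixed_list = []
for name in footballer_all:
    fixed_list.append(["Futbolcu:{}".format(name)])

print(fixed_list)
```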

Answer 2 (score: 1)

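Besides selenium, you can also use requests_html to render the page. As for why you are getting nothing: your for loop is wrong. Even after running the JavaScript and getting the complete HTML, you would still end up with an empty footballer_list.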

import requests_html
from bs4 import BeautifulSoup

url = "https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop"
with requests_html.HTMLSession() as s:
    resp = s.get(url)
    # Runs the page's JavaScript in headless Chromium (downloaded on first use).
    resp.html.render()
    page = resp.html.raw_html

soup = BeautifulSoup(page,"html.parser")
footballer_all = soup.find_all("a",{"class":"spielprofil_tooltip tooltipstered"})

footballer_list = []

# Iterate over the matched <a> elements, not over the empty footballer_list.
for footballer in footballer_all:
    footballer = footballer.text
    footballer = footballer.strip()
    footballer = footballer.replace("\n","")
    footballer_list.append(["Futbolcu:{}".format(footballer)])

print(footballer_list)
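The first answer also asks whether a dictionary was intended instead of a one-element list. If so, the same loop can keep each player's profile link as well. A minimal sketch, assuming the rendered tags look like those in the Selenium output above (a relative href plus an id attribute); players_as_dicts and its keys name, profile_url and player_id are hypothetical:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.transfermarkt.co.uk"


def players_as_dicts(html):
    """Hypothetical helper: one dict per player link in already-rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    players = []
    for link in soup.select("a.spielprofil_tooltip"):
        players.append({
            "name": link.get_text(strip=True),
            # href is relative on the page, so resolve it against the site root.
            "profile_url": urljoin(BASE_URL, link.get("href", "")),
            "player_id": link.get("id"),
        })
    return players


sample = (
    '<a class="spielprofil_tooltip tooltipstered" '
    'href="/kylian-mbappe/profil/spieler/342229" id="342229">Kylian Mbappé</a>'
)
print(players_as_dicts(sample))
```

With the requests_html snippet above, players_as_dicts(page) could be called on the rendered page directly, since BeautifulSoup also accepts the bytes in raw_html.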