How to get an attribute value from an HTML tag using Python 3.5.2

Asked: 2016-09-09 15:32:44

Tags: python html beautifulsoup

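Hi, I have a problem with Python 3.5.2. When I try to get the value of an attribute, I get the whole tags (attributes plus values), and I don't know where the problem is. I only want the value of the title. Here is my code: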

from bs4 import BeautifulSoup as bs
import requests 

url = "http://bestofgeeks.com/en/"
html = requests.get(url).text
soup = bs(html,'html.parser')

tagss = soup.findAll('a',{'class':'titre_post'})
print(tagss)

This is what I get:

[<a charset="UTF-8" class="titre_post" href="article_to_read.php?category=Last-Technology&amp;name=854&amp;title=Apple-Watch-Series-2-Waterproof-50-meters-with-Pokemon-Go" hreflang="en" rel="tag" titre="Apple Watch Series 2 Waterproof 50 meters with Pokemon Go">
Apple Watch Series 2 Waterproof 50 meters with Pokemon Go      </a>, <a charset="UTF-8" class="titre_post" href="article_to_read.php?category=Security&amp;name=853&amp;title=Warning-This-Cross-Platform-Malware-Can-Hack-Windows-Linux-and-OS-X-Computers" hreflang="en" rel="tag" titre="Warning This Cross Platform Malware Can Hack Windows Linux and OS X Computers">
Warning This Cross Platform Malware Can Hack Windows Linux and OS X Computers      </a>, <a charset="UTF-8" class="titre_post" href="article_to_read.php?category=Games&amp;name=852&amp;title=PS4-Slim-Announced,-Launching-This-Month-coming-september-15-for-299$-" hreflang="en" rel="tag" titre="PS4 Slim Announced, Launching This Month coming september 15 for 299$ ">
PS4 Slim Announced, Launching This Month coming september 15 for 299$       </a>, <a charset="UTF-8" class="titre_post" href="article_to_read.php?category=Last-Technology&amp;name=851&amp;title=Sony-New-IFA-products" hreflang="en" rel="tag" titre="Sony New IFA products">
Sony New IFA products      </a>, <a charset="UTF-8" class="titre_post" href="article_to_read.php?category=Phone&amp;name=850&amp;title=This-is-the-iPhone-7-waterproofing,-stereo-speakers,-and-dual-cameras" hreflang="en" rel="tag" titre="This is the iPhone 7 waterproofing, stereo speakers, and dual cameras">
This is the iPhone 7 waterproofing, stereo speakers, and dual cameras      </a>, <a charset="UTF-8" class="titre_post" href="article_to_read.php?category=Security&amp;name=849&amp;title=Russia-is-Largest-Portal-HACKED;-Nearly-100-Million-Plaintext-Passwords-Leaked" hreflang="en" rel="tag" titre="Russia is Largest Portal HACKED; Nearly 100 Million Plaintext Passwords Leaked">
Russia is Largest Portal HACKED; Nearly 100 Million Plaintext Passwords Leaked      </a>]

2 Answers:

Answer 0 (score: 0)

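If you only want the text inside the "a" tags: all of your links are already stored in tagss, so just iterate over them and print each one, as shown here: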

for t in tagss:
    # print() is a function in Python 3; .strip() drops the surrounding whitespace
    print(t.text.strip())

Answer 1 (score: 0)

If you want the content of the titre attribute:

tagss = [tag.get('titre') for tag in soup.findAll('a',{'class':'titre_post'})]
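
To try this without hitting the network, here is a minimal sketch that runs the same lookup on a small inline HTML sample. The sample markup (including the third link with no titre attribute) and the variable names links, titles and texts are made up for illustration; only the class name titre_post and the titre attribute come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical inline sample standing in for the downloaded page,
# so the sketch runs without network access.
html = """
<a class="titre_post" href="article_to_read.php?name=854" titre="Apple Watch Series 2">
Apple Watch Series 2      </a>
<a class="titre_post" href="article_to_read.php?name=851" titre="Sony New IFA products">
Sony New IFA products      </a>
<a class="titre_post" href="article_to_read.php?name=000">No titre here</a>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a", class_="titre_post")

# Tag.get() returns None when the attribute is missing,
# whereas tag["titre"] would raise KeyError for the third link.
titles = [tag.get("titre") for tag in links]

# get_text(strip=True) returns the visible link text without
# the surrounding whitespace.
texts = [tag.get_text(strip=True) for tag in links]

print(titles)
print(texts)
```

Using .get('titre') rather than tag['titre'] is a reasonable default when some links on the page might lack the attribute, since a missing attribute then shows up as None instead of stopping the loop with an exception.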