Extracting a class name from a tag with BeautifulSoup (Python)

Time: 2014-02-06 01:05:21

Tags: python html parsing beautifulsoup


    <td class="image">
      <a href="/target/tt0111161/" title="Target Text 1">
       <img alt="target img" height="74" src="img src url" title="image title" width="54"/>
     <td class="title">
      <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161">
      <a href="/target/tt0111161/">
       Other Text
      <span class="year_type">

I am trying to use Beautiful Soup to parse certain elements into a tab-delimited file. I have had some great help so far and have:

for td in soup.select('td.title'):
    span = td.select('span.wlb_wrapper')
    if span:
        print(span[0].get('data-tconst'))  # To get `tt0082971`



for td in soup.select('td.image'):  # trying to select the <td class="image"> tag
    img = td.select('a.title')  # from inside td I now try to look inside the a tag that also has the word title
    if img:
        print(img[2].get('title'))  # if it finds anything, then I want to return the text in class 'title'

2 Answers:

Answer 0 (score: 3)

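If you want to pick out different td elements based on their class (i.e. td class="image" and td class="title"), you can treat a Beautiful Soup tag like a dictionary to read its class attribute.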

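This finds every td class="image" in the table and prints the title of each link inside it. It also prints data-tconst for each span class="wlb_wrapper" inside a td class="title".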

from bs4 import BeautifulSoup

# Markup from the question, wrapped in a table row and with closing tags added
page = """
<table>
  <tr>
    <td class="image">
      <a href="/target/tt0111161/" title="Target Text 1">
        <img alt="target img" height="74" src="img src url" title="image title" width="54"/>
      </a>
    </td>
    <td class="title">
      <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161"></span>
      <a href="/target/tt0111161/">
        Other Text
      </a>
      <span class="year_type"></span>
    </td>
  </tr>
</table>
"""
soup = BeautifulSoup(page, 'html.parser')
tbl = soup.find('table')
rows = tbl.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    for col in cols:
        # col['class'] is a list of class names, so compare its first entry
        if col.has_attr('class') and col['class'][0] == 'image':
            hrefs = col.find_all('a')
            for href in hrefs:
                print(href.get('title'))

        elif col.has_attr('class') and col['class'][0] == 'title':
            spans = col.find_all('span')
            for span in spans:
                if span.has_attr('class') and span['class'][0] == 'wlb_wrapper':
                    print(span.get('data-tconst'))
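
As a minimal sketch of the dictionary-style access described in this answer: the markup below is a trimmed, closed-tag version of the question's snippet, and the variable names (`td_classes`, `link_title`, `missing`) are illustrative assumptions. It shows that `class` is multi-valued, so indexing it gives a list, while `.get()` returns `None` for an attribute that is not present.

```python
from bs4 import BeautifulSoup

# Trimmed, closed-tag version of the question's markup (an assumption for illustration)
html = """
<table><tr>
  <td class="image">
    <a href="/target/tt0111161/" title="Target Text 1">
      <img alt="target img" src="img src url" title="image title"/>
    </a>
  </td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")
link = td.find("a")

td_classes = td["class"]            # multi-valued attribute: a list of class names
link_title = link["title"]          # single-valued attribute: a plain string
missing = link.get("data-tconst")   # .get() returns None when the attribute is absent

print(td_classes, link_title, missing)
```

Indexing with `tag["name"]` raises `KeyError` when the attribute is missing, which is probably why the answer guards each lookup with `has_attr`.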

Answer 1 (score: 0)

span.wlb_wrapper is a selector that matches <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161">. See this and this for more information about selectors.

Change your Python code span = td.select('span.wlb_wrapper') to span = td.select('span'), and also to span = td.select('span.year_type'), and see what each one returns.
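
The experiment suggested above can be sketched as follows. This assumes a closed-tag version of the question's markup and uses illustrative variable names. It also contrasts the class selector `a.title` from the question with the attribute selector `a[title]`, which is one possible way to reach the link's `title` attribute, and joins two values with a tab as the question's tab-delimited goal might require.

```python
from bs4 import BeautifulSoup

# Closed-tag version of the question's markup (an assumption for illustration)
html = """
<table><tr>
  <td class="image">
    <a href="/target/tt0111161/" title="Target Text 1">
      <img alt="target img" src="img src url" title="image title"/>
    </a>
  </td>
  <td class="title">
    <span class="wlb_wrapper" data-tconst="tt0111161"></span>
    <a href="/target/tt0111161/">Other Text</a>
    <span class="year_type"></span>
  </td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

td_title = soup.select("td.title")[0]
all_spans = td_title.select("span")             # every <span> inside the cell
wrapper = td_title.select("span.wlb_wrapper")   # only spans with class "wlb_wrapper"
year = td_title.select("span.year_type")        # only spans with class "year_type"

td_image = soup.select("td.image")[0]
by_class = td_image.select("a.title")    # looks for <a class="title">: no match here
by_attr = td_image.select("a[title]")    # looks for <a> that has a title attribute

# One tab-delimited row built from the two extracted values
row = "\t".join([wrapper[0].get("data-tconst"), by_attr[0].get("title")])
print(row)
```

In a selector, `.name` always refers to a class and `[name]` to an attribute, so `a.title` comes back empty for this markup even though the link carries a `title` attribute.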

