BeautifulSoup: one element has two identical links, how do I print only one?

Date: 2018-11-19 16:55:48

Tags: python beautifulsoup findall

Hi, after I run the following code:

import requests
from bs4 import BeautifulSoup


page = requests.get('https://coinpaprika.com')
soup = BeautifulSoup(page.text, 'html.parser')

coin_list = soup.find('tbody')
coin_list_items = coin_list.find_all('a')

for coin_name in coin_list_items:
    names = coin_name.string
    links = 'https://coinpaprika.com' + coin_name.get('href')
    print(names)
    print(links)

the program prints:

None
https://coinpaprika.com/coin/btc-bitcoin/
Bitcoin
https://coinpaprika.com/coin/btc-bitcoin/
None
https://coinpaprika.com/coin/xrp-xrp/
XRP
https://coinpaprika.com/coin/xrp-xrp/
None
https://coinpaprika.com/coin/eth-ethereum/
Ethereum
https://coinpaprika.com/coin/eth-ethereum/

instead of:

Bitcoin
https://coinpaprika.com/coin/btc-bitcoin/
XRP
https://coinpaprika.com/coin/xrp-xrp/
Ethereum
https://coinpaprika.com/coin/eth-ethereum/

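I understand that the cause is that each row contains two links to the same URL, one wrapping the icon and one wrapping the name: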

<td class="table__fixed-cell">
                    <a href="/coin/btc-bitcoin/"><span class="coin-icon currency_images-0"></span></a>
                </td>


<td class="table__fixed-cell">
                    <a href="/coin/btc-bitcoin/">Bitcoin</a>
                    <small>BTC</small>
                </td>

But I still don't know how to print only the second one. Can someone help me?

2 answers:

Answer 0: (score: 1)

Some of the links have empty anchor text because they only wrap an icon image, so .string returns None for them:

<a href="/coin/btc-bitcoin/"><span class="coin-icon currency_images-0"></span></a>

Add a check that skips those links:

for coin_name in coin_list_items:
    names = coin_name.string
    if not names:
        continue
    links = 'https://coinpaprika.com' + coin_name.get('href')
    print(names)
    print(links)
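
To try this check without hitting the live site, here is a minimal offline sketch. It assumes the page markup still looks like the snippet in the question. The inline `html` string, the `BASE_URL` constant and the `extract_coins` helper are hypothetical names introduced for illustration only.

```python
from bs4 import BeautifulSoup

# Inline HTML modeled on the snippet from the question, so this runs offline.
html = """
<table><tbody><tr>
<td class="table__fixed-cell">
  <a href="/coin/btc-bitcoin/"><span class="coin-icon currency_images-0"></span></a>
</td>
<td class="table__fixed-cell">
  <a href="/coin/btc-bitcoin/">Bitcoin</a>
  <small>BTC</small>
</td>
</tr></tbody></table>
"""

BASE_URL = 'https://coinpaprika.com'  # assumed site root, as in the question


def extract_coins(markup):
    """Return (name, link) pairs, skipping anchors that have no text."""
    soup = BeautifulSoup(markup, 'html.parser')
    coins = []
    for a in soup.find('tbody').find_all('a'):
        name = a.string  # None for the icon-only link
        if not name:
            continue
        coins.append((str(name), BASE_URL + a.get('href')))
    return coins


coins = extract_coins(html)
for name, link in coins:
    print(name)
    print(link)
```

The icon link is dropped because its only child is an empty `<span>`, which leaves `.string` as `None`.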

Answer 1: (score: 1)

Just find only the tags that contain text.

coin_list_items = coin_list.find_all('a', text=True)
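
A small offline sketch of this filter follows, again assuming the markup matches the snippet in the question. It uses `string=True`, the name Beautiful Soup has given the `text` argument since version 4.4.0; the older `text=True` spelling from the answer should behave the same way. The inline `html` string is a stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Stand-in markup based on the question; the real page may differ.
html = """
<table><tbody><tr>
<td><a href="/coin/btc-bitcoin/"><span class="coin-icon"></span></a></td>
<td><a href="/coin/btc-bitcoin/">Bitcoin</a><small>BTC</small></td>
</tr></tbody></table>
"""

soup = BeautifulSoup(html, 'html.parser')
coin_list = soup.find('tbody')

# string=True keeps only the <a> tags whose .string is not None,
# which excludes the icon-only link.
text_links = coin_list.find_all('a', string=True)

names = [str(a.string) for a in text_links]
hrefs = [a.get('href') for a in text_links]
print(names)
print(hrefs)
```

With this filter the loop from the question needs no extra check, since icon-only anchors never reach it.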