How to extract span information from a div with BeautifulSoup

Date: 2019-03-09 14:46:24

Tags: python html beautifulsoup

I have the following HTML code:

    <div class="user-tagline ">
      <span class="username " data-avatar="aaaaaaa">player1</span>
      <span class="user-rating">(1357)</span>
      <span class="country-flag-small flag-113" tip="Portugal"></span>
    </div>
    <div class="user-tagline ">
      <span class="username " data-avatar="bbbbbbb">player2</span>
      <span class="user-rating">(1387)</span>
      <span class="country-flag-small flag-70" tip="Indonesia"></span>
    </div>

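I want to extract "Portugal" from it. Note that the span class is dynamic: it is not always class="country-flag-small flag-113", but changes according to the country value generated for that div block.

To get player1 and 1357, I used the following cumbersome code: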

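# Split the text of the first tagline div into lines (index 0 is empty)
player1info = soup.findAll('div', attrs={'class':'user-tagline'})[0].text.split("\n")
player1 = player1info[1].strip()
# The rating is on the following line; drop whitespace and parentheses
pscore1 = player1info[2].strip().replace('(','').replace(')', '')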

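If anyone could share a better solution, I would appreciate it. Thanks in advance.

Update:

Having extracted the information from the initial HTML divs, I would now like to extend this to extract more information from the whole row. This is the row: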

<tr board-popover="" fen="r1bk2r1/1p2n3/pN6/1B1qQp2/P2Pp2p/1P6/2P2PPP/R3K1R1 b Q -" flip-board="1" highlight-squares="c4b6">
         <td>
          <a class="clickable-link td-user" href="https://www.chess.com/live/game/2249663029?username=belemnarmada" target="_self">
           <span class="time-control">
            <i class="icon-rapid">
            </i>
           </span>
           <div class="user-tagline ">
            <span class="username " data-avatar="https://betacssjs.chesscomfiles.com/bundles/web/images/noavatar_l.1c5172d5.gif" data-country="Portugal" data-enabled="true" data-flag="113" data-joined="Joined Jun 19, 2016" data-logged="Online 6 hrs ago" data-membership="basic" data-name="Atikinounette" data-popup="hover" data-title="" data-username="Atikinounette">
             Atikinounette
            </span>
            <span class="user-rating">
             (1357)
            </span>
            <span class="country-flag-small flag-113" tip="Portugal">
            </span>
           </div>
           <div class="user-tagline ">
            <span class="username " data-avatar="https://images.chesscomfiles.com/uploads/v1/user/28196414.83e31ff1.50x50o.3a6f77e4aa44.jpeg" data-country="Indonesia" data-enabled="true" data-flag="70" data-joined="Joined May 15, 2016" data-logged="Online Nov 7, 2017" data-membership="basic" data-name="belemnarmada" data-popup="hover" data-title="" data-username="belemnarmada">
             belemnarmada
            </span>
            <span class="user-rating">
             (1387)
            </span>
            <span class="country-flag-small flag-70" tip="Indonesia">
            </span>
           </div>
          </a>
         </td>
         <td>
          <a class="clickable-link text-middle" href="https://www.chess.com/live/game/2249663029?username=belemnarmada" target="_self">
           <div class="pull-left">
            <span class="game-result">
             1
            </span>
            <span class="game-result">
             0
            </span>
           </div>
           <div class="result">
            <i class="icon-square-minus loss" tip="Lost">
            </i>
           </div>
          </a>
         </td>
         <td class="text-center">
          <a class="clickable-link" href="https://www.chess.com/live/game/2249663029?username=belemnarmada" target="_self">
           30 min
          </a>
         </td>
         <td class="text-right">
          <a class="clickable-link text-middle moves" href="https://www.chess.com/live/game/2249663029?username=belemnarmada" target="_self">
           25
          </a>
         </td>
         <td class="text-right miniboard">
          <a class="clickable-link archive-date" href="https://www.chess.com/live/game/2249663029?username=belemnarmada" target="_self">
           Aug 9, 2017
          </a>
         </td>
         <td class="text-center miniboard">
          <input class="checkbox" game-checkbox="" game-id="2249663029" game-is-live="true" ng-model="model.gameIds[2249663029].checked" type="checkbox"/>
         </td>
        </tr>

The required information is:

player's info (answer provided by @balderman already got that)
game-result (1, 0)
playing time (30 min in this row)
total moves (25)
playing date (Aug 9, 2017)

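Thank you very much.

3 Answers:

Answer 0 (score: 0)

How about the code below?

The idea is that each user's attributes are three spans under the div, so the code locates those spans and extracts the data.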

from bs4 import BeautifulSoup

html = '''<html><body>   <div class="user-tagline ">
      <span class="username " data-avatar="aaaaaaa">player1</span>
      <span class="user-rating">(1357)</span>
      <span class="country-flag-small flag-113" tip="Portugal"></span>
    </div>
    <div class="user-tagline ">
      <span class="username " data-avatar="bbbbbbb">player2</span>
      <span class="user-rating">(1387)</span>
      <span class="country-flag-small flag-70" tip="Indonesia"></span>
    </div></body></html>'''

soup = BeautifulSoup(html, 'html.parser')

users = soup.findAll('div', attrs={'class': 'user-tagline'})
for user in users:
    # The three spans of each block, in order: name, rating, country flag
    user_properties = user.findAll('span')
    for idx, prop in enumerate(user_properties):
        if idx == 0:
            print('user name: {}'.format(prop.text))
        elif idx == 1:
            print('user rating: {}'.format(prop.text))
        elif idx == 2:
            # The country is stored in the "tip" attribute, not in the text
            print('user country: {}'.format(prop.attrs['tip']))

Output

user name: player1
user rating: (1357)
user country: Portugal
user name: player2
user rating: (1387)
user country: Indonesia
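
On the question's concern about the dynamic class: BeautifulSoup treats `class` as a multi-valued attribute, so a search for one token matches regardless of the other tokens. A minimal sketch, assuming the stable token `country-flag-small` is always present and only the `flag-NNN` part changes:

```python
from bs4 import BeautifulSoup

html = '''
<div class="user-tagline ">
  <span class="username " data-avatar="aaaaaaa">player1</span>
  <span class="user-rating">(1357)</span>
  <span class="country-flag-small flag-113" tip="Portugal"></span>
</div>
<div class="user-tagline ">
  <span class="username " data-avatar="bbbbbbb">player2</span>
  <span class="user-rating">(1387)</span>
  <span class="country-flag-small flag-70" tip="Indonesia"></span>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# class_ matches a single token of the class attribute, so the
# changing "flag-113" / "flag-70" part is irrelevant to the search.
flags = soup.find_all('span', class_='country-flag-small')

# The country name lives in the "tip" attribute, not in the span text.
countries = [flag['tip'] for flag in flags]
print(countries)
```

This avoids relying on the position of the span inside the div.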

Answer 1 (score: 0)

Here is a more readable solution:

div1 = soup.select("div.user-tagline")[0]
# Player name comes from the username span
player1 = div1.select_one("span.username").text
# Rating text, e.g. "(1357)", comes from the user-rating span
pscore1 = div1.select_one("span.user-rating").text

To extract the data from all the divs, just use a loop and replace "0" with "i".
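
The loop mentioned above might look like the sketch below. The helper name `parse_taglines` and the dictionary keys are illustrative choices, not part of the original answer, and the rating is assumed to always have the form "(1357)".

```python
from bs4 import BeautifulSoup

html = '''
<div class="user-tagline ">
  <span class="username " data-avatar="aaaaaaa">player1</span>
  <span class="user-rating">(1357)</span>
  <span class="country-flag-small flag-113" tip="Portugal"></span>
</div>
<div class="user-tagline ">
  <span class="username " data-avatar="bbbbbbb">player2</span>
  <span class="user-rating">(1387)</span>
  <span class="country-flag-small flag-70" tip="Indonesia"></span>
</div>
'''


def parse_taglines(soup):
    """Return one dict per div.user-tagline block (hypothetical helper)."""
    players = []
    for div in soup.select("div.user-tagline"):
        rating_text = div.select_one("span.user-rating").get_text(strip=True)
        players.append({
            "name": div.select_one("span.username").get_text(strip=True),
            # Assumes the rating is an integer wrapped in parentheses.
            "rating": int(rating_text.strip("()")),
            # The country is read from the "tip" attribute of the flag span.
            "country": div.select_one("span.country-flag-small")["tip"],
        })
    return players


players = parse_taglines(BeautifulSoup(html, "html.parser"))
for player in players:
    print(player)
```

Iterating over the result of `select` directly avoids the index variable altogether.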

Answer 2 (score: 0)

If you are only interested in the first div, you can do this:

res = bsobj.find('div', {'class':'user-tagline'}).findAll('span')
print(res[0].text, res[1].text, res[2]['tip'])
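
For the row in the question's update, the same approach could be extended along these lines. This is only a sketch: the sample below is a trimmed copy of that row (player cell and most attributes removed, `href` replaced by a placeholder), `parse_row` is a hypothetical helper, and it assumes the time control is the link inside the first `td.text-center` cell, since that cell has no more specific class.

```python
from bs4 import BeautifulSoup

# Trimmed version of the row from the question, for illustration only.
row_html = '''
<table><tr>
  <td>
    <a class="clickable-link text-middle" href="#">
      <div class="pull-left">
        <span class="game-result"> 1 </span>
        <span class="game-result"> 0 </span>
      </div>
    </a>
  </td>
  <td class="text-center"><a class="clickable-link" href="#"> 30 min </a></td>
  <td class="text-right"><a class="clickable-link text-middle moves" href="#"> 25 </a></td>
  <td class="text-right miniboard"><a class="clickable-link archive-date" href="#"> Aug 9, 2017 </a></td>
</tr></table>
'''


def parse_row(tr):
    """Extract result, time control, move count and date from one <tr>."""
    return {
        # One entry per player, in document order.
        "results": [s.get_text(strip=True) for s in tr.select("span.game-result")],
        # Assumption: the first td.text-center with a direct link is the time cell.
        "time": tr.select_one("td.text-center > a.clickable-link").get_text(strip=True),
        "moves": int(tr.select_one("a.moves").get_text(strip=True)),
        "date": tr.select_one("a.archive-date").get_text(strip=True),
    }


soup = BeautifulSoup(row_html, "html.parser")
row = parse_row(soup.find("tr"))
print(row)
```

The player names, ratings and countries in the same row can be read from its `div.user-tagline` blocks exactly as in the answers above.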