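BeautifulSoup: Extracting attributes from an element?

Date: 2018-06-02 08:21:36

Tags: python beautifulsoup

I tried looking this up on Stack Overflow, but I couldn't adapt what I found to my code. Maybe someone can help me with this?

I'm trying to get the 'team1', 'team2' and 'bettext' values from the onclick attribute in this HTML: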



<table class="sportbet_extra_list_table" id="mc-ga312004790">
    <tbody>
        <tr>
            <td class="sportbet_extra_c0"></td>
            <td class="sportbet_extra_c1"><span>
                <a class="combi_1"></a>
                Hvem vinder kampen?                            </span></td>
            <td class="sportbet_extra_c2">
			                <div id="mc-ti312004790_1" class="js-ti312004790_1 sportbet_extra_rate_content" onclick="Bettingslip.addBet({type: 'N', team1: 'Rusland', team2: 'Saudi Arabien', bettext: 'Hvem vinder kampen?', combi_cat: 1, sub_group: 0, game: 312004790, groupId:461392, leagueId:30124, odd: 138, odd_id: 312004790, tiptext: '1', tip: 1, betstyle: 2224})">
                    <div class="sportbet_content_rate_left">1</div>
                    <div class="sportbet_content_rate_right">1,38</div>
                </div>
				
            </td>

So far, this is the code I use to extract information from sportbet_extra_list_table:

    import requests
    from bs4 import BeautifulSoup

    REQUEST = requests.get('https://www.cashpoint.dk/en/?'
                           'r=bets/xtra&group=461392&game=312004790').text
    SOUP = BeautifulSoup(REQUEST, 'lxml')
    # find_all to extract all
    SCRAPE = SOUP.find('table', class_='sportbet_extra_list_table')

    for CLEAN in SCRAPE:
        CLEANER = BeautifulSoup(str(CLEAN), 'lxml').text
        STRIP = " ".join(line.strip() for line in CLEANER.split("\n"))
        print(STRIP)

I tried adding

    SOUP.find('table', class_='sportbet_extra_list_table', attrs={"onclick": "team1"})

but it didn't work.

3 answers:

Answer 0: (score: 0)

Try the following to get the output in the form described in the post:

import json
import requests 
from bs4 import BeautifulSoup

url = "https://www.cashpoint.dk/en/?r=bets/xtra&group=461392&game=312004790"

res = requests.get(url)
soup = BeautifulSoup(res.text,'lxml')

dataset = []
for items in soup.select("#container_xtra [id^='mc-ti']"):
    d = {}
    data = items.get("onclick").split("Bettingslip.addBet(")[1].split(")")[0]

    d['team1'] = data.split("team1:")[1].split(",")[0].split("'")[1].split("'")[0]
    d['team2'] = data.split("team2:")[1].split(",")[0].split("'")[1].split("'")[0]
    d['bettext'] = data.split("bettext:")[1].split(",")[0].split("'")[1].split("'")[0]
    if d not in dataset:
        dataset.append(d)

print(json.dumps(dataset,indent=4))
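
The chained split calls above can be folded into one small helper. The following is only a sketch: it runs offline on a shortened copy of the onclick text from the question, and `extract_field` is a hypothetical name, not part of BeautifulSoup. It assumes each wanted value is wrapped in single quotes and contains no single quote itself.

```python
# Shortened copy of the onclick attribute shown in the question.
ONCLICK = ("Bettingslip.addBet({type: 'N', team1: 'Rusland', "
           "team2: 'Saudi Arabien', bettext: 'Hvem vinder kampen?', "
           "combi_cat: 1, sub_group: 0, game: 312004790})")


def extract_field(onclick, key):
    """Return the single-quoted value that follows `key:` (hypothetical helper)."""
    # The text after "key:" starts with " 'value', ...", so the value is the
    # piece between the first pair of single quotes.
    return onclick.split(key + ":")[1].split("'")[1]


fields = {key: extract_field(ONCLICK, key) for key in ("team1", "team2", "bettext")}
print(fields)
```

Splitting on the quote character rather than on the comma means a value that happens to contain a comma would still come out whole.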

Answer 1: (score: 0)

You can use demjson.decode() to convert the raw JavaScript object into a Python dictionary. This makes it much easier to get specific data about a bet.

Code:

import re
import demjson
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.cashpoint.dk/en/'
                 '?r=bets/xtra'
                 '&group=461392'
                 '&game=312004790')

soup = BeautifulSoup(r.text, 'lxml')
tables = soup.select('table.sportbet_extra_list_table')

for table in tables:
    fields = table.select('.sportbet_extra_rate_content')
    for field in fields:
        js_obj = re.search('{.+}', field['onclick']).group()
        bet = demjson.decode(js_obj)
        print((bet['team1'], bet['team2'], bet['bettext'], bet['tiptext'], bet['tip']))

Output:

('Rusland', 'Saudi Arabien', 'Hvem vinder kampen?', '1', 1)
('Rusland', 'Saudi Arabien', 'Hvem vinder kampen?', 'X', 3)
('Rusland', 'Saudi Arabien', 'Hvem vinder kampen?', '2', 2)
('Rusland', 'Saudi Arabien', 'Dobbeltchance', '1x', 1)
...
('Rusland', 'Saudi Arabien', 'Scorer i begge HL', 'B', 2)
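
If installing a third-party decoder is not an option, the same dictionary can be approximated with the standard library alone. This is a rough sketch rather than a general JavaScript parser: it assumes every value is either a single-quoted string without embedded quotes or a plain integer, as in the sample from the question, and `parse_bet` is a hypothetical name.

```python
import re

# The onclick attribute from the question's HTML.
ONCLICK = ("Bettingslip.addBet({type: 'N', team1: 'Rusland', "
           "team2: 'Saudi Arabien', bettext: 'Hvem vinder kampen?', "
           "combi_cat: 1, sub_group: 0, game: 312004790, groupId:461392, "
           "leagueId:30124, odd: 138, odd_id: 312004790, tiptext: '1', "
           "tip: 1, betstyle: 2224})")

# A key, a colon, then either a 'quoted string' or a bare integer.
PAIR = re.compile(r"(\w+)\s*:\s*(?:'([^']*)'|(-?\d+))")


def parse_bet(onclick):
    """Collect key/value pairs from the onclick text (hypothetical helper)."""
    bet = {}
    for key, text, number in PAIR.findall(onclick):
        # Exactly one of `text` / `number` is filled in for each match.
        bet[key] = int(number) if number else text
    return bet


bet = parse_bet(ONCLICK)
print((bet['team1'], bet['team2'], bet['bettext'], bet['tiptext'], bet['tip']))
```

Quoted values stay strings and bare numbers become integers, which mirrors the mix of types in the tuples printed above.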

Answer 2: (score: -1)

Here is a solution to your problem:

SCRAPE = SOUP.find('table', class_='sportbet_extra_list_table')
# Get the content of the onclick attribute using ['onclick']
SCRAPE = SCRAPE.find('div', id="mc-ti312004790_1")['onclick']
# Now separate every variable in it
attrs = SCRAPE.split(',')
# Retrieve what you want
team1 = attrs[1].split(':')[1].replace(' ', '').replace('\'', '')
team2 = attrs[2].split(':')[1].replace(' ', '').replace('\'', '')
bettext = attrs[3].split(':')[1].replace(' ', '').replace('\'', '')

print(team1)
print(team2)
print(bettext)

Output:

Rusland
SaudiArabien
Hvemvinderkampen?

The attrs variable looks like this:

["Bettingslip.addBet({type: 'N'", " team1: 'Rusland'", " team2: 'Saudi Arabien'", " bettext: 'Hvem vinder kampen?'", ' combi_cat: 1', ' sub_group: 0', ' game: 312004790', ' groupId:461392', ' leagueId:30124', ' odd: 138', ' odd_id: 312004790', " tiptext: '1'", ' tip: 1', ' betstyle: 2224})']

The attrs[1] variable is:

" team1: 'Rusland'"

Calling .split(':') on it gives:

[' team1', " 'Rusland'"]

To get the team1 name we take attrs[1].split(':')[1], which gives:

" 'Rusland'"

Finally, .replace(' ', '') removes the spaces and .replace('\'', '') removes the single quotes.
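
Note that .replace(' ', '') removes every space, including the ones inside a value, which is why 'Saudi Arabien' is printed as 'SaudiArabien'. A possible variation, sketched here offline on the onclick text from the question, trims only the ends of each value. `clean` is a hypothetical helper, and the comma split still assumes that no value contains a comma.

```python
# The onclick attribute from the question's HTML.
ONCLICK = ("Bettingslip.addBet({type: 'N', team1: 'Rusland', "
           "team2: 'Saudi Arabien', bettext: 'Hvem vinder kampen?', "
           "combi_cat: 1, sub_group: 0, game: 312004790, groupId:461392, "
           "leagueId:30124, odd: 138, odd_id: 312004790, tiptext: '1', "
           "tip: 1, betstyle: 2224})")

attrs = ONCLICK.split(',')


def clean(part):
    """Return the value of a "key: 'value'" fragment (hypothetical helper)."""
    # Keep everything after the first colon, then trim the surrounding
    # whitespace and the surrounding single quotes only.
    return part.split(':', 1)[1].strip().strip("'")


team1 = clean(attrs[1])
team2 = clean(attrs[2])
bettext = clean(attrs[3])

print(team1)
print(team2)
print(bettext)
```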