Finding hashtags in a specified part of a website

Time: 2018-08-23 14:57:54

Tags: python regex beautifulsoup

I want to extract all (in this case two) hashtags from a web page.

However, I am only interested in the hashtags in one branch (the wrapper div in this example): "#hash1 with space" and "#hash2withoutsace". The HTML of the page looks like this:

<html>
    <head>
    </head>
    <body>
        <div class="predefinition">
            <p class="part1">
              <span class="part1-head">Entries:</span>
                <a class="pr" href="/go_somewhere/">#hashA with space</a>, 
                <a class="pr" href="/go_somewhere/">#hashBwithoutsace</a>,
            </p>
            <span class="part2">Boundaries:</span>
            <p>some boundary statement</p>
        </div>        
        <div class="wrapper"> <!-- I only want to search here -->
            <p class="part1">
              <span class="part1-head">Entries:</span>
                <a class="pr" href="/go_somewhere/">#hash1 with space</a>, <!-- I only want to find this -->
                <a class="pr" href="/go_somewhere/">#hash2withoutsace</a>, <!-- and this -->
            </p>
            <span class="part2">Boundaries:</span>
            <p>some other boundary statement</p>
        </div>        
    </body>
</html>
  • How do I restrict the search to the "wrapper" div?
  • And how do I include the spaces in a hashtag?

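1 Answer:

Answer 0 (score: 1)

You can locate the wrapper div first and then collect the text of every a tag with class pr inside it. Because the text is read from the tags rather than matched with a regex, any spaces in a hashtag are kept: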

from bs4 import BeautifulSoup as soup
# `content` must hold the page's HTML as a string.
wrapper = soup(content, 'html.parser').find('div', {'class': 'wrapper'})
results = [a.text for a in wrapper.find_all('a', {'class': 'pr'})]

Output:

['#hash1 with space', '#hash2withoutsace']
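
For completeness, here is a self-contained sketch of the same idea that can be run as-is. It is only an illustration under a few assumptions: the `HTML` string is a trimmed copy of the page from the question, and the function names `hashtags_in_wrapper` and `hashtags_by_regex` are hypothetical helpers, not part of BeautifulSoup. The first function uses a CSS descendant selector to stay inside the wrapper div; the second shows one possible regex that keeps spaces by matching everything from `#` up to the next comma, `<`, or line break.

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the page from the question (sample data, not a live site).
HTML = """
<div class="predefinition">
  <p class="part1">
    <a class="pr" href="/go_somewhere/">#hashA with space</a>,
    <a class="pr" href="/go_somewhere/">#hashBwithoutsace</a>,
  </p>
</div>
<div class="wrapper">
  <p class="part1">
    <span class="part1-head">Entries:</span>
    <a class="pr" href="/go_somewhere/">#hash1 with space</a>,
    <a class="pr" href="/go_somewhere/">#hash2withoutsace</a>,
  </p>
</div>
"""


def hashtags_in_wrapper(html):
    """Return the text of every <a class="pr"> inside <div class="wrapper">."""
    page = BeautifulSoup(html, "html.parser")
    # The descendant selector limits the search to the wrapper branch.
    return [a.get_text(strip=True) for a in page.select("div.wrapper a.pr")]


def hashtags_by_regex(html):
    """Regex variant: match '#' up to the next comma, '<' or line break."""
    wrapper = BeautifulSoup(html, "html.parser").find("div", class_="wrapper")
    if wrapper is None:
        # No wrapper div on the page, so there is nothing to search.
        return []
    return [m.strip() for m in re.findall(r"#[^,<\n]+", wrapper.get_text())]


if __name__ == "__main__":
    print(hashtags_in_wrapper(HTML))
    print(hashtags_by_regex(HTML))
```

The selector version is usually the safer choice when every hashtag sits in its own tag, as it does here; the regex version may be useful when hashtags appear in free text inside the wrapper, but its stop characters are an assumption about how the entries are separated.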