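Select specific rows by class name

Time: 2018-09-25 01:48:24

Tags: python-3.x beautifulsoup

I am parsing an HTML document that contains a number of rows I want to select. Here is a sample of those rows: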

<tr class="constantstring-randomvalue1-row" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue1-row'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue1-row" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue1-row'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">

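What I want to do is use BeautifulSoup4's find_all with a regular expression, as in find_all(re.compile(regex)).

The problem is that I cannot come up with a good regular expression that selects all the rows I am interested in.

I want every row whose class starts with constantstring-. I don't care what comes after that prefix. What is the right way to do this? Should I use re.compile, and if so, what is the correct regex?

2 answers:

Answer 0: (score: 1)

If you want to do this with a regular expression, do the following. I added one extra row to show that the last row, whose class does not start with the prefix, is not picked up.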

http://rextester.com/OSSFB8621

from bs4 import BeautifulSoup
import re
html ="""
<tr class="constantstring-randomvalue1-row" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue1-row'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue1-row" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue1-row'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="constantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
<tr class="axcconstantstring-randomvalue2-row-2" onmouseover="this.className='constantstring-light-row-cp-h'" onmouseout="this.className='constantstring-randomvalue2-row-2'" onclick="if(ignoreOnClick==false)window.location='find.ashx?cv3dsw'" valign="top">
"""
bs = BeautifulSoup(html,'lxml')
for tr in bs.find_all("tr", {"class" : re.compile('^(constantstring)')}):
    print(tr)
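The same prefix match can probably also be written as a CSS attribute selector passed to select(). The following is a minimal sketch, not part of the original answer: the sample markup is a trimmed, hypothetical version of the rows above, and it uses the built-in html.parser instead of lxml so it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical sample markup (event-handler attributes omitted).
html = """
<table>
<tr class="constantstring-randomvalue1-row"><td>a</td></tr>
<tr class="constantstring-randomvalue2-row-2"><td>b</td></tr>
<tr class="axcconstantstring-randomvalue2-row-2"><td>c</td></tr>
</table>
"""

# html.parser ships with Python; the answer above uses 'lxml' instead.
soup = BeautifulSoup(html, "html.parser")

# [class^="..."] keeps elements whose class attribute value starts with the prefix.
matched = soup.select('tr[class^="constantstring-"]')
for tr in matched:
    print(tr["class"])
```

Note that the selector tests the whole class attribute value, so a row whose first class is something else would not match even if a later class starts with constantstring-.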

Answer 1: (score: 0)

You can use built-in string methods instead of a regular expression to do the same task. For example:

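# soup is the BeautifulSoup object built from the HTML
rows = soup.find_all('tr')
# str(i) is the tag's markup, so it begins with '<tr ...'
selected_rows = [i for i in rows if str(i).startswith('<tr class="constantstring-randomvalue')]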

If you omit str(), the if condition fails, because each element is a Tag object rather than a string.
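Comparing against the serialized markup depends on class being the first attribute of the tag. A slightly more robust variant, sketched here under the assumption that only the class value matters, passes a function as the class_ filter of find_all. The helper name has_prefix and the trimmed sample markup are hypothetical.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical sample markup (event-handler attributes omitted).
html = """
<table>
<tr class="constantstring-randomvalue1-row"><td>a</td></tr>
<tr class="constantstring-randomvalue2-row-2"><td>b</td></tr>
<tr class="axcconstantstring-randomvalue2-row-2"><td>c</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def has_prefix(css_class):
    # css_class is None for tags that have no class attribute.
    return css_class is not None and css_class.startswith("constantstring-")


# find_all calls has_prefix on the class value of every <tr>.
selected_rows = soup.find_all("tr", class_=has_prefix)
for row in selected_rows:
    print(row["class"])
```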

Hope this helps! Cheers!