How to loop clicks with Selenium and scrape each table with bs4?

Time: 2019-01-19 15:05:26

Tags: python loops selenium beautifulsoup

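I am trying to scrape some hidden tables (15 tables per page) that expand after an arrow is clicked. (Screenshots attached: Unexpanded tables, Expanded tables.)

I have also attached the HTML (sorry, it is a bit long):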

<table class="footable table toggle-arrow-tiny default breakpoint footable-loaded" transparenturl="Images/arrow_none.gif" ascendingurl="Images/arrow_up.gif" customsortdirection="Ascending" custompageindex="0" customsortfield="fullname" custompagealphaindex="A" custompagemode="ABC" custompagealpharelative="A" descendingurl="Images/arrow_down.gif" customvirtualcount="1605" id="MainContent_gw_partners" style="border-collapse:collapse;" cellspacing="0">
    <thead>
        <tr>
            <th data-toggle="true" scope="col" class="footable-visible footable-first-column"> &nbsp;&nbsp;</th><th data-ignore="true" data-hide="phone, tablet" scope="col" class="footable-visible"> &nbsp;&nbsp;</th><th data-ignore="true" data-hide="phone, tablet" scope="col" class="footable-visible">Titolo&nbsp;&nbsp;</th><th scope="col" class="footable-visible">Cognome&nbsp;&nbsp;</th><th data-ignore="true" data-hide="phone, tablet" scope="col" class="footable-visible">NPA&nbsp;&nbsp;</th><th data-ignore="true" data-hide="phone" scope="col" class="footable-visible">Luogo&nbsp;&nbsp;</th><th data-ignore="true" data-hide="phone" scope="col" class="footable-visible footable-last-column">Cantone&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Discipline(s) thérapeutique(s)&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Società&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Cognome&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">C/O&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Via&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">NPA&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Luogo&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Tel / Cellulare&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Cellulare  &nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Fax&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">e-mail&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Sito WEB&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Altri luoghi di lavoro&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Discipline(s) thérapeutique(s)&nbsp;&nbsp;</th>
        </tr>
    </thead><tbody>
        <tr class="row_white footable-detail-show">
            <td class="footable-visible footable-first-column"><span class="footable-toggle"></span>&nbsp;</td><td class="footable-visible">

                    </td><td class="footable-visible">&nbsp;</td><td class="footable-visible">

                        ABBONDANZIERI Katia
                    </td><td class="footable-visible">
                        1204
                        <br>

                    </td><td class="footable-visible">
                        Genève
                        <br>

                    </td><td class="footable-visible footable-last-column">
                        GE
                        <br>

                    </td><td style="display: none;">
                        197.&nbsp;Omeopatia, 202.&nbsp;Linfodrenaggio&nbsp;manuale, 205.&nbsp;Massaggio&nbsp;classico, 664.&nbsp;Riflessoterapia&nbsp;generale
                    </td><td style="display: none;">

                    </td><td style="display: none;">
                        ABBONDANZIERI Katia
                    </td><td style="display: none;">


                    </td><td style="display: none;">
                        Place du Cirque, 2
                    </td><td style="display: none;">
                        1204
                    </td><td style="display: none;">
                        Genève
                    </td><td style="display: none;">
                        022 328 23 44 
                    </td><td style="display: none;">
                        079 601 92 75 
                    </td><td style="display: none;">

                    </td><td style="display: none;">

                    </td><td style="display: none;">

                    </td><td style="display: none;">

                    </td><td style="display: none;">
                        <div class="thZone"><div class="zCat">METHODES DE MASSAGE</div><div class="zThr">Linfodrenaggio manuale</div><div class="zThr">Massaggio classico</div><div class="zCat">METHODES PRESCRIPTIVES</div><div class="zThr">Omeopatia</div><div class="zCat">METHODES REFLEXES</div><div class="zThr">Riflessoterapia generale</div></div>
                    </td>
        </tr><tr class="footable-row-detail" style="display: table-row;"><td class="footable-row-detail-cell" colspan="7"><div class="footable-row-detail-inner"><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value">197.&nbsp;Omeopatia, 202.&nbsp;Linfodrenaggio&nbsp;manuale, 205.&nbsp;Massaggio&nbsp;classico, 664.&nbsp;Riflessoterapia&nbsp;generale</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cognome:</div><div class="footable-row-detail-value">ABBONDANZIERI Katia</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Via:</div><div class="footable-row-detail-value">Place du Cirque, 2</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">NPA:</div><div class="footable-row-detail-value">1204</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Luogo:</div><div class="footable-row-detail-value">Genève</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Tel / Cellulare:</div><div class="footable-row-detail-value">022 328 23 44</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cellulare:</div><div class="footable-row-detail-value">079 601 92 75</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value"><div class="thZone"><div class="zCat">METHODES DE MASSAGE</div><div class="zThr">Linfodrenaggio manuale</div><div class="zThr">Massaggio classico</div><div class="zCat">METHODES PRESCRIPTIVES</div><div class="zThr">Omeopatia</div><div class="zCat">METHODES REFLEXES</div><div class="zThr">Riflessoterapia generale</div></div></div></div></div></td></tr><tr class="row_grey footable-detail-show">
            <td class="footable-visible footable-first-column"><span class="footable-toggle"></span>&nbsp;</td><td class="footable-visible">

                            <a href="http://www.kinesiopourtous.ch" target="_blank">
                                <img title="Link internet" alt="" style="MARGIN-RIGHT: 7px" src="Images/pictoSiteInternet.jpg" width="12" height="12" border="0">
                            </a>

                    </td><td class="footable-visible">&nbsp;</td><td class="footable-visible">
                        <img id="MainContent_gw_partners_img1_1" src="Images/multi.gif">
                        ABEGG Sophie
                    </td><td class="footable-visible">
                        1212
                        <br>
                        1875<br>
                    </td><td class="footable-visible">
                        Grand-Lancy
                        <br>
                        <nobr>Morgins</nobr><nobr><br>
                    </nobr></td><td class="footable-visible footable-last-column">
                        GE
                        <br>
                        VS<br>
                    </td><td style="display: none;">
                        199.&nbsp;Kinesiologia
                    </td><td style="display: none;">
                        Kinéso pour tous
                    </td><td style="display: none;">
                        ABEGG Sophie
                    </td><td style="display: none;">


                    </td><td style="display: none;">
                        Rue du Bachet 8
                    </td><td style="display: none;">
                        1212
                    </td><td style="display: none;">
                        Grand-Lancy
                    </td><td style="display: none;">

                    </td><td style="display: none;">
                        076 365 63 86
                    </td><td style="display: none;">

                    </td><td style="display: none;">

                            <a href="mailto:sophie@kinesiopourtous.ch">sophie[at]kinesiopourtous.ch
                            </a>

                    </td><td style="display: none;">

                            <a href="http://www.kinesiopourtous.ch" target="_blank">
                                www.kinesiopourtous.ch
                            </a>

                    </td><td style="display: none;">
                        Résidence Bellevue, Rte de France 22, 1875 Morgins, CH<br>
                    </td><td style="display: none;">
                        <div class="thZone"><div class="zCat">METHODES ENERGETIQUES MANUELLES</div><div class="zThr">Kinesiologia</div></div>
                    </td>
        </tr><tr class="footable-row-detail"><td class="footable-row-detail-cell" colspan="7"><div class="footable-row-detail-inner"><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value">199.&nbsp;Kinesiologia</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Società:</div><div class="footable-row-detail-value">Kinéso pour tous</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cognome:</div><div class="footable-row-detail-value">ABEGG Sophie</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Via:</div><div class="footable-row-detail-value">Rue du Bachet 8</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">NPA:</div><div class="footable-row-detail-value">1212</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Luogo:</div><div class="footable-row-detail-value">Grand-Lancy</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cellulare:</div><div class="footable-row-detail-value">076 365 63 86</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">e-mail:</div><div class="footable-row-detail-value"><a href="mailto:sophie@kinesiopourtous.ch">sophie[at]kinesiopourtous.ch
                            </a></div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Sito WEB:</div><div class="footable-row-detail-value"><a href="http://www.kinesiopourtous.ch" target="_blank">
                                www.kinesiopourtous.ch
                            </a></div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Altri luoghi di lavoro:</div><div class="footable-row-detail-value">Résidence Bellevue, Rte de France 22, 1875 Morgins, CH<br></div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value"><div class="thZone"><div class="zCat">METHODES ENERGETIQUES MANUELLES</div><div class="zThr">Kinesiologia</div></div></div></div></div></td></tr><tr class="row_white">
            <td class="footable-visible footable-first-column"><span class="footable-toggle"></span>&nbsp;</td><td class="footable-visible">

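So I am using Selenium for the clicking and BeautifulSoup 4 for scraping the tables.

I would like to create a loop that clicks each arrow (15 arrows per page) and scrapes the data from each table (13 rows per table; if a value is missing, the corresponding cell should be blank in the output Excel file).

Can anyone help?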
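The requirement that missing values become blank cells can be sketched independently of the scraping step. The following is only a minimal illustration: it assumes each scraped table has already been turned into a dict keyed by the detail labels seen in the HTML above, and it writes CSV (which Excel opens) rather than a native Excel file. The names `FIELDS`, `records` and `to_csv` are made up for this example.

```python
import csv
import io

# Column order for the output; the labels follow the hidden columns in the HTML above.
FIELDS = [
    "Discipline(s) thérapeutique(s)", "Società", "Cognome", "C/O", "Via",
    "NPA", "Luogo", "Tel / Cellulare", "Cellulare", "Fax", "e-mail",
    "Sito WEB", "Altri luoghi di lavoro",
]

# Two records as they might come out of the scraper; each one lacks some fields.
records = [
    {"Cognome": "ABBONDANZIERI Katia", "Via": "Place du Cirque, 2", "NPA": "1204",
     "Luogo": "Genève", "Tel / Cellulare": "022 328 23 44", "Cellulare": "079 601 92 75"},
    {"Società": "Kinéso pour tous", "Cognome": "ABEGG Sophie", "Via": "Rue du Bachet 8",
     "NPA": "1212", "Luogo": "Grand-Lancy", "Cellulare": "076 365 63 86"},
]


def to_csv(rows, fields=FIELDS):
    """Serialise dicts to CSV text, leaving a blank cell for every missing field."""
    buffer = io.StringIO()
    # restval="" is what fills the gaps; extrasaction="ignore" drops unknown keys.
    writer = csv.DictWriter(buffer, fieldnames=fields, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(records)
print(csv_text)
```

If a real .xlsx file is needed, the same list of dicts could instead be passed to a pandas DataFrame and saved with `DataFrame.to_excel`; missing keys then show up as empty cells as well.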

3 Answers:

Answer 0: (score: 0)

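If you inspect the page, you can see that it uses "Request Method: POST", so I took a different approach.

If you would still like to use Selenium, let me know and I can try that approach as well.

You will need to get the form data and copy it into a payload dictionary. I did not include the whole thing because it is too long, but I included a small snippet in the code so you can see the format.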
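Instead of copying the form data by hand, the payload dictionary could also be collected from the page markup. The sketch below is an assumption-laden illustration, not the answerer's code: the form markup and field names (`__VIEWSTATE`, `__EVENTVALIDATION`, `search_name`) are hypothetical stand-ins, and the standard-library `html.parser` is used only to keep the example self-contained.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the name/value pairs of all <input> elements into a dict."""

    def __init__(self):
        super().__init__()
        self.payload = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute are sent as empty strings.
            self.payload[name] = attributes.get("value") or ""


# Hypothetical form markup; the real page's field names and values will differ.
sample_form = """
<form method="post" action="Partners.aspx?lang=it">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="def456" />
  <input type="text" name="search_name" />
</form>
"""

collector = FormFieldCollector()
collector.feed(sample_form)
payload = collector.payload
print(payload)
```

A dict built this way could then be sent with `requests.post(url, data=payload)`, and the returned HTML handed to `pandas.read_html` (wrapped in `io.StringIO`) to obtain the table, provided the server accepts the request outside a browser session.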


Then I just used pandas to get the data table.


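Answer 1: (score: 0)

Here is a Selenium approach that expands those tables. There is a better way to handle the time the page needs to load, but I just wanted to get this to you quickly, so it simply uses time.sleep.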

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
import time


url = 'http://www.asca.ch/Partners.aspx?lang=it'

driver = webdriver.Chrome()
driver.get(url)

# Click the dropdown, select GE, click Confermo, click Ricerca
driver.find_element(By.XPATH, '//*[@id="ctl00_MainContent_ddl_cantons_Arrow"]').click()
time.sleep(2)

driver.find_element(By.XPATH, '//*[@id="ctl00_MainContent_ddl_cantons_DropDown"]/div/ul/li[9]').click()
driver.find_element(By.XPATH, '//*[@id="MainContent__chkDisclaimer"]').click()
driver.find_element(By.XPATH, '//*[@id="MainContent_btn_submit"]').click()
time.sleep(5)


# Function to expand the tables: click every row of the results table
def expand_tables():
    rows = driver.find_elements(By.XPATH, '//*[@id="MainContent_gw_partners"]/tbody/tr')
    for row in rows:
        row.click()


# Function to click the next-page button
def click_next_page():
    driver.find_element(By.XPATH, '//*[@id="MainContent_btnNextPackId"]').click()


page = 1
num_of_pages = True
while num_of_pages:
    print('Page: %s' % page)
    expand_tables()

    ## Your code to parse the tables ##

    try:
        click_next_page()
        page += 1
        time.sleep(2)
    except WebDriverException:
        # The next-page button could not be clicked: stop instead of looping forever
        print('You are at the end')
        num_of_pages = False

# When finished
driver.close()
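The answer above notes that there is a better way to handle loading than fixed `time.sleep` calls. The general idea is to poll for a condition until it holds or a timeout expires. The helper below is a minimal, library-free sketch of that idea; `wait_until` and `table_is_loaded` are hypothetical names, and the condition is a stand-in for a real check against the page.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


# Hypothetical stand-in for "the table has finished loading": succeeds on the third check.
attempts = []


def table_is_loaded():
    attempts.append(1)
    return len(attempts) >= 3


wait_until(table_is_loaded, timeout=5.0, poll_interval=0.01)
print(len(attempts))
```

Selenium ships its own version of this pattern: `WebDriverWait(driver, 10).until(...)` combined with the conditions in `selenium.webdriver.support.expected_conditions`, for example `presence_of_element_located((By.ID, "MainContent_gw_partners"))`. That is usually preferable to a hand-written helper when a driver is available.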

Answer 2: (score: 0)

Sorry, my code would not fit in a comment, so I am posting it as an answer.

Here is my code for parsing the tables:

# To find all the tables
table = soup.find('table', {'class': 'footable'})

# To get all rows in that table
rows = table.find_all('tr')

# A function to process each row
def processRow(row):
    #All rows with hidden data
    dataFields = row.find_all('td', {'style': True}
    output = {}
    #Fixed index numbers are not ideal but in this case will work
    output['Discipline'] = dataFields[0].text
    output['Cogome'] = dataFields[2].text
    output['Cellulare'] = dataFields[8].text
    output['email'] = dataFields[10].text
    return output

# Declaring a list to store all results
results = []

# Iterating over all the rows and storing the processed result in a list
for row in rows:
    results.append(processRow(row))

print(results)


    click_next_page()
    time.sleep(3)
    count += 1

I think something is wrong. I get "SyntaxError: invalid syntax" at "output = {}", just below "# A function to process each row".
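The error is most likely caused by the line before `output = {}`: the `row.find_all(...)` call is missing its closing parenthesis, and older Python versions report such an error on the following line. A corrected sketch is shown below. It runs against a small hand-made HTML string (a hypothetical stand-in for `driver.page_source`), and it additionally skips rows that have no hidden cells, since the header and detail rows would otherwise raise an IndexError with the fixed indices.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source: a header row, one data row
# with 14 hidden cells, and one expanded detail row without hidden cells.
hidden = ["197. Omeopatia", "", "ABBONDANZIERI Katia", "", "Place du Cirque, 2",
          "1204", "Genève", "022 328 23 44", "079 601 92 75", "", "", "", "", ""]
cells = "".join('<td style="display: none;"> %s </td>' % value for value in hidden)
html = (
    '<table class="footable"><thead><tr><th>Cognome</th></tr></thead><tbody>'
    '<tr class="row_white"><td class="footable-visible">x</td>' + cells + '</tr>'
    '<tr class="footable-row-detail"><td colspan="7">details</td></tr>'
    '</tbody></table>'
)

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "footable"})


def process_row(row):
    # Cells that carry a style attribute (the hidden ones); note the closing
    # parenthesis that was missing in the posted version.
    data_fields = row.find_all("td", {"style": True})
    if len(data_fields) < 11:
        return None  # header and detail rows have no hidden cells
    # Fixed index numbers are not ideal, but they match the column order above.
    return {
        "Discipline": data_fields[0].get_text(strip=True),
        "Cognome": data_fields[2].get_text(strip=True),
        "Cellulare": data_fields[8].get_text(strip=True),
        "email": data_fields[10].get_text(strip=True),
    }


results = [r for r in map(process_row, table.find_all("tr")) if r is not None]
print(results)
```

The trailing `click_next_page()`, `time.sleep(3)` and `count += 1` lines in the posted code are indented without an enclosing loop, so they would also need to sit inside the page loop from the previous answer before the script can run.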