Android: extracting multiple tables from a website

Date: 2018-09-19 21:56:46

Tags: android html web-scraping html-table

After being stuck for a few weeks, I can now log in to the website automatically, download the Excel file, and view the page body.

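I have another question that I hope you can help with. How do I extract each table? Each table's data will be inserted into a SQLite database. Here is a sample of the website's tables: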

<tr class="odd">
                <td colspan="10" style="text-align:center;font-size:12px;font-weight:600;">
                    122 Address
                </td>
            </tr>

        <tr class="odd">
            <td>122Address</td>
            <td>Guest Name</td>
            <td>Aug 06 -- Sep 07</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>Agent Name</td>
        </tr>

            <tr class="odd">
                <td>&nbsp;</td>
                <td colspan="9">Remarks</td>
            </tr>



            <tr class="even">
                <td colspan="10" style="text-align:center;font-size:12px;font-weight:600;">
                    154 Address
                </td>
            </tr>

        <tr class="even">
            <td>154Address</td>
            <td>Guest Name</td>
            <td>Aug 30 -- Sep 02</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>Agent Name</td>
        </tr>


        <tr class="odd">
            <td>154Address</td>
            <td>Guest Name</td>
            <td>Sep 07 -- Sep 09</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>Agent Name</td>
        </tr>


        <tr class="even">
            <td>154Address</td>
            <td>Guest Name</td>
            <td>Sep 14 -- Sep 16</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>Agent Name</td>
        </tr>


        <tr class="odd">
            <td>154Address</td>
            <td>Guest Name</td>
            <td>Sep 16 -- Sep 19</td>
            <td>No</td>
            <td>No</td>
            <td><div style="color:red;font-weight:600;">PH</div></td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>Agent Name</td>
        </tr>

            <tr class="odd">
                <td>&nbsp;</td>
                <td colspan="9">Remarks</td>
            </tr>


        <tr class="even">
            <td>154Address</td>
            <td>Guest Name</td>
            <td>Sep 20 -- Sep 23</td>
            <td>No</td>
            <td>No</td>
            <td><div style="color:red;font-weight:600;">PH</div></td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>Agent Name</td>
        </tr>


        <tr class="odd">
            <td>154Address</td>
            <td>Guest Name</td>
            <td>Sep 28 -- Sep 30</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>Agent Name</td>
        </tr>

        <tr class="even">
            <td>154Address</td>
            <td>Guest Name</td>
            <td>Sep 30 -- Oct 06</td>
            <td>No</td>
            <td>No</td>
            <td><div style="color:red;font-weight:600;">PH</div></td>
            <td><div style="color:red;font-weight:600;">GR</div></td>
            <td>No</td>
            <td>No</td>
            <td>Agent Name</td>
        </tr>



            <tr class="odd">
                <td colspan="10" style="text-align:center;font-size:12px;font-weight:600;">
                    165 Street address
                </td>
            </tr>

        <tr class="odd">
            <td>165Address</td>
            <td>Guest Name</td>
            <td>Sep 01 -- Sep 03</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>No</td>
            <td>Agent Name</td>
        </tr>

I hope you can help. My plan is to create a variable for each item, for example:

Address,
Guest Name,
Check In, 
Check Out,
Early Arrival,
Late Departure,
Pool Heat,
Grill,
Crib,
High Chair,
Agent,
Remarks (if any)
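
As a rough sketch of that mapping, the ten cells of one booking row could be copied into a plain Java object. The class name `Booking`, its field names, and the `fromCells` helper are hypothetical, and the column order is assumed to match the list above (the sample rows are consistent with it: "PH" appears in the Pool Heat position and "GR" in the Grill position). The date cell is split on `--` to get Check In and Check Out.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for one 10-cell booking row from the sample table. */
final class Booking {
    String address, guestName, checkIn, checkOut;
    String earlyArrival, lateDeparture, poolHeat, grill, crib, highChair;
    String agent;
    String remarks = ""; // set separately when a "Remarks" row follows

    /** Builds a Booking from the text of the ten <td> cells, in page order. */
    static Booking fromCells(List<String> cells) {
        if (cells.size() != 10) {
            throw new IllegalArgumentException("expected 10 cells, got " + cells.size());
        }
        Booking b = new Booking();
        b.address = cells.get(0);
        b.guestName = cells.get(1);
        // "Aug 06 -- Sep 07" becomes check-in "Aug 06" and check-out "Sep 07"
        String[] range = cells.get(2).split("\\s*--\\s*", 2);
        b.checkIn = range[0].trim();
        b.checkOut = range.length > 1 ? range[1].trim() : "";
        // Assumed column order for the six flag columns
        b.earlyArrival = cells.get(3);
        b.lateDeparture = cells.get(4);
        b.poolHeat = cells.get(5);
        b.grill = cells.get(6);
        b.crib = cells.get(7);
        b.highChair = cells.get(8);
        b.agent = cells.get(9);
        return b;
    }

    public static void main(String[] args) {
        Booking b = fromCells(Arrays.asList("154Address", "Guest Name", "Sep 30 -- Oct 06",
                "No", "No", "PH", "GR", "No", "No", "Agent Name"));
        System.out.println(b.checkIn + " to " + b.checkOut + ", pool heat: " + b.poolHeat);
    }
}
```

The flag columns are kept as strings here because the page uses values such as "No", "PH", and "GR"; converting them to booleans would need a rule the sample does not spell out.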

1 Answer:

Answer 0 (score: 0)

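Use JSoup and follow its documentation to get the tables from the DOM. Add compile 'org.jsoup:jsoup:1.11.3' to your dependencies (with Android Gradle plugin 3.0 or later, use implementation in place of compile).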
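
A minimal sketch of what that could look like for the markup in the question, assuming the rows sit inside a `<table>` and the page HTML is already available as a string. The class `BookingTableParser` and its `Row` type are hypothetical names. Rows are classified by shape: ten cells is a booking, two cells with `colspan="9"` on the second is treated as the remarks for the booking just before it (an assumption based on the sample), and the single-cell `colspan="10"` address header is skipped because the address is repeated in each booking row.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

final class BookingTableParser {

    /** Cell texts of one 10-column booking row, plus optional remarks. */
    static final class Row {
        final List<String> cells = new ArrayList<>();
        String remarks = "";
    }

    static List<Row> parse(String html) {
        Document doc = Jsoup.parse(html);
        List<Row> rows = new ArrayList<>();
        for (Element tr : doc.select("tr")) {
            Elements tds = tr.children(); // direct <td> children only
            if (tds.size() == 10) {
                // Booking row: keep the visible text of every cell,
                // including cells that wrap their value in a <div>.
                Row row = new Row();
                for (Element td : tds) {
                    row.cells.add(td.text().trim());
                }
                rows.add(row);
            } else if (tds.size() == 2
                    && "9".equals(tds.get(1).attr("colspan"))
                    && !rows.isEmpty()) {
                // Remarks row: assumed to belong to the preceding booking.
                rows.get(rows.size() - 1).remarks = tds.get(1).text().trim();
            }
            // Anything else (e.g. the colspan="10" address header) is skipped.
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<table>"
                + "<tr class=\"odd\"><td colspan=\"10\">122 Address</td></tr>"
                + "<tr class=\"odd\"><td>122Address</td><td>Guest Name</td>"
                + "<td>Aug 06 -- Sep 07</td><td>No</td><td>No</td>"
                + "<td><div style=\"color:red;\">PH</div></td><td>No</td>"
                + "<td>No</td><td>No</td><td>Agent Name</td></tr>"
                + "<tr class=\"odd\"><td>&nbsp;</td><td colspan=\"9\">Remarks</td></tr>"
                + "</table>";
        for (Row r : parse(html)) {
            System.out.println(r.cells + " remarks=" + r.remarks);
        }
    }
}
```

Each `Row` can then be written to SQLite with one insert per booking. Note that a bare `<tr>` fragment without an enclosing `<table>` is not kept as a row by an HTML parser, so pass the full page or wrap the fragment first.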