How to retrieve a specific table from a webpage with Jsoup [Android]

Date: 2016-09-20 05:13:09

Tags: java android jsoup

I am trying to retrieve a table from this URL. This is the table I need to retrieve:

 <table id="h2hSum" class="competitionRanking tablesorter"> 
              <thead> 
               <tr> 
                <th align="center">Team</th> 
                <th align="center">Played</th> 
                <th align="center">Win</th> 
                <th align="center">Draw</th> 
                <th align="center">Lose</th> 
                <th align="center">Score</th> 
                <th>Goals Scored</th> 
                <th>Goals Allowed</th> 
               </tr> 
              </thead> 
              <tbody> 
               <tr> 
                <td><a class="teamLink" href="/soccer-statistics/England/Premier-League-2016-2017/team_info_overall/676_Manchester_City_FC">Manchester City</a></td> 
                <td>140</td> 
                <td>47</td> 
                <td>38</td> 
                <td>55</td> 
                <td>188:205</td> 
                <td>1.34</td> 
                <td>1.46</td> 
               </tr> 
               <tr class="odd"> 
                <td><a class="teamLink" href="/soccer-statistics/England/Premier-League-2016-2017/team_info_overall/661_Chelsea_FC">Chelsea</a></td> 
                <td>140</td> 
                <td>55</td> 
                <td>38</td> 
                <td>47</td> 
                <td>205:188</td> 
                <td>1.46</td> 
                <td>1.34</td> 
               </tr> 
              </tbody> 
             </table>
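Every body row of this table has the same eight columns. Judging by the two sample rows, "Goals Scored" and "Goals Allowed" are the two halves of "Score" divided by "Played" (188 / 140 ≈ 1.34, 205 / 140 ≈ 1.46). Once the `<td>` texts have been retrieved in document order (eight per row), a small plain-Java sketch like the one below can turn them into typed rows. `H2hRow` and its methods are hypothetical helpers for illustration, not part of jsoup, and the column order is assumed to match the header above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** One body row of the head-to-head table (hypothetical helper, not part of jsoup). */
public class H2hRow {
    // Assumed column order, taken from the table header:
    // Team, Played, Win, Draw, Lose, Score, Goals Scored, Goals Allowed
    static final int COLUMNS = 8;

    final String team;
    final int played, win, draw, lose, goalsFor, goalsAgainst;

    H2hRow(List<String> c) {
        team = c.get(0);
        played = Integer.parseInt(c.get(1));
        win = Integer.parseInt(c.get(2));
        draw = Integer.parseInt(c.get(3));
        lose = Integer.parseInt(c.get(4));
        String[] score = c.get(5).split(":"); // "188:205" -> goals for, goals against
        goalsFor = Integer.parseInt(score[0].trim());
        goalsAgainst = Integer.parseInt(score[1].trim());
    }

    /** Goals scored per match played. */
    double goalsForPerMatch() {
        return (double) goalsFor / played;
    }

    /** Goals conceded per match played. */
    double goalsAgainstPerMatch() {
        return (double) goalsAgainst / played;
    }

    /** Splits the flat list of <td> texts (document order) into rows of COLUMNS cells. */
    static List<H2hRow> fromFlatCells(List<String> cells) {
        if (cells.size() % COLUMNS != 0) {
            throw new IllegalArgumentException("Unexpected cell count: " + cells.size());
        }
        List<H2hRow> rows = new ArrayList<>();
        for (int i = 0; i < cells.size(); i += COLUMNS) {
            rows.add(new H2hRow(cells.subList(i, i + COLUMNS)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Cell texts copied from the two rows of the sample table.
        List<String> cells = Arrays.asList(
                "Manchester City", "140", "47", "38", "55", "188:205", "1.34", "1.46",
                "Chelsea", "140", "55", "38", "47", "205:188", "1.46", "1.34");
        for (H2hRow r : fromFlatCells(cells)) {
            System.out.println(String.format(Locale.ROOT,
                    "%s: %d-%d-%d, %.2f scored / %.2f allowed per match",
                    r.team, r.win, r.draw, r.lose, r.goalsForPerMatch(), r.goalsAgainstPerMatch()));
        }
    }
}
```

Recomputing the per-match values from "Score" and "Played" is a cheap sanity check that the cells were split into rows correctly.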

This is what I tried:

private class SimpleTask1 extends AsyncTask<String, String, String>
{
    ProgressDialog loader;


    @Override
    protected void onPreExecute()
    {
        loader = new ProgressDialog(MainActivity.this, ProgressDialog.STYLE_SPINNER);
        loader.setMessage("loading engine");
        loader.show();

    }

    protected String doInBackground(String... urls)
    {
        String result1 = "";
        try {

            Document doc = Jsoup.connect(urls[0]).get();
            Element table = doc.select("table[class=competitionRanking tablesorter]").first();
            Iterator<Element> ite = table.select("td").iterator();

            ite.next();
            Log.w("Value 1: ",""+ ite.next().text());
            Log.w("Value 2: ",""+ ite.next().text());
            Log.w("Value 3: ",""+ ite.next().text());
            Log.w("Value 4: ",""+ ite.next().text());

        } catch (IOException e) {

        }
        return result1;
    }

    protected void onPostExecute(String sampleVal)
    {
        loader.dismiss();
        Log.e("OUTPUT",""+sampleVal);
    }
}

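However, this throws an exception. I have tried the answers to similar questions, but they don't fit my case because they access the table by class name or by td width. How can I access all the values in this table? Please help.

2 answers:

Answer 0 (score: 1)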

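Problem

Iterator<Element> ite = table.select("td").iterator(); throws a NullPointerException.

Reason

After your first visit, if your activity looks bot-like, the site seems to store your IP and asks you to register on the next visit. The landing page you are redirected to does not contain the table, so table is null, and you cannot call select(...) on null.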
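This failure mode can be reproduced offline: when the served page lacks the table, `select(...).first()` returns `null`. The sketch below, assuming the jsoup library is on the classpath, fails with a descriptive message instead of a `NullPointerException`. It relies on `Document.location()`, which reports the URL the document was loaded from (for `Jsoup.connect(...).get()` this is the final URL after redirects), so a redirect to a registration page is easy to spot in the log. `TableGuard`, `requireTable` and the example.com URL are hypothetical stand-ins.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableGuard {

    /** Returns the head-to-head table, or throws with a hint about which page was actually served. */
    static Element requireTable(Document doc) {
        Element table = doc.select("table#h2hSum").first();
        if (table == null) {
            // location() is the URL the page came from; title() helps identify a login/registration page.
            throw new IllegalStateException(
                    "Table #h2hSum not found on " + doc.location() + " (page title: \"" + doc.title() + "\")");
        }
        return table;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the registration page; parsed locally with a made-up base URI.
        Document landing = Jsoup.parse(
                "<html><head><title>Please register</title></head><body><form></form></body></html>",
                "http://www.example.com/register");
        try {
            requireTable(landing);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the `AsyncTask`, catching this exception in `doInBackground` and returning an error message to `onPostExecute` keeps the app from crashing when the site serves a different page.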

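Solution

Either register for the service and add the login process to your code, or, if you get redirected to the registration page, switch your IP address with a proxy. I'm not sure how long an IP stays blocked, but using a VPN and the code below I ran 20 queries in a row without problems. In any case, make sure to set the user agent, cookies and other header fields that the original site request contains (monitor them, e.g., with your browser's developer tools / network tab):

Code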

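import org.jsoup.Connection.Method;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36";

// 1st request: open the league overview page like a browser would, to collect the session cookies.
// Note: the Host header takes a host name, not a URL.
Response res = Jsoup
        .connect("http://www.soccerpunter.com/soccer-statistics/England/Premier-League-2016-2017/")
        .followRedirects(true).userAgent(userAgent).referrer("http://www.soccerpunter.com")
        .method(Method.GET).header("Host", "www.soccerpunter.com").execute();

// 2nd request: load the head-to-head page, sending those cookies and the overview page as Referer.
Document doc = Jsoup
        .connect("http://www.soccerpunter.com/soccer-statistics/England/Premier-League-2016-2017/head_to_head_statistics/all/676_Manchester_City_FC/661_Chelsea_FC")
        .userAgent(userAgent).timeout(10000).header("Host", "www.soccerpunter.com")
        .cookies(res.cookies())
        .referrer("http://www.soccerpunter.com/soccer-statistics/England/Premier-League-2016-2017/")
        .get();

// first() returns null if we landed on a page without the table (e.g. the registration page).
Element table = doc.select("table.competitionRanking.tablesorter").first();
Elements td = (table != null) ? table.select("td") : new Elements();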
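The `td` selection above yields one flat list with the cells of all rows. To keep the rows apart, a minimal sketch (assuming jsoup on the classpath; `H2hTableParser` and the trimmed sample HTML, with `align` attributes dropped and links replaced by `#`, are illustrative) selects the table by its `id="h2hSum"`, which is less brittle than matching the full class attribute, and walks the `tbody > tr` rows. It parses a local copy so it runs offline; in the app the `Document` would be the `doc` fetched above.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class H2hTableParser {

    // Trimmed copy of the table from the question (hypothetical sample data for offline testing).
    static final String SAMPLE_HTML =
            "<table id=\"h2hSum\" class=\"competitionRanking tablesorter\">"
          + "<thead><tr><th>Team</th><th>Played</th><th>Win</th><th>Draw</th>"
          + "<th>Lose</th><th>Score</th><th>Goals Scored</th><th>Goals Allowed</th></tr></thead>"
          + "<tbody>"
          + "<tr><td><a class=\"teamLink\" href=\"#\">Manchester City</a></td>"
          + "<td>140</td><td>47</td><td>38</td><td>55</td><td>188:205</td><td>1.34</td><td>1.46</td></tr>"
          + "<tr class=\"odd\"><td><a class=\"teamLink\" href=\"#\">Chelsea</a></td>"
          + "<td>140</td><td>55</td><td>38</td><td>47</td><td>205:188</td><td>1.46</td><td>1.34</td></tr>"
          + "</tbody></table>";

    /** Returns the cell texts of every body row, or an empty list if the table is absent. */
    static List<List<String>> parseRows(Document doc) {
        List<List<String>> rows = new ArrayList<>();
        // Select by id; first() is null when the served page has no such table.
        Element table = doc.select("table#h2hSum").first();
        if (table == null) {
            return rows;
        }
        for (Element tr : table.select("tbody > tr")) {
            List<String> cells = new ArrayList<>();
            for (Element td : tr.select("td")) {
                cells.add(td.text()); // text() also flattens the <a> link in the Team column
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        for (List<String> row : parseRows(doc)) {
            System.out.println(row);
        }
    }
}
```

Run against the sample, this prints one bracketed list per team, e.g. `[Manchester City, 140, 47, 38, 55, 188:205, 1.34, 1.46]`.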

Answer 1 (score: 0)

Try this:

DECLARE @command varchar(1000)
SELECT @command = 'use [?] select ''[?]'', db_id(parsename(base_object_name, 3)) as dbid
     , object_id(base_object_name) as objid
     , base_object_name
from sys.synonyms;'
EXEC sp_MSforeachdb @command