How to get specific tags with jsoup on Android

Date: 2017-01-02 17:23:25

Tags: android html parsing jsoup

This is part of my HTML:

<p>hello world </p>
<p><img class="aligncenter size-full wp-image-3197" src="" data-lazy-src="http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg" alt="harmony-02" width="800" height="450" data-lazy-srcset="http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg 800w, http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02-300x169.jpg 300w" sizes="(max-width: 800px) 100vw, 800px" /><noscript><img class="aligncenter size-full wp-image-3197" src="http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg" alt="harmony-02" width="800" height="450" srcset="http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02.jpg 800w, http://memaraneha.ir/wp-content/uploads/2016/12/harmony-02-300x169.jpg 300w" sizes="(max-width: 800px) 100vw, 800px" /></noscript></p>
<p>goodbye world</p>

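As you can see, there are three <p> tags in this HTML. How can I use jsoup to select only the normal <p> tags, such as "hello world" and "goodbye world", and ignore the <p> tag that contains the <img> element?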

Here is my code so far:

public class MainActivity extends AppCompatActivity {

    public WebView webView;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main_page);
        webView=(WebView)findViewById(R.id.webi);


        new AsyncTask<Void, Void, String>() {
            @Override
            protected String doInBackground(Void... voids) {
                String html = "";
                try {
                    Document document = Jsoup.connect("http://memaraneha.ir/%db%8c%da%a9%d9%be%d8%a7%d8%b1%da%86%da%af%db%8c-%d9%87%d9%85%d8%a7%d9%87%d9%86%da%af%db%8c-%d8%b7%d8%b1%d8%a7%d8%ad%db%8c-%d8%af%d8%a7%d8%ae%d9%84%db%8c/")
                            .timeout(20000).get();

                    Elements elements=document.select("div.base-box:nth-child(2)").select("p");
                    html = elements.toString();

                } catch (IOException e) {
                    e.printStackTrace();
                }
                return html;
            }
            @Override
            protected void onPostExecute(String html) {

                String mime = "text/html";
                String encoding = "utf-8";

                webView.loadDataWithBaseURL(null,html, mime, encoding,null);
            }
        }.execute();

    }

}

2 answers:

Answer 0 (score: 1)

You can avoid a loop and use a single selector instead:

Elements e = doc.select("p:not(:has(img))");
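
To see how this selector behaves, here is a minimal sketch that runs it against an inline HTML string rather than the live page. The class name ParagraphFilter, the method paragraphsWithoutImages, and the shortened sample markup are hypothetical and only serve the illustration; the selector itself is the one from the answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParagraphFilter {

    // Returns every <p> element that has no <img> descendant.
    // :has(img) matches elements containing an <img>; :not(...) inverts that match.
    static Elements paragraphsWithoutImages(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("p:not(:has(img))");
    }

    public static void main(String[] args) {
        // Hypothetical, shortened stand-in for the markup in the question.
        String html = "<p>hello world</p>"
                + "<p><img src=\"a.jpg\" alt=\"a\"></p>"
                + "<p>goodbye world</p>";

        for (Element p : paragraphsWithoutImages(html)) {
            System.out.println(p.text());
        }
    }
}
```

In the question's code, the same idea should carry over by appending the pseudo-selectors to the existing query, for example selecting "p:not(:has(img))" on the result of the div.base-box lookup.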

Answer 1 (score: 0)

You could try something like this.

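Select all <p> tags that do not have an <img> tag nested inside them:

Document document = Jsoup.connect(url).get(); // url is a placeholder for the page address
Elements elements = new Elements();
for (Element e : document.select("p")) {
    // Keep the paragraph only if it contains no <img> descendant
    if (e.select("img").isEmpty()) {
        elements.add(e);
    }
}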
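
As a rough sketch of how the loop approach could feed the WebView from the question, the helper below restricts the search to a container and returns the kept paragraphs as one HTML string. The class name FilteredParagraphs, the method filteredHtml, and the containerSelector parameter are assumptions made for illustration; they do not appear in the original code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FilteredParagraphs {

    // Collects the <p> elements inside the given container that hold no <img>,
    // then returns their combined outer HTML.
    static String filteredHtml(String html, String containerSelector) {
        Document doc = Jsoup.parse(html);
        Elements kept = new Elements();
        for (Element p : doc.select(containerSelector).select("p")) {
            if (p.select("img").isEmpty()) {
                kept.add(p);
            }
        }
        return kept.outerHtml();
    }

    public static void main(String[] args) {
        // Hypothetical sample markup standing in for the real page.
        String html = "<div class=\"base-box\">"
                + "<p>hello world</p>"
                + "<p><img src=\"a.jpg\" alt=\"a\"></p>"
                + "<p>goodbye world</p>"
                + "</div>";
        System.out.println(filteredHtml(html, "div.base-box"));
    }
}
```

The returned string could then be handed to webView.loadDataWithBaseURL in onPostExecute, in place of elements.toString() in the question's doInBackground.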
