Parsing HTML with jsoup

Time: 2017-03-17 07:51:53

Tags: java html jsoup

I have some HTML that I am parsing with Jsoup:



<!DOCTYPE html>
<html>

<head>
</head>

  <body class="show">
    <div class="page show">
      <div class="content-wrapper posting-page">
        <div class="content">
          <div class="section-wrapper accent-section page-full-width">
            <div class="section section page-centered posting-header">
              <div class="posting-headline">
                <h2>Algorithms Engineer</h2>
                <div class="posting-categories">
                </div>
                <div class="postings-btn-wrapper">
                </div>
              </div>
              <div class="section-wrapper page-full-width">
                <div class="section section page-centered">
                  <p></p>
                  <div>
                    <span style="font-size: 14.6667px">Do you eat combinatorial optimization for breakfast? Yum!</span>
                  </div>
                  <div>
                    <br>
                  </div>
                  <div>
                    <span style="font-size: 14.6667px">As an on-demand valet service, Luxe is in the business of deciding who will do what, when, and where. Our logistics engine is built on clever algorithms spanning Optimization, Graph Theory, Markov Chains, and Machine Learning.</span>
                  </div>
                  <div>
                    <br>
                  </div>
                  <div>
                    <p></p>
                  </div>
                  <div class="section section page-centered">
                    <div>
                      <h3>Qualifications:</h3>
                      <ul class="posting-requirements plain-list">
                        <ul>
                          <li>PhD in Computer Science, Operations Research, Applied Math, or equivalent</li>
                          <li>3+ years in a fast-moving, product-driven company</li>
                          <li>History of accomplishment and achievement in the field</li>
                          <li>Advanced facility with Python, plus whatever languages best run your algorithms</li>
                          <li>Respect for “good enough” over “perfect”</li>
                          <li>Healthy detachment from your code</li>
                        </ul>
                      </ul>
                    </div>
                  </div>
                  <div class="section page-centered last-section-apply">
                    <a class="postings-btn template-btn-submit" href="https://jobs.lever.co/luxe/f0418d22-2c3f-4c2d-a53f-dbbb8baff424/apply">Apply for this job</a>
                  </div>
                </div>
              </div>
            </div>
            <div class="main-footer page-full-width">
              <script async="" src="//www.google-analytics.com/analytics.js"></script>
              <script data-releasestage="production" data-endpoint="https://bugs.lever.co/js" data-appversion="0.0.1489100506" data-apikey="6a247c6ff13012d02fde17377f0b857b" src="/js/bug-snag.js"></script>
  </body>
</html>

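I want to get all of the text inside the class "section page-centered", whether it is in a <span> or in the <ul>. I need the full text. Can anyone help me do this? Thanks in advance.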

1 Answer:

Answer 0 (score: 0)

Try this:

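import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("your web page").get();
// select() takes a CSS query; '>' keeps only the outer container, so nested text is not repeated
Elements ele = doc.select("div.section-wrapper.page-full-width > div.section.page-centered:not(.posting-header)");
String text = ele.text(); // ele.text() gives only the text, without the tags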
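To check a selector offline, the HTML can be parsed from a string with `Jsoup.parse` instead of fetched with `Jsoup.connect`. The sketch below assumes jsoup is on the classpath and uses a trimmed excerpt of the question's markup; the class `PostingText` and its method names are hypothetical. It contrasts selecting every `div.section.page-centered` (the combined text repeats the nested blocks, because `Elements.text()` joins the text of each matched element, parents and children alike) with selecting only the outer container, and it also lists the `<span>` and `<li>` texts one by one.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Assumes jsoup on the classpath. PostingText, HTML, CONTAINER and the method
// names are hypothetical, not part of jsoup or the original answer.
public class PostingText {

    // Trimmed excerpt of the question's markup: same class names and nesting.
    public static final String HTML =
        "<html><body class=\"show\">"
        + "<div class=\"section-wrapper accent-section page-full-width\">"
        + "<div class=\"section section page-centered posting-header\">"
        + "<div class=\"posting-headline\"><h2>Algorithms Engineer</h2></div>"
        + "<div class=\"section-wrapper page-full-width\">"
        + "<div class=\"section section page-centered\">"
        + "<div><span>Do you eat combinatorial optimization for breakfast? Yum!</span></div>"
        + "<div class=\"section section page-centered\"><div><h3>Qualifications:</h3>"
        + "<ul class=\"posting-requirements plain-list\"><ul>"
        + "<li>Healthy detachment from your code</li>"
        + "</ul></ul></div></div>"
        + "<div class=\"section page-centered last-section-apply\">"
        + "<a class=\"postings-btn template-btn-submit\">Apply for this job</a></div>"
        + "</div></div></div></div></body></html>";

    // Outer container only: its parent is the "section-wrapper page-full-width" div.
    static final String CONTAINER =
        "div.section-wrapper.page-full-width > div.section.page-centered:not(.posting-header)";

    // Matches the outer container AND the nested ones, so Elements.text()
    // concatenates parent and child text and repeats the nested parts.
    public static String naiveText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("div.section.page-centered").not(".posting-header").text();
    }

    // Text of the outer container and all of its descendants, each part once.
    public static String containerText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select(CONTAINER).text();
    }

    // One entry per <span> or <li> inside the outer container, in document order.
    public static List<String> itemTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> items = new ArrayList<>();
        for (Element el : doc.select(CONTAINER).select("span, li")) {
            items.add(el.text());
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(containerText(HTML));
        System.out.println(naiveText(HTML));
        System.out.println(itemTexts(HTML));
    }
}
```

Narrowing the selection to the outermost container is simpler than de-duplicating the text afterwards; if the page layout changes, the child combinator in `CONTAINER` is the part to adjust.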