Splitting values using the Jsoup library

Time: 2014-01-28 05:46:35

Tags: java java-ee jsoup

I need to split out individual values from the document below. Please help me do this with the Jsoup library.

<div class="text">
  Flamingnet Student Book Reviewer  KThom
<br/>
     It's been eight years since the Assembly (an alien race) took over Earth and captured all the adult population. Eight years that Holt Hawkins has spent as a bounty hunter in a world ruled by rebel youths.  Holt is transporting his latest prisoner, Mira, to the Midnight City to collect his reward when the two come across a crashed Assembly ship with a young girl named Zoey trapped inside.  Together, they rescue Zoey and soon discover her magical abilities that could stop the Assembly for good.  The three embark on a treacherous journey across the barren wasteland they once called home, fighting for their own lives as well as each others'.
<p>Midnight City is an amazing book.  In the beginning, you don't really know how Earth was captured, but you know enough to be able to read and enjoy the book and learn more as the book goes on.  The author reveals the right amount of information throughout the book, otherwise there would be a whole history section that wasn't needed.  The book is fast-paced and never boring.  Once I started reading the book, I couldn't put it down.  The characters were original and intriguing because each had their own mysteries and backgrounds that you had to read to find out about.  I would recommend this book to anyone who likes action/sci-fi books with a little romance thrown in. </p>
<p/>
<p>Reviewer Age:17</p>
<p>
Reviewer City, State and Country: Brownsburg, Indiana United States of America
<br/>
</p>
</div>

Expected output:

name=Kthom
text =  It's been eight years since the Assembly (an alien race) took over Earth and captured all the adult population. Eight years that Holt Hawkins has spent as a bounty hunter in a world ruled by rebel youths.  Holt is transporting his latest prisoner, Mira, to the Midnight City to collect his reward when the two come across a crashed Assembly ship with a young girl named Zoey trapped inside.  Together, they rescue Zoey and soon discover her magical abilities that could stop the Assembly for good.  The three embark on a treacherous journey across the barren wasteland they once called home, fighting for their own lives as well as each others'.Midnight City is an amazing book.  In the beginning, you don't really know how Earth was captured, but you know enough to be able to read and enjoy the book and learn more as the book goes on.  The author reveals the right amount of information throughout the book, otherwise there would be a whole history section that wasn't needed.  The book is fast-paced and never boring.  Once I started reading the book, I couldn't put it down.  The characters were original and intriguing because each had their own mysteries and backgrounds that you had to read to find out about.  I would recommend this book to anyone who likes action/sci-fi books with a little romance thrown in.
Age=17
country = Brownsburg, Indiana United States of America

1 answer:

Answer 0 (score: 0)

The source HTML is a bit "wild", so there is no clean solution. To get these values, you have to mix selectors and regular expressions.

Here is an example (note that it has no error checking yet!):

final String html = "<div class=\"text\">\n"
        + "  Flamingnet Student Book Reviewer  KThom\n"
        + "<br/>\n"
        + "     It's been eight years since the Assembly (an alien race) took over Earth and captured all the adult population. Eight years that Holt Hawkins has spent as a bounty hunter in a world ruled by rebel youths.  Holt is transporting his latest prisoner, Mira, to the Midnight City to collect his reward when the two come across a crashed Assembly ship with a young girl named Zoey trapped inside.  Together, they rescue Zoey and soon discover her magical abilities that could stop the Assembly for good.  The three embark on a treacherous journey across the barren wasteland they once called home, fighting for their own lives as well as each others'.\n"
        + "<p>Midnight City is an amazing book.  In the beginning, you don't really know how Earth was captured, but you know enough to be able to read and enjoy the book and learn more as the book goes on.  The author reveals the right amount of information throughout the book, otherwise there would be a whole history section that wasn't needed.  The book is fast-paced and never boring.  Once I started reading the book, I couldn't put it down.  The characters were original and intriguing because each had their own mysteries and backgrounds that you had to read to find out about.  I would recommend this book to anyone who likes action/sci-fi books with a little romance thrown in. </p>\n"
        + "<p/>\n"
        + "<p>Reviewer Age:17</p>\n"
        + "<p>\n"
        + "Reviewer City, State and Country: Brownsburg, Indiana United States of America\n"
        + "<br/>\n"
        + "</p>\n"
        + "</div>";


// -- Basic input parsing --
// Needs: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element
Document doc = Jsoup.parse(html);
Element divTag = doc.select("div.text").first(); // Take the first match; if there are several, iterate over them instead


// -- Name --
String name = divTag.childNode(0).toString();
name = name.split("Flamingnet Student Book Reviewer")[1].trim(); // Drop the fixed prefix, keep only the name

// -- Text --
StringBuilder text = new StringBuilder(divTag.childNode(2).toString().trim()); // 1st part: the text node after the first <br/>
text.append(divTag.select("p").first().text().trim()); // 2nd part: the text of the first p-tag

// -- Age --
String age = divTag.select("p:matches((?i)Reviewer Age)").first().text().split(":")[1].trim();

// -- Country --
String country = divTag.select("p:matches((?i)Reviewer City)").first().text().split(":")[1].trim();


// -- Output --
System.out.println("name=" + name);
System.out.println("text=" + text);
System.out.println("age=" + age);
System.out.println("country=" + country);

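split(":")[1] cuts off the label in front of each value. Calling trim() on the resulting strings is always a good idea, since it removes leading and trailing whitespace.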
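Since the example above has no error checking, here is a minimal sketch of how the split-and-trim step could be made more defensive. It uses only the Java standard library; the class LabelValues and the method valueAfterColon are hypothetical names chosen for illustration and are not part of Jsoup. The assumption is that each labelled string has the form "label: value".

```java
import java.util.Optional;

final class LabelValues {

    // Hypothetical helper (not a Jsoup API): returns the trimmed text after
    // the first ':' of a "label: value" string, or empty if no value exists.
    static Optional<String> valueAfterColon(String labelled) {
        if (labelled == null) {
            return Optional.empty();
        }
        // A limit of 2 keeps any further ':' characters inside the value.
        String[] parts = labelled.split(":", 2);
        if (parts.length < 2) {
            return Optional.empty(); // no ':' in the input at all
        }
        String value = parts[1].trim();
        return value.isEmpty() ? Optional.empty() : Optional.of(value);
    }

    public static void main(String[] args) {
        System.out.println(valueAfterColon("Reviewer Age:17").orElse("(missing)"));
        System.out.println(valueAfterColon("Reviewer Age").orElse("(missing)"));
    }
}
```

With plain split(":")[1], a string without a colon throws an ArrayIndexOutOfBoundsException, and a value that itself contains a colon is cut short. The sketch avoids both cases. In the Jsoup code it would be applied to the result of text(), after checking that first() did not return null.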

Final output:

name=KThom
text=It's been eight years since the Assembly (an alien race) took over Earth and captured all the adult population. Eight years that Holt Hawkins has spent as a bounty hunter in a world ruled by rebel youths. Holt is transporting his latest prisoner, Mira, to the Midnight City to collect his reward when the two come across a crashed Assembly ship with a young girl named Zoey trapped inside. Together, they rescue Zoey and soon discover her magical abilities that could stop the Assembly for good. The three embark on a treacherous journey across the barren wasteland they once called home, fighting for their own lives as well as each others'.Midnight City is an amazing book. In the beginning, you don't really know how Earth was captured, but you know enough to be able to read and enjoy the book and learn more as the book goes on. The author reveals the right amount of information throughout the book, otherwise there would be a whole history section that wasn't needed. The book is fast-paced and never boring. Once I started reading the book, I couldn't put it down. The characters were original and intriguing because each had their own mysteries and backgrounds that you had to read to find out about. I would recommend this book to anyone who likes action/sci-fi books with a little romance thrown in.
age=17
country=Brownsburg, Indiana United States of America

For further documentation on Jsoup selectors, see "Use selector-syntax to find elements".
