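I'm developing an app that works with JSON. I can download all the data, but I've run into an interesting problem. I'm trying to grab a string from this domain: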
http://www.prindlepost.org/
When I fetch all the JSON I get one very large string, too large to paste here. The part I'm trying to parse is:
<p>The road through Belgrade was quiet at 4 A.M. Besides the occasional whir of another car speeding by, my taxi was largely alone on the road. Through the windshield I could see the last traces of apartment blocks pass by as we left the outskirts of the city. Somewhere beyond the limits of my vision, I knew the airport waited, its converging neon runway lines already lighting up the pre-dawn darkness.</p>
<div class="more-link-wrap wpb_button"> <a href="http://www.prindlepost.org/2015/06/this-is-a-self-portrait/" class="more-link">Read more</a></div>
The part I care about:
<a href="http://www.prindlepost.org/2015/06/this-is-a-self-portrait/" class="more-link">Read more</a></div>
I'm not familiar with extracting strings like this. In the end, I'd like to save the URL as its own string. For example, the above would become:
String url = "http://www.prindlepost.org/2015/06/this-is-a-self-portrait/";
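One thing to note: the content contains a lot of URLs, so narrowing them down by class name would probably help me a lot.
My initial guess was: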
// <READ MORE>
Pattern p = Pattern.compile("href=\"(.*?)\"");
Matcher m = p.matcher(content);
String urlTemp = null;
if (m.find()) {
urlTemp = m.group(1); // this variable should contain the link URL
}
Log.d("LINK WITHIN TEXT", ""+urlTemp);
// </READ MORE>
Any help is appreciated!
Answer 0 (score: 0)
Try something like jsoup (http://jsoup.org/); it may do the trick. Here is its link-parsing example, applied to your HTML:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<p>The road through Belgrade was quiet at 4 A.M. Besides the occasional whir of another car speeding by, my taxi was largely alone on the road. Through the windshield I could see the last traces of apartment blocks pass by as we left the outskirts of the city. Somewhere beyond the limits of my vision, I knew the airport waited, its converging neon runway lines already lighting up the pre-dawn darkness.</p>"
+ "<div class=\"more-link-wrap wpb_button\">"
+ "<a href=\"http://www.prindlepost.org/2015/06/this-is-a-self-portrait/\" class=\"more-link\">"
+ "Read more</a></div>";
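Document doc = Jsoup.parse(html);
// Select by class so other anchors in the content are skipped; first() returns null if nothing matches
Element link = doc.select("a.more-link").first();
String relHref = link.attr("href"); // attribute exactly as written; already absolute here: "http://www.prindlepost.org/2015/06/this-is-a-self-portrait/"
String absHref = link.attr("abs:href"); // "http://www.prindlepost.org/2015/06/this-is-a-self-portrait/"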
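If adding a library is not an option, the regex from the question can be narrowed by class name using only java.util.regex. The following is a minimal sketch, not a general HTML parser: it assumes double-quoted attributes and no `>` inside attribute values. The class name `MoreLinkExtractor` and the method `extractLinks` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MoreLinkExtractor {
    // One opening <a ...> tag; group 1 is the raw attribute text.
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // Double-quoted href and class attributes inside that attribute text.
    private static final Pattern HREF =
            Pattern.compile("\\shref\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern CLASS =
            Pattern.compile("\\sclass\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the href of every <a> tag whose class list contains className. */
    static List<String> extractLinks(String html, String className) {
        List<String> urls = new ArrayList<>();
        Matcher anchor = ANCHOR.matcher(html);
        while (anchor.find()) {
            String attrs = anchor.group(1);
            Matcher cls = CLASS.matcher(attrs);
            Matcher href = HREF.matcher(attrs);
            // Checking each attribute separately keeps this independent of attribute order.
            if (cls.find() && hasClass(cls.group(1), className) && href.find()) {
                urls.add(href.group(1));
            }
        }
        return urls;
    }

    // Compares whole class tokens, so "more-link-wrap" does not count as "more-link".
    private static boolean hasClass(String classAttr, String className) {
        for (String token : classAttr.trim().split("\\s+")) {
            if (token.equals(className)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String html = "<div class=\"more-link-wrap wpb_button\"> "
                + "<a href=\"http://www.prindlepost.org/2015/06/this-is-a-self-portrait/\" "
                + "class=\"more-link\">Read more</a></div>";
        for (String url : extractLinks(html, "more-link")) {
            System.out.println(url);
        }
    }
}
```

In the question's code, `content` would be passed as the first argument and `"more-link"` as the second. The returned list is empty when nothing matches, so check its size before calling `get(0)`. For anything beyond this simple markup, a real parser such as jsoup is the more robust choice.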