RSS feed description returns '<'

Date: 2013-04-04 15:37:15

Tags: java android rss jsoup rss-reader

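I'm trying to parse the RSS data in this feed: http://fulltextrssfeed.com/feeds.bbci.co.uk/news/rss.xml, which is generated by the FullTextRssFeed website. The only problem is that when I try to get the description, all I get is '<'. Everything else works fine! I've tried using JSoup, but I don't know how to go about it. Can you suggest how to do it? My code is the same as the code in this tutorial, except that I've replaced the RSS URL. Thanks again!

*Here is my RSS reader in action*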

4 answers:

Answer 0 (score: 3)

Your problem is that the description inside the RSS feed contains HTML rather than plain text. Here is the content of the description:

<div><span class="story-date"><span class="date">3 April 2013</span> <span class="time-text">Last updated at</span> <span class="time">23:25 ET</span></span> <p><img src="http://news.bbcimg.co.uk/media/images/66739000/jpg/_66739180_philpotts.jpg" width="464" height="261" alt="Mick and Mairead Philpott, Paul Mosley"/><span class="c2">Mick and Mairead Philpott, and Paul Mosley, will be sentenced on Thursday</span></p> <p class="introduction" id="story_continues_1">A couple convicted of killing six of their children in a house fire in Derby are due to be sentenced later.</p> <p>Mick and Mairead Philpott will reappear at Nottingham Crown Court where they were found guilty of six counts of manslaughter, along with their friend Paul Mosley, on Tuesday.</p> <p>The maximum sentence for the crime is life imprisonment.</p> <p>Mrs Justice Thirlwall was due to pass sentence on Wednesday but needed more time to consider mitigation.</p> <p>The court was told that Philpott, 56, was jailed for seven years in 1978 for attempting to murder a previous girlfriend and given a concurrent five-year sentence for stabbing the woman's mother.</p> <p>In 1991 he received a conditional discharge for assault after he head-butted a colleague</p> <p>And in 2010 he was given a police caution after slapping Mairead and dragging her outside by her hair.</p> <p>When Philpott set fire to his house in Victory Road, Derby, he was also facing trial over a road rage incident in which he punched a motorist in the face.</p> <p>He had admitted common assault in relation to the incident but denied dangerous driving.</p> <span class="cross-head">Rape allegation</span> <p>Police have also confirmed that they intend to "thoroughly" investigate an allegation that Philpott raped a woman several years ago.</p> <p>She made the allegation after the death of Philpott's children, but police decided to wait until the end of the manslaughter trial before investigating the complaint further.</p> <p>On Tuesday the jury 
returned unanimous manslaughter verdicts on Philpott and Mosley, 46, while Mairead Philpott, 32, was convicted by a majority.</p> <p>Jade Philpott, 10, John, nine, Jack, eight, Jesse, six, and Jayden, five, died on the morning of the fire on 11 May 2012.</p> <p>Mairead Philpott's son from a previous relationship, 13-year-old Duwayne, died later in hospital.</p> </div><img src="http://pixel.quantserve.com/pixel/p-89EKCgBk8MZdE.gif" border="0" height="1" width="1" />

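You need to change your parser so that it ignores the HTML content inside the description. Once you have the complete HTML fragment, you can render it in a WebView. I believe CDATA is usually used when XML content (HTML, in this case) sits inside a piece of XML data such as an RSS feed. To be honest, though, I'm not familiar with the details, so I may be wrong.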
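One plausible way a parser ends up with just '<' is sketched below. This assumes the tutorial's handler is a SAX DefaultHandler that keeps only the text from a single characters() call; the tutorial's code isn't shown here, so treat that as an assumption. SAX is allowed to split one text node across several characters() callbacks, and an escaped &lt; may arrive as its own chunk. The hypothetical class RssDescriptionHandler accumulates every chunk in a StringBuilder and reads the buffer only in endElement(). That way the whole HTML fragment comes through whether the feed entity-escapes it or wraps it in CDATA.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not the tutorial's): collects the full text of every <description>
class RssDescriptionHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> descriptions = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start a fresh buffer for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // One text node may arrive in several chunks, so append instead of overwriting
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("description".equals(qName)) {
            descriptions.add(text.toString()); // the complete, unescaped HTML fragment
        }
    }

    static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            RssDescriptionHandler handler = new RssDescriptionHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.descriptions;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rss><channel><item><title>t</title>"
                + "<description>&lt;p&gt;Hello&lt;/p&gt;</description></item></channel></rss>";
        System.out.println(parse(xml).get(0)); // prints <p>Hello</p>
    }
}
```

Note that a real feed's channel usually has its own <description>, which this sketch also collects. The HTML strings it returns can then be passed to a WebView or cleaned up with Jsoup.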

Answer 1 (score: 2)

The HTML you get from myRssFeed.getDescription() looks like this:

<div><span class="story-date"><span class="date">6 April 2013</span> <span class="time-text">Last updated at</span> <span class="time">08:57 ET</span></span> <p><img src="http://news.bbcimg.co.uk/media/images/51606000/jpg/_51606573_fa1d16c0-9c6c-4f82-b0b8-ab66ddd94f78.jpg" width="304" height="171" alt="Breaking news"/></p> <p class="introduction">Nelson Mandela has been discharged from hospital after treatment for pneumonia, South Africa's government has said.</p> <p>It said there had been "a sustained and gradual improvement in his condition".</p> <p>The 94-year-old was admitted on 27 March for a recurring lung infection and had fluid drained at the undisclosed hospital.</p> <p>Mr Mandela served as South Africa's first black president from 1994 to 1999 and is regarded by many as the father of the nation.</p> <p>The <a href="http://redirect.viglink.com?key=11fe087258b6fc0532a5ccfc924805c0&u=http%3A%2F%2Fwww.thepresidency.gov.za%2Fpebble.asp%3Frelid%3D15178">presidency statement read</a>: "Former President Nelson Mandela has been discharged from hospital today, 6 April, following a sustained and gradual improvement in his general condition.</p> <p>"The former president will now receive home-based high care. President [Jacob] Zuma thanks the hard working medical team and hospital staff for looking after Madiba so efficiently."</p> <p>Madiba is Mr Mandela's clan name.</p> <p>The statement continued: "[Mr Zuma] also extended his gratitude to all South Africans and friends of the Republic in Africa and around the world for support."</p> </div><img src="http://pixel.quantserve.com/pixel/p-89EKCgBk8MZdE.gif" border="0" height="1" width="1" />

With Jsoup you could try this (untested):

Instead of

feedDescribtion.setText(myRssFeed.getDescription());

use this:

feedDescribtion.setText(extractDescriptionText(myRssFeed.getDescription()));

with the following method:

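// Imports (at the top of the source file)
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Converts the HTML description into plain text, one paragraph per line
private String extractDescriptionText(String description) {
    StringBuilder b = new StringBuilder();
    Document dom = Jsoup.parse(description);
    Elements paragraphs = dom.getElementsByTag("p");
    for (int i = 1; i < paragraphs.size(); i++) { // start with 1 to skip the 'breaking news' image paragraph
        Element p = paragraphs.get(i);
        b.append(p.text()); // text() drops inner tags such as <a>, keeping the visible text
        b.append("\n"); // line break after each paragraph
    }
    return b.toString();
}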

This should work. It may need some fine-tuning, but with Jsoup's help that should be easy to do.

Edit

This is what extractDescriptionText() returns for the example above:

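  Nelson Mandela has been discharged from hospital after treatment for pneumonia, South Africa's government has said.
  It said there had been "a sustained and gradual improvement in his condition".
  The 94-year-old was admitted on 27 March for a recurring lung infection and had fluid drained at the undisclosed hospital.
  Mr Mandela served as South Africa's first black president from 1994 to 1999 and is regarded by many as the father of the nation.
  The presidency statement read: "Former President Nelson Mandela has been discharged from hospital today, 6 April, following a sustained and gradual improvement in his general condition.
  "The former president will now receive home-based high care. President [Jacob] Zuma thanks the hard working medical team and hospital staff for looking after Madiba so efficiently."
  Madiba is Mr Mandela's clan name.
  The statement continued: "[Mr Zuma] also extended his gratitude to all South Africans and friends of the Republic in Africa and around the world for support."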

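Answer 2 (score: 1)

While searching the web for ideas on how to do this, I found that doing it is actually illegal: fetching content this way violates the terms of use of many of the web sources I wanted to use. For now I'll have to stick with the short RSS feeds.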

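Answer 3 (score: 1)

I would have posted this as a comment, but I don't have enough reputation.

I suggest using Yahoo Pipes to redirect your RSS feed. You can even choose to have it output JSON instead of XML.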

http://pipes.yahoo.com/pipes/

If your parser already works correctly on most of the sites you use, this is the simplest way to solve the problem.