Extracting CDATA from an RSS feed with Node.js

Posted: 2018-09-10 20:58:49

Tags: javascript node.js parsing rss cdata

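I'm using feedparser version 2.2.9 to parse the feed "https://www.veganlifemag.com/feed/".

In the RSS feed, the description tag contains HTML wrapped in CDATA, and that HTML includes tags holding the content I need to extract. Is there a way to extract the content inside the CDATA, or specific parts of it?

Thanks in advance,

Jerry

Sample RSS feed: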

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
<title>VegNews.com (News)</title>
<description></description>
<link>https://vegnews.com/news</link>
<language>en</language>
<item>
  <title>London Fashion Week Will Be Fur-Free This Year for the First Time</title>
  <category>News</category>
  <pubDate>Mon, 10 Sep 2018 01:50:00 -0700</pubDate>
  <link>https://vegnews.com/2018/9/london-fashion-week-will-be-fur-free-this-year-for-the-first-time</link>
  <guid>https://vegnews.com/2018/9/london-fashion-week-will-be-fur-free-this-year-for-the-first-time</guid>
  <description>
    <![CDATA[<img src="https://vegnews.com/media/W1siZiIsIjEyOTE1L1ZlZ05ld3MuRmFzaGlvbkxvbmRvbi5wbmciXSxbInAiLCJ0aHVtYiIsIjgwMHg0NzMjIix7ImZvcm1hdCI6ImpwZyJ9XSxbInAiLCJvcHRpbWl6ZSJdXQ/VegNews.FashionLondon.png?sha=ec3755007e36522e" /><p>Anticipated event London Fashion Week (LFW) kicks off September 14, this year with no fur in sight. While LFW did not impose a ban on fur, every designer that will present their collections this year has adopted a fur-free policy, including last-minute holdout Burberry. After more than a decade of pressure from animal-rights organizations, including <a href="http://www.hsi.org/" target="_blank" rel="noopener">Humane Society International UK</a> and <a href="https://www.peta.org/" target="_blank" rel="noopener">People for the Ethical Treatment of Animals</a>, Burberry announced this month that it would no longer use fur in its collections and appointed Riccardo Tisci as its new creative director to phase out any remaining fur items. &ldquo;I don&rsquo;t think it is compatible with modern luxury and with the environment in which we live, and Riccardo has a very strong view as well on this,&rdquo; LFW CEO Marco Gobbetti told <a href="https://www.businessoffashion.com/articles/professional/burberry-stops-destroying-product-and-bans-real-fur" target="_blank" rel="noopener"><em>Business of Fashion</em></a>. &ldquo;It&rsquo;s part of what Burberry is today.&rdquo; Similarly, animal fur is falling out of favor in the United States. 
Earlier this year, American designer <a href="https://vegnews.com/2018/3/dkny-and-donna-karan-go-fur-free" target="_blank" rel="noopener">Donna Karan</a> pledged to eliminate the material from her future collections, and the city of <a href="https://vegnews.com/2018/3/san-francisco-bans-fur-sales" target="_blank" rel="noopener">San Francisco</a> joined <a href="https://vegnews.com/2013/9/west-hollywood-says-no-to-real-fur-in-fashion" target="_blank" rel="noopener">West Hollywood</a> and <a href="https://vegnews.com/2017/4/berkeley-prohibits-fur-sales-citywide" target="_blank" rel="noopener">Berkeley</a> in banning fur sales within city limits.</p>]]>
  </description>
</item>

1 answer:

Answer (score: 1)

CDATA just means "treat this as plain text", so characters that would normally have a special meaning in XML lose that meaning inside it (for example, "<" normally means "start of a tag").
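As a small illustration of that point, here is a minimal sketch that strips the CDATA markers from a raw XML text value and returns whatever was between them, untouched. The helper name unwrapCdata is hypothetical and not part of feedparser; feedparser should already do this unwrapping itself, so the description it hands you for each item is expected to be the bare HTML string.

```javascript
// Hypothetical helper (not a feedparser API): remove CDATA wrappers from a
// raw XML text value. Nothing inside a CDATA section is parsed, so each
// section's body is returned verbatim.
function unwrapCdata(raw) {
  // A section runs from "<![CDATA[" to the first "]]>"; the lazy quantifier
  // stops at that first terminator and the g flag handles adjacent sections.
  return raw.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1").trim();
}

// A shortened, made-up value in the shape of the sample's <description>.
const raw = `
    <![CDATA[<img src="a.png" /><p>Fur-free &ldquo;LFW&rdquo;</p>]]>
  `;

console.log(unwrapCdata(raw));
// <img src="a.png" /><p>Fur-free &ldquo;LFW&rdquo;</p>
```

Note that &ldquo; and &rdquo; come through unchanged: inside CDATA they are just characters, and they only turn into quotation marks once the string is interpreted as HTML.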

The value of description is a fragment of HTML. If you want to extract specific content from it, run it through an HTML parser.
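A proper HTML parser (a library such as cheerio, for instance) is the robust way to do this. If you only need a couple of predictable pieces and would rather avoid a dependency, the following is a rough, dependency-free sketch that uses regular expressions instead. It is not an HTML parser: it assumes markup shaped like the sample above (one img tag with a double-quoted src, text inside p elements) and decodes only a handful of named entities. The names extractParts, decodeEntities and ENTITIES are hypothetical.

```javascript
// Rough, dependency-free sketch: NOT a real HTML parser. It assumes markup
// shaped like the sample description (hypothetical names throughout).
const ENTITIES = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&lsquo;": "\u2018",
  "&rsquo;": "\u2019",
  "&ldquo;": "\u201C",
  "&rdquo;": "\u201D",
};

// Decode only the entities listed above in a single pass; others are kept.
function decodeEntities(text) {
  return text.replace(/&[a-z]+;|&#39;/g, (e) => (e in ENTITIES ? ENTITIES[e] : e));
}

function extractParts(html) {
  // src of the first <img>, assuming a double-quoted attribute.
  const img = html.match(/<img\b[^>]*?\ssrc="([^"]*)"/i);

  // Text of every <p>...</p>: inner tags stripped, entities decoded,
  // whitespace collapsed.
  const paragraphs = [];
  const pRe = /<p\b[^>]*>([\s\S]*?)<\/p>/gi;
  let m;
  while ((m = pRe.exec(html)) !== null) {
    const text = decodeEntities(m[1].replace(/<[^>]+>/g, ""));
    paragraphs.push(text.replace(/\s+/g, " ").trim());
  }

  return { image: img ? decodeEntities(img[1]) : null, paragraphs };
}

// Usage with a shortened, made-up description string.
const description =
  '<img src="https://example.com/photo.png" /><p>London Fashion Week is ' +
  '<a href="https://example.com">fur-free</a> &ldquo;this year&rdquo;.</p>';

const parts = extractParts(description);
console.log(parts.image); // https://example.com/photo.png
console.log(parts.paragraphs[0]); // London Fashion Week is fur-free “this year”.
```

Presumably you would call extractParts on each item's description as you read items from feedparser. Tags are stripped before entities are decoded so that an escaped "&lt;" in the text is not mistaken for markup. For anything beyond the sample's simple layout, a real parser is the safer choice.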