How to get the feed URL(s) from a website?

Time: 2018-03-25 18:58:26

Tags: http request rss atom-feed

As per the official documentation, properly set-up websites should indicate the URL of their RSS / Atom feed(s) when asked politely:

GET / HTTP/1.1
Host: example.com
Accept: application/rss+xml, application/xhtml+xml, text/html

When an HTTP server (or server-side script) gets this, it should redirect the HTTP client to the feed. It should do this with an HTTP 302 Found. Something like:

HTTP/1.1 302 Found
Location: http://example.com/feed
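A client that receives such a 302 only needs to read the Location header, which may be relative to the page URL. Below is a minimal sketch, assuming Node.js 10+ (for the global URL class). The helper name `feedUrlFromRedirect` is hypothetical, and the helper does not make the request itself:

```javascript
// Hypothetical helper: given the status code and headers of a response to a
// request sent with followRedirect: false, return the absolute feed URL the
// server redirected to, or null if there was no redirect.
function feedUrlFromRedirect(statusCode, headers, requestUrl) {
  // 301, 302, 303, 307 and 308 are the redirects that carry a Location header.
  const isRedirect = [301, 302, 303, 307, 308].includes(statusCode);
  // Node lower-cases incoming header names, so 'location' is the key to read.
  const location = headers && headers.location;
  if (!isRedirect || !location) {
    return null;
  }
  // Location may be relative (e.g. "/feed"), so resolve it against the request URL.
  return new URL(location, requestUrl).href;
}

console.log(feedUrlFromRedirect(302, { location: '/feed' }, 'http://example.com/')); // http://example.com/feed
console.log(feedUrlFromRedirect(200, {}, 'https://stackoverflow.com/'));             // null
```

In the question's callback, this would be called as `feedUrlFromRedirect(response.statusCode, response.headers, 'https://stackoverflow.com')`. There it returns null, because the server answers 200.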

I'm trying to get this response, without luck:

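const request = require('request');

request(
  { method: 'GET',
    url: 'https://stackoverflow.com',
    followRedirect: false,
    // request has no "accept" option; the Accept header goes in "headers"
    headers: { Accept: 'application/rss+xml, application/xhtml+xml, text/html' }
  }, function (error, response, body) {
    if (error) { return console.error(error); } // response is undefined on error
    console.log('statusCode: ', response.statusCode);
  }
);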

Yields:

statusCode: 200

How do I formulate my request so that the website responds with the feed URL(s)?

2 answers:

Answer 0 (score: 1)

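Websites don't usually respond to a request for the home page that asks for the application/rss+xml MIME type in the Accept header by redirecting to the feed. The Mozilla documentation you've linked describes something I have never seen recommended in all my years as a developer working with RSS.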

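A more established and widely adopted way for a website to identify its RSS feed is a technique called RSS Autodiscovery. Open the site's home page and look for this tag in the HEAD section: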

<link rel="alternate" type="application/rss+xml" title="RSS"
    href="http://feeds.example.com/rss-feed">

The type attribute can be the MIME type of an RSS, Atom, or JSON Feed feed.
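The autodiscovery lookup can be sketched in JavaScript to match the question's code. This is a minimal, dependency-free sketch that assumes Node.js 10+ (for the global URL class). It finds tags with regular expressions rather than the HTML parser recommended in the other answer. The names `discoverFeeds`, `parseAttributes` and `FEED_TYPES` are hypothetical:

```javascript
// Sketch of RSS Autodiscovery WITHOUT a real HTML parser: <link> tags are found
// with regular expressions to keep the example dependency-free. This can be
// fooled by comments, <script> contents or unusual markup; a proper HTML
// parser is the safer choice in real code.
const FEED_TYPES = new Set([
  'application/rss+xml',
  'application/atom+xml',
  'application/feed+json', // JSON Feed 1.1
  // 'application/json' is left out on purpose: sites also use it for
  // non-feed alternates (e.g. REST API links), so it causes false positives.
]);

// Parse name="value", name='value' and name=value attributes of one tag.
function parseAttributes(tag) {
  const attrs = {};
  const attrRe = /([^\s=\/>]+)\s*=\s*("([^"]*)"|'([^']*)'|([^\s>]+))/g;
  let m;
  while ((m = attrRe.exec(tag)) !== null) {
    const value = m[3] !== undefined ? m[3] : m[4] !== undefined ? m[4] : m[5];
    attrs[m[1].toLowerCase()] = value;
  }
  return attrs;
}

// Return the absolute URLs of all feeds advertised by <link rel="alternate"> tags.
function discoverFeeds(html, pageUrl) {
  const feeds = [];
  for (const tag of html.match(/<link\b[^>]*>/gi) || []) {
    const attrs = parseAttributes(tag);
    // rel is a space-separated list of tokens, so look for "alternate" in it.
    const rels = (attrs.rel || '').toLowerCase().split(/\s+/);
    const type = (attrs.type || '').trim().toLowerCase();
    if (!rels.includes('alternate') || !FEED_TYPES.has(type) || !attrs.href) {
      continue;
    }
    try {
      // Decode the most common entity, then resolve relative hrefs.
      const href = attrs.href.replace(/&amp;/g, '&');
      feeds.push(new URL(href, pageUrl).href);
    } catch (e) {
      // Skip hrefs that do not form a valid URL.
    }
  }
  return feeds;
}

const page = `<head>
  <link rel="stylesheet" href="/style.css">
  <link rel="alternate" type="application/rss+xml" title="RSS"
      href="http://feeds.example.com/rss-feed">
  <link rel="alternate" type="application/atom+xml" href="/atom.xml">
</head>`;

// discoverFeeds returns ['http://feeds.example.com/rss-feed', 'https://example.com/atom.xml']
console.log(discoverFeeds(page, 'https://example.com/blog/'));
```

The stylesheet link is ignored because its rel is not "alternate". The relative Atom href is resolved against the page URL passed in.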

Answer 1 (score: 0)

The material you quote is prefixed with:

Although this advanced technique for syndication is not required, support of this is recommended, especially for web sites and applications with high performance needs.

If you get HTML back, then you should construct a DOM with an HTML parser and then search it for the appropriate <link> element as described in an earlier section of that page.
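The advice above comes down to branching on what comes back before doing any parsing. Below is a minimal sketch of that decision, assuming Node-style lower-cased header names. `classifyResponse` and its four return labels are hypothetical, and the HTML parsing step itself is left out:

```javascript
// Hypothetical helper: decide what to do next with a response, based on its
// status code and Content-Type header (header names lower-cased, as in Node).
//   'redirect' -> follow the Location header
//   'feed'     -> the response body already is a feed
//   'html'     -> parse the body and search it for <link rel="alternate">
//   'other'    -> nothing to discover from this response
const FEED_MIME_TYPES = ['application/rss+xml', 'application/atom+xml', 'application/feed+json'];
const HTML_MIME_TYPES = ['text/html', 'application/xhtml+xml'];

function classifyResponse(statusCode, headers) {
  if (statusCode >= 300 && statusCode < 400 && headers.location) {
    return 'redirect';
  }
  if (statusCode < 200 || statusCode >= 300) {
    return 'other';
  }
  // Drop parameters such as "; charset=utf-8" before comparing MIME types.
  const mime = (headers['content-type'] || '').split(';')[0].trim().toLowerCase();
  if (FEED_MIME_TYPES.includes(mime)) {
    return 'feed';
  }
  if (HTML_MIME_TYPES.includes(mime)) {
    return 'html';
  }
  // Feeds served as generic text/xml or application/xml also end up here;
  // telling them apart would require inspecting the body.
  return 'other';
}

console.log(classifyResponse(200, { 'content-type': 'text/html; charset=utf-8' })); // html
console.log(classifyResponse(302, { location: 'http://example.com/feed' }));        // redirect
```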