Extracting XML data/content from a URL with a shell script

Time: 2011-03-15 05:19:02

Tags: unix

I need to download the XML content served at a URL into file.xml. For example, given the URL http://www.pistonheads.co.uk/xml/news091.asp?c=26, I want to save its XML content to file.xml:

<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="0.91">
<channel>
<title>PistonHeads (Motoring News)</title>
<link>http://www.pistonheads.com/news/</link>
<description>Motoring News</description>

<item>
<title>Bowler Nemesis Joins Spyker At CPP</title>
<description>Plans confired for Nemesis EXR road car to be built in Coventry</description>
</item>
</channel>
</rss>

I tried wget "url" -o file.xml, but when I open file.xml it contains only this:

  

http://www.pistonheads.co.uk/xml/news091.asp?c=26
           => `news091.asp?c=26'
Resolving www.pistonheads.co.uk... done.
Connecting to www.pistonheads.co.uk[xx.xxx.xxx.xx]... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5,016 [text/xml]

0K .... 100% 445.31 KB/s

13:37:13 (445.31 KB/s) - `news091.asp?c=26' saved [5016/5016]

Is there another way to do this?

1 answer:

Answer 0: (score: 0)

If you want this as the output:

PistonHeads (Motoring News) http://www.pistonheads.com/news/ Motoring News

then this will do the trick:

wget -q -O - 'http://www.pistonheads.co.uk/xml/news091.asp?c=26' \
  | grep -E '(title>|link>|description>)' | head -n 3 \
  | sed -e 's/.*>\([^>]*\)<.*/\1/' | tr '\n' ' '
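
To see what the filter stages do without touching the network, here is a minimal sketch that runs the same grep/head/sed/tr chain over a local sample. The function name extract_channel_info and the variable sample are hypothetical, introduced only for illustration; the sample text is the channel header quoted in the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer): reads XML on stdin
# and prints the text of the first three title/link/description elements
# on a single line, separated by spaces.
extract_channel_info() {
  grep -E '(title>|link>|description>)' | head -n 3 \
    | sed -e 's/.*>\([^>]*\)<.*/\1/' | tr '\n' ' '
}

# Local stand-in for the feed, copied from the channel header in the question.
sample='<channel>
<title>PistonHeads (Motoring News)</title>
<link>http://www.pistonheads.com/news/</link>
<description>Motoring News</description>
</channel>'

printf '%s\n' "$sample" | extract_channel_info
```

This is line-based text matching, not XML parsing: it assumes each element sits on its own line, as it does in this feed, and it would likely break on a feed that is formatted differently.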

However, if you just want to write the content served at the link to a file, use:

wget -O file.xml 'http://www.pistonheads.co.uk/xml/news091.asp?c=26'

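Note the uppercase O in the option that writes the download to a file. Lowercase -o instead sends wget's log messages to the named file, which is why file.xml in the question ended up holding the download log rather than the XML.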
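
Since the two options are easy to confuse, a quick sanity check on the saved file can catch the mistake early. The following is only a sketch: looks_like_xml is a hypothetical helper, and it assumes the document begins with an XML declaration on its first line, as the feed in the question does.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the first line of the given file starts
# with an XML declaration, fails otherwise (for example, when the file
# holds a wget log instead of the downloaded document).
looks_like_xml() {
  first_line=$(head -n 1 "$1")
  case $first_line in
    '<?xml'*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Possible use after a download (left commented out so the sketch itself
# does not need network access):
#   wget -q -O file.xml 'http://www.pistonheads.co.uk/xml/news091.asp?c=26'
#   looks_like_xml file.xml || echo "file.xml does not look like XML" >&2
```

A well-formed XML file may legally omit the declaration, so a failure here is a hint to inspect the file, not proof that the download went wrong.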