Formatting the contents of a text file with a shell script

Posted: 2013-08-02 09:59:41

Tags: regex linux shell

I want to format the contents of my text file with a shell script. The contents look like this:

http://copyright.gov.in Inlinks:
 fromUrl: http://mhrd.gov.in/ anchor: Copyright
 fromUrl: http://mhrd.gov.in/hi/home anchor: कॉपीराइट
 fromUrl: http://mhrd.gov.in/?fontsize=normal anchor: Copyright
 fromUrl: http://mhrd.gov.in/?contrast=high anchor: Copyright
 fromUrl: http://mhrd.gov.in/?fontsize=large anchor: Copyright
 fromUrl: http://mhrd.gov.in/sitemap anchor: Copyright
 fromUrl: http://mhrd.gov.in/?fontsize=small anchor: Copyright
 fromUrl: http://mhrd.gov.in/hi anchor: कॉपीराइट
 fromUrl: http://mhrd.gov.in/?contrast=normal anchor: Copyright

I want the formatted output to look like this:

http://copyright.gov.in -> http://mhrd.gov.in/
http://copyright.gov.in -> http://mhrd.gov.in/hi/home 
http://copyright.gov.in -> http://mhrd.gov.in/?fontsize=normal

and so on.

1 Answer

Answer (score: 1)

$ cat foo.input
http://copyright.gov.in Inlinks:
 fromUrl: http://mhrd.gov.in/ anchor: foo
 fromUrl: http://mhrd.gov.in/hi anchor: bar
http://foo.acme.gov Inlinks:
 fromUrl: http://foo.acme.gov/ anchor: foo
 fromUrl: http://foo.acme.gov/about anchor: bar

$ awk '/^http/ { host=$1; next } NF { printf "%s -> %s\n", host, $2 }' foo.input
http://copyright.gov.in -> http://mhrd.gov.in/
http://copyright.gov.in -> http://mhrd.gov.in/hi
http://foo.acme.gov -> http://foo.acme.gov/
http://foo.acme.gov -> http://foo.acme.gov/about
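The one-liner above works because a header line starts with `http` (so `/^http/` stores its first field as the target), while each inlink line starts with whitespace and has the source URL as its second field; `NF` skips blank lines. A minimal sketch of a slightly stricter variant follows, assuming every header ends in `Inlinks:` and every inlink line starts with `fromUrl:` as in the sample. It matches on those field values instead of on `^http`, so any other kind of line is ignored rather than printed. The function name `format_inlinks` is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes the "<url> Inlinks:" / "fromUrl: <url> anchor: ..." layout
# from the sample input. format_inlinks is a hypothetical helper name.
format_inlinks() {
  awk '
    # Header line, e.g. "http://copyright.gov.in Inlinks:": remember the target URL.
    $2 == "Inlinks:" { target = $1; next }
    # Inlink line, e.g. " fromUrl: http://mhrd.gov.in/ anchor: Copyright".
    # Default field splitting ignores the leading space, so $2 is the source URL.
    $1 == "fromUrl:" { print target " -> " $2 }
  ' "$@"   # reads the named files, or standard input when no arguments are given
}

# Usage: feed a small sample on standard input.
printf '%s\n' \
  'http://copyright.gov.in Inlinks:' \
  ' fromUrl: http://mhrd.gov.in/ anchor: Copyright' \
  ' fromUrl: http://mhrd.gov.in/hi/home anchor: कॉपीराइट' |
  format_inlinks
```

On a real file, run `format_inlinks foo.input`. The anchor text (including the Devanagari anchors) is simply never printed, so no special encoding handling is needed.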