Shell script to replace a string with an incrementing value

Time: 2013-03-08 07:24:35

Tags: shell

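I have an .xml file in which I have to search for the "<reviseddate>" tag. It can occur several times in the file. When it does, I have to replace each "<reviseddate>" tag with "<reviseddate1>", "<reviseddate2>", and so on. I need a shell script for this.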

A sample of the text is as follows:

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised             
<reviseddate> February 4, 2006 </reviseddate>, <reviseddate> August 14, 2006 </reviseddate>,
and <reviseddate> October 7, 2006 </reviseddate>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California  
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para>

The output should be as follows:

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised
<reviseddate1> February 4, 2006 </reviseddate1>, <reviseddate2> August 14, 2006 </reviseddate2>,        
and <reviseddate3> October 7, 2006 </reviseddate3>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California  
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para>

I tried:

for i in $c
do
   sed -e "s/<reviseddate>/<reviseddate$i>/g" $path/$input_file > $path/input_new.xml
   cp $path/input_new.xml $path/$input_file 
   rm -f input_new.xml 
done

1 answer:

Answer 0 (score: 0)

I would use a Perl script like this to do the job:

#!/usr/bin/env perl
use strict;
use warnings;

my $i = 1;    # next number to assign
while (<>)
{
    # Renumber the first unnumbered pair on the line until none remain.
    while (s%<reviseddate>([^<]*)</reviseddate>%<reviseddate$i>$1</reviseddate$i>%)
    {
        $i++;
    }
    print;
}

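For each line, every unnumbered <reviseddate> ... </reviseddate> pair is replaced with an appropriately numbered pair, and the counter carries over from one line to the next.

Sample output: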

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised             
<reviseddate1> February 4, 2006 </reviseddate1>, <reviseddate2> August 14, 2006 </reviseddate2>,
and <reviseddate3> October 7, 2006 </reviseddate3>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California  
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para>

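You can adapt this to handle other scenarios, such as an opening tag on one line and its closing tag on the next. Until you need that, there is no point in bothering with it. Using regular expressions is an art: you have to balance the immediate need against resilience under all possible circumstances.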
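As a rough illustration of one such adaptation, here is a minimal awk sketch, not part of the original answer. It numbers opening and closing tags with two independent counters, so a pair may be split across lines. It assumes the tags are never nested and always balanced; the function name renumber_split_tags is hypothetical.

```shell
# Hypothetical helper: number opening and closing tags independently.
# Assumes <reviseddate> elements are balanced and never nested, so the
# n-th opening tag always belongs with the n-th closing tag.
renumber_split_tags() {
    awk '
    {
        # Number each remaining unnumbered opening tag on this line.
        while (sub(/<reviseddate>/, "<reviseddate" (opened + 1) ">"))
            opened++
        # Number each remaining unnumbered closing tag on this line.
        while (sub(/<\/reviseddate>/, "</reviseddate" (closed + 1) ">"))
            closed++
        print
    }' "$@"
}
```

It reads the named files (or standard input) once and writes the renumbered text to standard output, for example: renumber_split_tags input.xml > output.xml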


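Since Perl apparently isn't 'shell' (but sed is), you can instead process the file as many times as necessary to find all the entries and make the changes: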

tmp=$(mktemp ./revise.XXXXXXXXXXXX)
trap 'rm -f "$tmp"; exit 1' 0 1 2 3 13 15

i=1
# Repeat while any unnumbered tag remains; -q keeps grep silent.
while grep -q '<reviseddate>' filename
do
    # No g flag: only the first match on a line is renumbered per pass.
    sed "1,/<reviseddate>/ s%<reviseddate>\([^<]*\)</reviseddate>%<reviseddate$i>\1</reviseddate$i>%" filename > "$tmp"
    mv "$tmp" filename
    i=$((i+1))
done

rm -f "$tmp" # Should be a no-op
trap 0

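This updates the file iteratively. The 1,/<reviseddate>/ address range ensures that only the first remaining <reviseddate> tag is updated on each pass (there is no g modifier on the s%%% command, which is crucial). Note that sed starts looking for the end of the range on the line after the first, so this relies on line 1 of the file not containing a tag itself, as in the sample. The trap code ensures that no temporary file is left behind.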
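To see why the missing g modifier matters, here is a small illustrative sketch, with made-up sample data and variable names, that runs the same substitution on a single line with and without g. With g, every tag on the line receives the same number, which is also what happens in the attempt shown in the question.

```shell
# Made-up one-line sample with two unnumbered tag pairs.
line='<reviseddate>A</reviseddate>, <reviseddate>B</reviseddate>'
# Same substitution as in the loop, with the counter fixed at 1.
pattern='s%<reviseddate>\([^<]*\)</reviseddate>%<reviseddate1>\1</reviseddate1>%'

# Without g: only the first pair on the line is renumbered.
first_only=$(printf '%s\n' "$line" | sed "$pattern")
# With g: every pair on the line gets the same number.
all_at_once=$(printf '%s\n' "$line" | sed "${pattern}g")

printf '%s\n' "$first_only"
# <reviseddate1>A</reviseddate1>, <reviseddate>B</reviseddate>
printf '%s\n' "$all_at_once"
# <reviseddate1>A</reviseddate1>, <reviseddate1>B</reviseddate1>
```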

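This works on your sample data, giving the same output. For small files, it is fine. If you are dealing with multi-gigabyte files, Perl is the better choice because it processes the file only once.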
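If Perl is not an option but a single pass is still wanted, awk is one possible middle ground. The following is a minimal sketch under that assumption, not something from the original answer. It mirrors the Perl loop by renumbering complete pairs that sit on one line; the function name renumber_tags is hypothetical.

```shell
# Hypothetical helper: single-pass renumbering of same-line tag pairs.
renumber_tags() {
    awk '
    {
        # Find the next unnumbered pair; numbered tags no longer match.
        while (match($0, /<reviseddate>[^<]*<\/reviseddate>/)) {
            n++
            # 13 = length of "<reviseddate>", 14 = length of "</reviseddate>"
            body = substr($0, RSTART + 13, RLENGTH - 13 - 14)
            $0 = substr($0, 1, RSTART - 1) \
                 "<reviseddate" n ">" body "</reviseddate" n ">" \
                 substr($0, RSTART + RLENGTH)
        }
        print
    }' "$@"
}
```

Like the Perl script, it writes to standard output, so the result has to be redirected to a new file and moved into place, for example: renumber_tags filename > "$tmp" && mv "$tmp" filename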