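Replace lines of text between two characters

Time: 2015-07-27 14:33:57

Tags: perl text replace awk sed

I have a BibTeX file that is a merge of several other .bib files. During the merge, all but one of each set of duplicate entries were commented out, so every case with duplicates looks like the example below. Some entries have 20 to 30 commented-out copies, which makes a file of 100 references 30k lines long.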

@Article{goodnight2005,
  author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
  journal   = {{IEEE Computer Graphics and Applications}},
  title     = {{Computation on programmable graphics hardware}},
  year      = {2005},
  volume    = {25},
  number    = {5},
  pages     = {12-15}
}

###Article{goodnight2005,
  author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
  journal   = {{IEEE Computer Graphics and Applications}},
  title     = {{Computation on programmable graphics hardware}},
  year      = {2005},
  volume    = {25},
  number    = {5},
  pages     = {12-15}
}

@INPROCEEDINGS{Llosa-pact96,
    author = {Josep Llosa and Antonio González and Eduard Ayguadé and Mateo Valero},
    title = {Swing Modulo Scheduling: A Lifetime-Sensitive Approach},
    booktitle = {In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96},
    year = {1996},
    pages = {80--86}

    }

How can I delete all lines starting with ### up to, but not including, the next line starting with @? Essentially, my resulting file would be:

@Article{goodnight2005,
      author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
      journal   = {{IEEE Computer Graphics and Applications}},
      title     = {{Computation on programmable graphics hardware}},
      year      = {2005},
      volume    = {25},
      number    = {5},
      pages     = {12-15}
    }

@INPROCEEDINGS{Llosa-pact96,
        author = {Josep Llosa and Antonio González and Eduard Ayguadé and Mateo Valero},
        title = {Swing Modulo Scheduling: A Lifetime-Sensitive Approach},
        booktitle = {In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96},
        year = {1996},
        pages = {80--86}

        }

For example, sed '/###/,/@/{//!d}' bibliography.bib keeps the lines starting with ###, whereas sed '/###/,/@/d' bibliography.bib also removes the line starting with @.
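A note on the two sed attempts, offered as a sketch rather than a definitive answer: as far as I understand sed, the empty regex // reuses the most recently applied regex, and on the ### line that is /###/ itself, so that line is spared by !d. Naming the end pattern explicitly and anchoring both patterns avoids this. The function name strip_commented and the tiny sample input below are made up for illustration.

```shell
# strip_commented: hypothetical helper name, not from the original post.
# Deletes every line from one starting with ### up to, but not including,
# the next line starting with @. Reads the named files, or stdin if none.
strip_commented() {
  sed '/^###/,/^@/{/^@/!d;}' "$@"
}

# Small made-up sample to try it on.
printf '%s\n' '@Article{a,' '}' '' '###Article{a,' '}' '' '@Book{b,' '}' |
  strip_commented
```

On this sample the two uncommented entries should come out separated by a single blank line. The ; before the closing } is included because some sed implementations require it there.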

Many thanks for your help.

3 answers:

Answer 0 (score: 2)

A simple solution using a $skip sentinel value:

use strict;
use warnings; 

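my $skip = 0;               # true while inside a commented-out (###) entry
while ( <> ) {
   $skip = 1 if /^###/;     # a ### line starts a block to drop
   $skip = 0 if /^@/;       # the next @ line ends the block and is kept
   next if $skip;

   print;
}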

Output:

[hmcmillen]$ perl test.pl < test.txt 
@Article{goodnight2005,
  author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
  journal   = {{IEEE Computer Graphics and Applications}},
  title     = {{Computation on programmable graphics hardware}},
  year      = {2005},
  volume    = {25},
  number    = {5},
  pages     = {12-15}
}

@INPROCEEDINGS{Llosa-pact96,
    author = {Josep Llosa and Antonio González and Eduard Ayguadé and Mateo Valero},
    title = {Swing Modulo Scheduling: A Lifetime-Sensitive Approach},
    booktitle = {In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96},
    year = {1996},
    pages = {80--86}
}

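If you really want it as a single command (note that this variant starts with $SKIP set to 1, so unlike the script it also drops anything before the first @ line):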

perl -ne 'BEGIN { $SKIP = 1 } $SKIP = 1 if /^###/; $SKIP = 0 if /^@/; print unless $SKIP;' < test.txt

Answer 1 (score: 1)

Assuming your input files are all the *.bib files somewhere in the current directory or below.

Be your own find and perl wizard of the day:

find . -name '*.bib' -exec \
perl -i -ne '$o=1if/^@/;$o=0if/^###/;print if$o' \{} \;

If you can't read this, don't use it. For example, it deletes anything before the first @ line, and it does not account for indented @ or ### lines.
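If those two caveats matter for your files, here is a minimal sketch of a more forgiving variant, written with awk instead of perl. It assumes you want to keep whatever precedes the first entry and that indentation consists only of spaces and tabs. The function name strip_bib_file is hypothetical, and the file is rewritten through a temporary copy rather than with an in-place flag.

```shell
# strip_bib_file: hypothetical helper, not part of the original answer.
# Rewrites one file through a temporary copy, dropping ### blocks while
# keeping any text before the first @ line and allowing leading blanks/tabs.
strip_bib_file() {
  tmp=$(mktemp) || return 1
  awk '
    BEGIN { keep = 1 }            # print text before the first entry too
    /^[ \t]*###/ { keep = 0 }     # a commented-out entry starts, maybe indented
    /^[ \t]*@/   { keep = 1 }     # a real entry starts, maybe indented
    keep
  ' "$1" > "$tmp" && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return $status
}

# Possible use over a tree (left commented out so nothing is modified by accident):
# find . -name '*.bib' | while IFS= read -r f; do strip_bib_file "$f"; done
```

Copying the result back with cat, rather than moving the temporary file over the original, is a design choice meant to leave the original file's permissions untouched.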

There is also a nice module called File::Find; read all about it with perldoc File::Find. Personally, I wouldn't write that version as a one-liner.

Answer 2 (score: 0)

Using awk:

$ awk '/###/{p=0} /@/{p=1} p' bib.text

@Article{goodnight2005,
  author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
  journal   = {{IEEE Computer Graphics and Applications}},
  title     = {{Computation on programmable graphics hardware}},
  year      = {2005},
  volume    = {25},
  number    = {5},
  pages     = {12-15}
}

@INPROCEEDINGS{Llosa-pact96,
    author = {Josep Llosa and Antonio González and Eduard Ayguadé and Mateo Valero},
    title = {Swing Modulo Scheduling: A Lifetime-Sensitive Approach},
    booktitle = {In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96},
    year = {1996},
    pages = {80--86}

    }
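
One thing to watch with the awk command above: its patterns are unanchored, so an @ anywhere on a line (say, an e-mail address in a note field) inside a commented-out entry would switch printing back on. A small sketch of an anchored variant follows; the function name strip_anchored and the sample input are made up for illustration.

```shell
# strip_anchored: hypothetical helper name. Same flag idea as the awk
# command above, but both patterns must match at the start of the line.
strip_anchored() {
  awk '/^###/ { p = 0 } /^@/ { p = 1 } p' "$@"
}

# Made-up sample: the commented-out entry contains an @ in the middle of a line.
printf '%s\n' '@A{a,' '}' '###A{a,' '  note = {x@y.org},' '}' '@B{b,' '}' |
  strip_anchored
```

With the anchors in place, the note line and the closing brace of the commented-out entry stay suppressed, and only the @A and @B entries are printed.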