Question

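I have a BibTeX file that is a merge of several other .bib files. During the merge, every duplicate entry except one was commented out, so every case with duplicate entries looks like the example below. Some of them have 20 to 30 commented-out entries, which makes a file of 100 references 30k lines long.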

@Article{goodnight2005,
  author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
  journal   = {{IEEE Computer Graphics and Applications}},
  title     = {{Computation on programmable graphics hardware}},
  year      = {2005},
  volume    = {25},
  number    = {5},
  pages     = {12-15}
}

###Article{goodnight2005,
  author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
  journal   = {{IEEE Computer Graphics and Applications}},
  title     = {{Computation on programmable graphics hardware}},
  year      = {2005},
  volume    = {25},
  number    = {5},
  pages     = {12-15}
}

@INPROCEEDINGS{Llosa-pact96,
    author = {Josep Llosa and Antonio González and Eduard Ayguadé and Mateo Valero},
    title = {Swing Modulo Scheduling: A Lifetime-Sensitive Approach},
    booktitle = {In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96},
    year = {1996},
    pages = {80--86}

    }

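How can I delete every line from one starting with ### up to, but not including, the next line starting with @? Essentially, my resulting file would be: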

@Article{goodnight2005,
  author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
  journal   = {{IEEE Computer Graphics and Applications}},
  title     = {{Computation on programmable graphics hardware}},
  year      = {2005},
  volume    = {25},
  number    = {5},
  pages     = {12-15}
}

@INPROCEEDINGS{Llosa-pact96,
    author = {Josep Llosa and Antonio González and Eduard Ayguadé and Mateo Valero},
    title = {Swing Modulo Scheduling: A Lifetime-Sensitive Approach},
    booktitle = {In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96},
    year = {1996},
    pages = {80--86}

    }

For example, sed '/###/,/@/{//!d}' bibliography.bib keeps the lines starting with ###, while sed '/###/,/@/d' bibliography.bib also deletes the lines starting with @.
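The range in the first sed attempt is close. Its inner command only needs to spare the closing @ line, not every line matching the last regex. Below is a minimal sketch, assuming GNU or BSD sed. A hypothetical sample.bib stands in for bibliography.bib. The trailing ";" before "}" is there for BSD sed, which is stricter about braces.

```shell
#!/bin/sh
cd "$(mktemp -d)"   # scratch directory so the demo file is isolated

# Hypothetical stand-in for bibliography.bib, modeled on the question.
cat > sample.bib <<'EOF'
@Article{a,
  title = {kept}
}

###Article{a,
  title = {dropped}
}

@Misc{b,
  title = {kept too}
}
EOF

# The range runs from a line starting with ### through the next line
# starting with @. Inside it, delete every line except that @ line.
sed '/^###/,/^@/{/^@/!d;}' sample.bib
```

The result goes to stdout. GNU sed's -i (BSD sed: -i '') would edit the file in place instead. If a commented-out block is the last thing in the file, the range runs to end of file, which is also what you want here.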

Thanks a lot for your help.

Answer 1

A simple solution using a $skip flag:

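use strict;
use warnings;

my $skip = 0;
while ( <> ) {
   $skip = 1 if /^###/;   # a commented-out duplicate starts here
   $skip = 0 if /^@/;     # a real entry starts here: stop skipping
   next if $skip;         # drop lines belonging to a ### block

   print;
}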

Output:

[hmcmillen]$ perl test.pl < test.txt 
@Article{goodnight2005,
  author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
  journal   = {{IEEE Computer Graphics and Applications}},
  title     = {{Computation on programmable graphics hardware}},
  year      = {2005},
  volume    = {25},
  number    = {5},
  pages     = {12-15}
}

@INPROCEEDINGS{Llosa-pact96,
    author = {Josep Llosa and Antonio González and Eduard Ayguadé and Mateo Valero},
    title = {Swing Modulo Scheduling: A Lifetime-Sensitive Approach},
    booktitle = {In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96},
    year = {1996},
    pages = {80--86}
}

If you really want it as a one-liner:

perl -ne '$skip = 1 if /^###/; $skip = 0 if /^@/; print unless $skip;' < test.txt

Answer 2

Suppose your input files are all the *.bib files anywhere in or below the current directory.

Let find and perl be your magicians for the day:

find . -name '*.bib' -exec \
perl -i -ne '$o=1if/^@/;$o=0if/^###/;print if$o' \{} \;

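If you can't read this, don't use it. For example, it deletes everything before the first line starting with @, and it does not handle indented @ or ### lines.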

There is also a nice module called File::Find; read all about it with perldoc File::Find. Personally, I would not do this as a one-liner.

Answer 3

Using awk:

$ awk '/###/{p=0} /@/{p=1} p' bib.text

@Article{goodnight2005,
  author    = {Goodnight, N. and Wang, R. and Humphreys, G.},
  journal   = {{IEEE Computer Graphics and Applications}},
  title     = {{Computation on programmable graphics hardware}},
  year      = {2005},
  volume    = {25},
  number    = {5},
  pages     = {12-15}
}

@INPROCEEDINGS{Llosa-pact96,
    author = {Josep Llosa and Antonio González and Eduard Ayguadé and Mateo Valero},
    title = {Swing Modulo Scheduling: A Lifetime-Sensitive Approach},
    booktitle = {In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'96},
    year = {1996},
    pages = {80--86}

    }
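The awk patterns above are unanchored, so they fire on ### or @ anywhere in a line. An e-mail address inside a commented-out entry would switch printing back on. Also, p starts unset, so anything before the first @ is dropped. The variant below anchors both patterns and starts with printing on. It is a sketch with hypothetical input, not the answer's own command.

```shell
#!/bin/sh
cd "$(mktemp -d)"   # scratch directory for the demo

# Hypothetical input: a preamble, an @ inside a commented-out entry,
# and one real entry.
cat > bib.text <<'EOF'
% preamble before the first entry
###Article{x,
  note = {contact someone@example.org}
}
@Misc{y,
  title = {kept}
}
EOF

# p = 1 at start keeps the preamble; ^ anchors mean only a line that
# begins with ### or @ changes the printing state.
awk 'BEGIN { p = 1 } /^###/ { p = 0 } /^@/ { p = 1 } p' bib.text
```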
