Removing non-digit characters in Perl

Date: 2015-02-07 16:59:53

Tags: perl

My file contains multiple quotes, like this:

  <verse-no>quote</verse-no>
            <quote-verse>1:26,27 Man Created to Continually Develop</quote-verse>
            <quote>When Adam came from the Creator’s hand, he bore, in his physical, mental, and
                spiritual nature, a likeness to his Maker. “God created man in His own image”
                (Genesis 1:27), and it was His purpose that the longer man lived the more fully
                he should reveal this image—the more fully reflect the glory of the Creator. All
                his faculties were capable of development; their capacity and vigor were
                continually to increase. Ed 15
            </quote>

I want to remove all of the text in the <quote-verse>.....</quote-verse> line so that the final result is <quote>1:26,27</quote>

I tried perl -pi.bak -e 's#\D*$<\/quote-verse>#<\/quote-verse>#g' file.txt

This has no effect. I am a Perl beginner (self-taught) with less than 10 days of experience. Please tell me what is wrong and how to proceed.

1 answer:

Answer 0: (score: 2)

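You have XML, so you need an XML parser. XML::Twig is a good one. The reason so many people say "don't parse XML with regular expressions" is that a regex only works within a limited scope. XML is a specification: some things are valid and some are not. If you write code based on assumptions that are not always true, you end up with brittle code that will one day break without warning, because someone changed the XML to something slightly different that is still perfectly valid.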
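To make the brittleness concrete, here is a minimal sketch in plain Perl. The helper `trim_line` is hypothetical (it is not part of the script in this answer); it stands in for a line-by-line regex fix of the kind attempted in the question. The same element is written in two equally valid layouts, and the regex only handles one of them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical line-oriented fix: keep the last digit before the closing tag
# and drop the non-digit text between that digit and the tag.
sub trim_line {
    my ($line) = @_;
    $line =~ s{(\d)\D*</quote-verse>}{$1</quote-verse>};
    return $line;
}

# The same element in two layouts that an XML parser treats alike.
my @one_line  = ("<quote-verse>1:26,27 Man Created to Continually Develop</quote-verse>\n");
my @two_lines = (
    "<quote-verse>1:26,27 Man Created\n",
    "to Continually Develop</quote-verse>\n",
);

# Processed line by line, the way perl -p would feed them in.
print map { trim_line($_) } @one_line;     # title text is removed
print map { trim_line($_) } @two_lines;    # nothing changes, and no warning is given
```

In the second layout the digit and the closing tag sit on different lines, so the substitution never matches and the input passes through untouched.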

With that in mind, this works:

#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

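# Called once for each <quote-verse> element found during parsing.
sub quote_verse_handler {
    my ( $twig, $quote ) = @_;
    my $text = $quote->text;
    # Keep the last digit and drop the run of non-digits that follows it.
    $text =~ s/(\d)\D+$/$1/;
    $quote->set_text($text);
}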

my $parser = XML::Twig->new(
    twig_handlers => { 'quote-verse' => \&quote_verse_handler },
    pretty_print  => 'indented'
);


# To read from a file instead, use:
# $parser->parsefile('your_file.xml');

# Slurp mode: read the whole DATA section as a single string.
local $/;
$parser->parse(<DATA>);
$parser->print;


__DATA__
<xml>
<verse-no>quote</verse-no>
        <quote-verse>1:26,27 Man Created to Continually Develop</quote-verse>
        <quote>When Adam came from the Creator's hand, he bore, in his physical, mental, and
            spiritual nature, a likeness to his Maker. "God created man in His own image"
            (Genesis 1:27), and it was His purpose that the longer man lived the more fully
            he should reveal this image-the more fully reflect the glory of the Creator. All
            his faculties were capable of development; their capacity and vigor were
            continually to increase. Ed 15
        </quote>
   </xml>

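Here is what this does: it runs through your file. Each time it encounters a quote-verse element, it calls the handler and passes it that piece of the XML to work on. We apply a regular expression to strip the trailing part of the text, then update the XML accordingly.

Once parsing is complete, we print the finished product.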
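The substitution inside the handler can be tried on plain strings without any XML involved. The sketch below wraps it in a hypothetical helper, `trim_after_last_digit` (the name is mine, not part of XML::Twig), to show which inputs it changes and which it leaves alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the same substitution the handler uses
# to a plain string and returns the result.
sub trim_after_last_digit {
    my ($text) = @_;
    # (\d) captures a digit; \D+$ requires everything after it, up to the
    # end of the string, to be non-digits. Only the last digit can satisfy that.
    $text =~ s/(\d)\D+$/$1/;
    return $text;
}

print trim_after_last_digit('1:26,27 Man Created to Continually Develop'), "\n";
print trim_after_last_digit('3:1'), "\n";               # already ends in a digit
print trim_after_last_digit('No digits here'), "\n";    # no digit, so no match
```

One limitation to keep in mind: the pattern keys on the last digit in the text, so a title that itself contains a digit would only be cut after that digit.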

You will probably want to replace:

local $/;
$parser->parse(<DATA>);

with:

$parser->parsefile('your_file_name');

You may also find:

$parser->print_to_file('output_filename');

useful.
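As an aside on why the one-liner in the question appeared to do nothing: as far as I can tell, Perl does not read the `$` in `$<` as an end-of-line anchor there. `$<` is the special variable holding the real user ID, and it gets interpolated into the pattern. Even if it were read as an anchor, nothing on the same line could follow it, so the pattern would still fail to match. The sketch below compiles the pattern from the question and prints what Perl ends up with; the sample line is taken from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the question, compiled as written. $< is Perl's real
# user ID variable, so it is interpolated rather than treated as an anchor.
my $pattern = qr#\D*$<\/quote-verse>#;

my $line = '<quote-verse>1:26,27 Man Created to Continually Develop</quote-verse>';

# Print the compiled pattern to see what Perl actually tries to match.
print "compiled pattern: $pattern\n";
print "matches the sample line: ", ( $line =~ $pattern ? "yes" : "no" ), "\n";
```

The printed pattern contains your numeric user ID where `$<` was, which is why it never matches text like the sample line.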