How to parse XML, extract elements and write them to a new XML file based on a specified condition

Time: 2015-06-12 04:47:11

Tags: xml perl

I need to extract XML elements from the given XML when a value is null or NA.

    <?xml version="1.0" encoding="UTF-8"?>
    <log>
      <logentry revision="21754">
        <author>Madhu</author>
        <date>2015-05-12</date>
        <QC_ID>NA</QC_ID>
        <Rally_ID>US4940</Rally_ID>
        <Description>HotFix_MaxConnectionReduction.dBAssembly.xml file in Release-  Branch</Description>
        <HP_Code_ReviewID>CR1234</HP_Code_ReviewID>
        <Deployment_Change_Needed>No</Deployment_Change_Needed>
        <Deployment_Change_Description>NA
        </Deployment_Change_Description>
      </logentry>
      <logentry revision="21779">
        <author>sudha</author>
        <date>2015-05-19</date>
        <QC_ID>NA</QC_ID>
        <Rally_ID>US4940</Rally_ID>
        <Description> Adding Release-Branch</Description>
        <HP_Code_ReviewID> NA</HP_Code_ReviewID>
        <Deployment_Change_Needed>No</Deployment_Change_Needed>
        <Deployment_Change_Description>NA
        </Deployment_Change_Description>
      </logentry>
      <logentry revision="21808">
        <author>sudha</author>
        <date>2015-05-25</date>
        <QC_ID>NA</QC_ID>
        <Rally_ID>US4940</Rally_ID>
        <Description>  modifying 15.6.1 in PP Release-Branch to bring new spaces in modules </Description>
        <HP_Code_ReviewID> NA</HP_Code_ReviewID>
        <Deployment_Change_Needed>No</Deployment_Change_Needed>
        <Deployment_Change_Description>NA
        </Deployment_Change_Description>
      </logentry>
    </log>

That is, when a value is null or NA, I need to extract those XML elements and create a new XML file from them for further processing.

The expected output for the example above (where the HP_Code_ReviewID tag value is NA) is:

    <?xml version="1.0" encoding="UTF-8"?>
    <log>
      <logentry revision="21808">
        <author>sudha</author>
        <date>2015-05-25</date>
        <QC_ID>NA</QC_ID>
        <Rally_ID>US4940</Rally_ID>
        <Description>  modifying 15.6.1 in PP Release-Branch to bring new spaces in modules </Description>
        <HP_Code_ReviewID> NA</HP_Code_ReviewID>
        <Deployment_Change_Needed>No</Deployment_Change_Needed>
        <Deployment_Change_Description>NA
        </Deployment_Change_Description>
      </logentry>
    </log>

1 answer:

Answer 0 (score: 0)

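Perl has an excellent library, XML::Twig, that you can use to parse the XML and solve this. I'll give you an example to get you started, but note that Stack Overflow is for helping you fix problems in your code, not for writing the code for you.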

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

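my $twig = XML::Twig->new(
    'pretty_print'  => 'indented_a',
    'twig_handlers' => {
        # Called each time an </HP_Code_ReviewID> end tag has been parsed;
        # $_ is the HP_Code_ReviewID element.
        'HP_Code_ReviewID' => sub {
            # m/NA/ matches "NA" anywhere in the text (e.g. " NA").
            # The parent logentry is printed as parsed so far, i.e. only
            # up to and including HP_Code_ReviewID.
            if ( $_->text =~ m/NA/ ) { $_->parent->print }
        }
    }
)->parse( \*DATA );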

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
    <log>
    <logentry revision="21754">
    <author>Madhu</author>
    <date>2015-05-12</date>
    <QC_ID>NA</QC_ID>
    <Rally_ID>US4940</Rally_ID>
   <Description>HotFix_MaxConnectionReduction.dBAssembly.xml file in Release-  Branch</Description>
   <HP_Code_ReviewID>CR1234</HP_Code_ReviewID>
   <Deployment_Change_Needed>No</Deployment_Change_Needed>
   <Deployment_Change_Description>NA
   </Deployment_Change_Description>
   </logentry>
   <logentry revision="21779">
   <author>sudha</author>
   <date>2015-05-19</date>
   <QC_ID>NA</QC_ID>
   <Rally_ID>US4940</Rally_ID>
   <Description> Adding Release-Branch</Description>
   <HP_Code_ReviewID> NA</HP_Code_ReviewID>
   <Deployment_Change_Needed>No</Deployment_Change_Needed>
   <Deployment_Change_Description>NA
   </Deployment_Change_Description>
 </logentry>
<logentry revision="21808">
<author>sudha</author>
<date>2015-05-25</date>
<QC_ID>NA</QC_ID>
<Rally_ID>US4940</Rally_ID>
<Description>  modifying 15.6.1 in PP Release-Branch to bring new spaces in modules </Description>
<HP_Code_ReviewID> NA</HP_Code_ReviewID>
<Deployment_Change_Needed>No</Deployment_Change_Needed>
<Deployment_Change_Description>NA
</Deployment_Change_Description>
</logentry>
</log>

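This sets up a twig handler for `HP_Code_ReviewID`: if the text in it contains `NA`, it prints the parent element. A twig handler runs as soon as the element's end tag has been parsed, so at that point the parent `logentry` contains only the children up to and including `HP_Code_ReviewID`. That is why the `Deployment_Change_*` elements are missing from the output.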
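The `m/NA/` test in that handler matches the letters NA anywhere in the text, so a value such as `CR_NAME` would also be picked up, while an empty ("null") element would not. A stricter predicate can be sketched in core Perl. `is_null_or_na` is a hypothetical helper name, and the sketch assumes that "null" means a missing, empty or whitespace-only value and that "NA" must be the whole value apart from surrounding whitespace (if the file literally contains the string `null`, add it to the alternation):

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper, not part of XML::Twig. Assumption: "null" means
# the value is missing (undef), empty, or whitespace only; "NA" must be
# the whole value, apart from surrounding whitespace.
sub is_null_or_na {
    my ($text) = @_;
    return 1 unless defined $text;
    return $text =~ /\A\s*(?:NA)?\s*\z/ ? 1 : 0;
}

# Compare the loose m/NA/ test with the stricter predicate.
for my $value ( 'NA', ' NA', '', 'CR1234', 'CR_NAME' ) {
    printf "%-10s loose=%d strict=%d\n",
        "'$value'", ( $value =~ m/NA/ ? 1 : 0 ), is_null_or_na($value);
}
```

Inside a handler it could replace the regex, for example `if ( is_null_or_na( $_->first_child_text('HP_Code_ReviewID') ) ) { ... }`. Since `first_child_text` returns an empty string when there is no such child, a `logentry` without `HP_Code_ReviewID` would then also count as null.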

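Note that this output is not, strictly speaking, valid XML: it contains only the `logentry` elements, with no XML declaration and no single root element. However, XML::Twig also lets you do things such as deleting elements and then printing the rest of the document.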

This prints:

  <logentry revision="21779">
    <author>sudha</author>
    <date>2015-05-19</date>
    <QC_ID>NA</QC_ID>
    <Rally_ID>US4940</Rally_ID>
    <Description> Adding Release-Branch</Description>
    <HP_Code_ReviewID> NA</HP_Code_ReviewID>
  </logentry>
  <logentry revision="21808">
    <author>sudha</author>
    <date>2015-05-25</date>
    <QC_ID>NA</QC_ID>
    <Rally_ID>US4940</Rally_ID>
    <Description>  modifying 15.6.1 in PP Release-Branch to bring new spaces in modules </Description>
    <HP_Code_ReviewID> NA</HP_Code_ReviewID>
  </logentry>

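To keep the matching entries in their context (inside the `log` root element), you essentially have to delete the entries that do not match instead: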

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

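my $twig = XML::Twig->new(
    'pretty_print'  => 'indented_a',
    'twig_handlers' => {
        # Called each time a complete logentry has been parsed.
        'logentry' => sub {
            # Delete the entry unless its HP_Code_ReviewID contains NA.
            if ( not $_->first_child_text('HP_Code_ReviewID') =~ m/NA/ ) {
                $_->delete;
            }
        }
    }
)->parse( \*DATA )->print;    # print what is left of the whole document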

(The `__DATA__` block is the same as above.)
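The deletion test relies on operator precedence: `not` binds more loosely than `=~`, so the condition means "the text does not contain NA". The `!~` operator says the same thing more directly, whereas `!` binds more tightly than `=~` and would negate the string before matching. The short core-Perl check below compares the three forms on a sample value (`CR1234`, taken from revision 21754):

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Sample HP_Code_ReviewID text that does NOT contain NA (revision 21754).
my $text = 'CR1234';

# `not` binds more loosely than `=~`: this is not( $text =~ m/NA/ ).
my $with_not = ( not $text =~ m/NA/ ) ? 1 : 0;

# `!~` means "does not match" and gives the same result.
my $with_not_match = ( $text !~ m/NA/ ) ? 1 : 0;

# `!` binds more tightly than `=~`, so `! $text =~ m/NA/` would be
# parsed as below: the string is negated first (giving '' or 1) and
# that result is matched against the pattern, which can never succeed.
my $with_bang = ( ( !$text ) =~ m/NA/ ) ? 1 : 0;

print "not: $with_not, !~: $with_not_match, (!\$text) =~: $with_bang\n";
```

For `CR1234` the first two forms give 1 (delete the entry) and the `!` form gives 0. Because `!$text` is always either the empty string or 1, the `!` version would never delete any `logentry`.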

This prints:

<?xml version="1.0" encoding="UTF-8"?>
<log>
  <logentry revision="21779">
    <author>sudha</author>
    <date>2015-05-19</date>
    <QC_ID>NA</QC_ID>
    <Rally_ID>US4940</Rally_ID>
    <Description> Adding Release-Branch</Description>
    <HP_Code_ReviewID> NA</HP_Code_ReviewID>
    <Deployment_Change_Needed>No</Deployment_Change_Needed>
    <Deployment_Change_Description>NA
   </Deployment_Change_Description>
  </logentry>
  <logentry revision="21808">
    <author>sudha</author>
    <date>2015-05-25</date>
    <QC_ID>NA</QC_ID>
    <Rally_ID>US4940</Rally_ID>
    <Description>  modifying 15.6.1 in PP Release-Branch to bring new spaces in modules </Description>
    <HP_Code_ReviewID> NA</HP_Code_ReviewID>
    <Deployment_Change_Needed>No</Deployment_Change_Needed>
    <Deployment_Change_Description>NA
</Deployment_Change_Description>
  </logentry>
</log>