Iterating over elements with libxml in Perl

Posted: 2017-05-24 12:39:01

Tags: xml perl xpath xml-libxml

I have an XML file that looks like this:

<?xml version="1.0"?>
<data>
  <header>
    <name>V9 Red Indices</name>
    <version>9</version>
    <date>2017-03-16</date>
  </header>
  <index>
    <indexfamily>ITRAXX-Asian</indexfamily>
    <indexsubfamily>iTraxx Rest of Asia</indexsubfamily>
    <paymentfrequency>3M</paymentfrequency>
    <recoveryrate>0.35</recoveryrate>
    <constituents>
      <constituent>
        <refentity>
          <originalconstituent>
            <referenceentity>ICICI Bank Limited</referenceentity>
            <redentitycode>Y1BDCC</redentitycode>
            <role>Issuer</role>
            <redpaircode>Y1BDCCAA9</redpaircode>
            <jurisdiction>India</jurisdiction>
            <tier>SNRFOR</tier>
            <pairiscurrent>false</pairiscurrent>
            <pairvalidfrom>2002-03-30</pairvalidfrom>
            <pairvalidto>2008-10-22</pairvalidto>
            <ticker>ICICIB</ticker>
            <ispreferred>false</ispreferred>
            <docclause>CR</docclause>
            <recorddate>2014-02-25</recorddate>
            <weight>0.0769</weight>
          </originalconstituent>
        </refentity>
        <refobligation>
          <type>Bond</type>
          <isconvert>false</isconvert>
          <isperp>false</isperp>
          <coupontype>Fixed</coupontype>
          <ccy>USD</ccy>
          <maturity>2008-10-22</maturity>
          <coupon>0.0475</coupon>
          <isin>XS0178885876</isin>
          <cusip>Y38575AQ2</cusip>
          <event>Matured</event>
          <obligationname>ICICIB 4.75 22Oct08</obligationname>
          <prospectusinfo>
            <issuers>
              <origissuersasperprosp>ICICI Bank Limited</origissuersasperprosp>
            </issuers>
          </prospectusinfo>
        </refobligation>
      </constituent>
    </constituents>
  </index>
</data>

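I want to iterate over this file without knowing the tag names in advance. My end goal is to build a hash of tag names and their values.

I don't want to call findnodes with an XPath expression for every node; that defeats the whole purpose of writing a generic loader.

I'm also using XML-LibXML-2.0126, which is an old version.

Below is the part of my code that uses findnodes. I've also shortened the XML so that what has become a long question doesn't get any longer :)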

use strict;
use warnings;
use XML::LibXML;

my $fileName = 'index.xml';               # path to the XML file above (assumed name)
my $parser   = XML::LibXML->new();        # $parser was used below but never created

my $xmldoc = $parser->parse_file( $fileName );
my $root = $xmldoc->getDocumentElement() || die( "Could not get Document Element \n" );

foreach my $index ( $root->findnodes( "index" ) ) {    # $root->getChildNodes()) # Get all the Indexes

    foreach my $constituent ( $index->findnodes( 'constituents/constituent' ) ) { # Let's pick up all Constituents

        my $referenceentity = $constituent->findnodes( 'refentity/originalconstituent/referenceentity' );    # This is a crude way. We should be iterating without knowing what's inside

        print "referenceentity :" . $referenceentity . "\n";
        print "+++++++++++++++++++++++++++++++++++ \n";
    }
}

2 answers:

Answer 0 (score: 1)

Use the nonBlankChildNodes, nodeName and textContent methods provided by XML::LibXML::Node:

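# $oc is assumed to be an XML::LibXML::Element, e.g. the <originalconstituent> node:
#   my ($oc) = $constituent->findnodes( 'refentity/originalconstituent' );
my %hash;

for my $node ( $oc->nonBlankChildNodes ) {    # skips whitespace-only text nodes

    my $tag = $node->nodeName;
    my $value = $node->textContent;
    $hash{$tag} = $value;
}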

This is equivalent to:

my %hash = map { $_->nodeName, $_->textContent } $oc->nonBlankChildNodes;
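The loop above only looks one level deep. As a minimal sketch of how the same three methods might be applied to a whole document without naming any tags, the hypothetical helper `xml_to_hash` below recurses into any element that has element children and stores the text of leaf elements. It assumes `XML::LibXML->load_xml` and `documentElement` are available (they are in 2.0126) and inlines a cut-down copy of the question's XML so it runs on its own.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: convert an element into a nested structure without
# knowing any tag names. Leaf elements become their text content; elements
# with element children become a hash ref keyed by child tag name.
sub xml_to_hash {
    my ($elem) = @_;

    # nonBlankChildNodes drops whitespace-only text; keep only elements
    my @children = grep { $_->isa('XML::LibXML::Element') }
                   $elem->nonBlankChildNodes;

    return $elem->textContent unless @children;

    my %hash;
    for my $child (@children) {
        # NOTE: a repeated sibling name overwrites the earlier value
        $hash{ $child->nodeName } = xml_to_hash($child);
    }
    return \%hash;
}

my $xml = <<'XML';
<data>
  <header>
    <name>V9 Red Indices</name>
    <version>9</version>
  </header>
</data>
XML

my $doc  = XML::LibXML->load_xml( string => $xml );
my $root = $doc->documentElement;
my $tree = { $root->nodeName => xml_to_hash($root) };

print "$tree->{data}{header}{name}\n";    # V9 Red Indices
```

Like the code in both answers, this keeps only the last of several same-named siblings, so a file with several `<index>` or `<constituent>` elements would need those collected into an array instead.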

Answer 1 (score: 0)

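Are you sure you want this? Accessing arbitrary data in a parsed XML::LibXML::Document object is just as easy as accessing it in a nested Perl hash. A hash will certainly take up less memory than the equivalent object, if that is what you're after, but that doesn't seem to be the case from your question.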
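To illustrate that point, here is a minimal sketch comparing one lookup in the DOM (using `findvalue` with an XPath expression) against the same lookup in a nested hash. The inline XML is a cut-down copy of the question's file, and the hash is built by hand purely for comparison.

```perl
use strict;
use warnings;
use XML::LibXML;

# A cut-down copy of the question's XML, inlined so the sketch runs on its own
my $xml = <<'XML';
<data>
  <header>
    <name>V9 Red Indices</name>
    <version>9</version>
  </header>
</data>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# DOM lookup: one XPath expression per value
my $name_dom = $doc->findvalue('/data/header/name');

# Hash lookup: the same value in a hand-built nested hash
my %hash = ( data => { header => { name => 'V9 Red Indices', version => 9 } } );
my $name_hash = $hash{data}{header}{name};

print "$name_dom\n";     # V9 Red Indices
print "$name_hash\n";    # V9 Red Indices
```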

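If you do want a hash, you can build one easily with the XML::Parser module, which calls a callback every time an "event" occurs in the XML data. In this case the events we're interested in are opening tags, closing tags and text strings.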
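A minimal sketch of that callback model on its own: it registers `Start`, `End` and `Char` handlers with XML::Parser and simply records each event in order, skipping whitespace-only text. The `@events` array and the inline XML fragment are illustrative only.

```perl
use strict;
use warnings;
use XML::Parser;

# Record each event XML::Parser reports, to show the order callbacks fire in
my @events;

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub { my ( $expat, $elem ) = @_; push @events, "start $elem" },
        End   => sub { my ( $expat, $elem ) = @_; push @events, "end $elem" },
        Char  => sub {
            my ( $expat, $str ) = @_;
            push @events, "text '$str'" if $str =~ /\S/;    # skip whitespace-only text
        },
    },
);

$parser->parse('<header><name>V9 Red Indices</name><version>9</version></header>');

print "$_\n" for @events;
```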

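The program below builds a nested hash from the XML. It dies with an appropriate message if the XML is not well-formed (a closing tag whose name doesn't match its opening tag) or if any element has one or more attributes, which this structure has no way to represent.

I've used Data::Dump to display the result.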

use strict;
use warnings 'all';

use XML::Parser;
use Data::Dump;

my $parser = XML::Parser->new(
    Handlers => {
        Start => \&handle_start,
        End   => \&handle_end,
        Char  => \&handle_char,
    },
);


my %data;
my @data_stack = ( \%data );
my @elem_stack;

$parser->parsefile( 'index.xml' );
dd \%data;


sub handle_start {
    my ($expat, $elem) = @_;

    my $data = $data_stack[-1]{$elem} = { };
    push @data_stack, $data;
    push @elem_stack, $elem;

    if ( @_ > 2 ) {
        my $xpath = join '', map "/$_", @elem_stack;
        die qq{Element at $xpath has attributes};
    }
}


sub handle_end {
    my ($expat, $elem) = @_;

    my $top_elem = pop @elem_stack;
    die qq{Bad XML structure $elem <=> $top_elem} unless $elem eq $top_elem;

    pop @data_stack;
}


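sub handle_char {
    my ($expat, $str) = @_;

    my $top_elem = $elem_stack[-1];
    my $parent   = $data_stack[-2];

    # Expat may deliver one text node in several chunks, so append rather
    # than overwrite. While the slot still holds the empty hash created by
    # handle_start, ignore whitespace-only text (indentation).
    if ( ref $parent->{$top_elem} ) {
        return unless $str =~ /\S/;
        $parent->{$top_elem} = $str;
    }
    else {
        $parent->{$top_elem} .= $str;
    }
}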

Output

{
    data => {
        header => {
            date => "2017-03-16",
            name => "V9 Red Indices",
            version => 9,
        },
        index  => {
            constituents => {
                constituent => {
                    refentity => {
                        originalconstituent => {
                            docclause       => "CR",
                            ispreferred     => "false",
                            jurisdiction    => "India",
                            pairiscurrent   => "false",
                            pairvalidfrom   => "2002-03-30",
                            pairvalidto     => "2008-10-22",
                            recorddate      => "2014-02-25",
                            redentitycode   => "Y1BDCC",
                            redpaircode     => "Y1BDCCAA9",
                            referenceentity => "ICICI Bank Limited",
                            role            => "Issuer",
                            ticker          => "ICICIB",
                            tier            => "SNRFOR",
                            weight          => 0.0769,
                        },
                    },
                    refobligation => {
                        ccy            => "USD",
                        coupon         => 0.0475,
                        coupontype     => "Fixed",
                        cusip          => "Y38575AQ2",
                        event          => "Matured",
                        isconvert      => "false",
                        isin           => "XS0178885876",
                        isperp         => "false",
                        maturity       => "2008-10-22",
                        obligationname => "ICICIB 4.75 22Oct08",
                        prospectusinfo => {
                            issuers => {
                                origissuersasperprosp => "ICICI Bank Limited"
                            },
                        },
                        type => "Bond",
                    },
                },
            },
            indexfamily      => "ITRAXX-Asian",
            indexsubfamily   => "iTraxx Rest of Asia",
            paymentfrequency => "3M",
            recoveryrate     => 0.35,
        },
    },
}