Perl XML::Simple parser decodes inconsistently

Time: 2014-08-18 07:19:07

Tags: perl xml-parsing

I am running into what looks to me like an inconsistency when parsing XML:

use 5.14.2;
use strict;
use warnings;

use XML::Simple;
use Data::Dumper;

my $xml;
{
    local $/;          # slurp mode: read all of DATA in one go
    $xml = <DATA>;
}

my $xmlParsed = XMLin($xml,
    KeyAttr    => {phone => 'type', tankstelle => 'id'},
    ForceArray => [ 'phone' ],
    ContentKey => '-content',
);
say Dumper($$xmlParsed{'tankstelle'});


__DATA__
<?xml version="1.0"?>
<tankstellen>
    <tankstelle>
        <id>63</id>
        <phone type="main">0911 731586</phone>
        <phone type="fax">0911 7592228</phone>
        <number/>
    </tankstelle>
    <tankstelle>
        <id>64</id>
        <phone type="main">0911 732011</phone>
        <phone type="fax"></phone>
        <number>64</number>
    </tankstelle>
    <tankstelle>
        <id>91</id>
        <phone type="main">0911 732926</phone>
        <phone type="fax">0911 732917</phone>
        <number/>
    </tankstelle>
    <tankstelle>
        <id>92</id>
        <phone type="main">0911 737577</phone>
        <phone type="fax"></phone>
        <number/>
    </tankstelle>
</tankstellen>

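Sometimes number is a hash and sometimes it is a string. And whenever the phone with type="fax" is empty, main becomes a hash holding a content key instead of a plain string.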

I have tried various parser options to get rid of the hashes in main and number, with no luck.

'64' => {
        'number' => '64',
        'phone' => {
                   'main' => {
                             'content' => '0911 732011'
                           },
                   'fax' => {}
                 }
      },
'91' => {
        'phone' => {
                   'fax' => '0911 732917',
                   'main' => '0911 732926'
                 },
        'number' => {}
      }

2 answers:

Answer 0 (score: 2)

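It is a sad irony that XML::Simple is probably the most complicated XML module on CPAN, yet beginners pick it hoping for an easy ride. Its own documentation now says this: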

  

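The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended.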

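You have seen for yourself how hard it is to get it to behave with anything but the simplest XML, and it has the huge drawback that it treats attributes in the same way as elements.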
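To make the attribute/element point concrete, here is a minimal sketch of how XML::LibXML keeps the two apart. The sample document is made up for illustration (the XML in the question has no id attribute): it gives one element both an id attribute and an id child, and XPath addresses each one separately.

```perl
use strict;
use warnings;

use XML::LibXML;

# Hypothetical sample: an "id" attribute and an "id" child on the same element
my $doc = XML::LibXML->load_xml(string => <<'XML');
<tankstelle id="a1">
    <id>63</id>
</tankstelle>
XML

my ($ts) = $doc->findnodes('/tankstelle');

my $attr_id  = $ts->findvalue('@id');   # the attribute
my $child_id = $ts->findvalue('id');    # the text of the child element

print "attribute id: $attr_id\n";
print "element id:   $child_id\n";
```

Because the attribute is selected with @id and the child element with id, the two values never compete for the same hash key.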

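Following the author's advice, this short program produces a data structure like the one I think you want, with the advantage that you can modify it to build whatever structure you like from the XML.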

use strict;
use warnings;

use XML::LibXML;
use Data::Dump;

# Parse the whole document from the DATA section
my $xml = XML::LibXML->load_xml(IO => \*DATA);

my %data;

for my $ts ($xml->findnodes('/tankstellen/tankstelle')) {

  my $id = $ts->findvalue('id');

  # findvalue returns '' for an empty or missing element, never a hash
  $data{$id}{number} = $ts->findvalue('number');

  # Key each phone number by its type attribute
  for my $phone ($ts->findnodes('phone')) {
    my $type = $phone->findvalue('@type');
    $data{$id}{phone}{$type} = $phone->findvalue('text()');
  }
}

dd \%data;


__DATA__
<?xml version="1.0"?>
<tankstellen>
    <tankstelle>
        <id>63</id>
        <phone type="main">0911 731586</phone>
        <phone type="fax">0911 7592228</phone>
        <number/>
    </tankstelle>
    <tankstelle>
        <id>64</id>
        <phone type="main">0911 732011</phone>
        <phone type="fax"></phone>
        <number>64</number>
    </tankstelle>
    <tankstelle>
        <id>91</id>
        <phone type="main">0911 732926</phone>
        <phone type="fax">0911 732917</phone>
        <number/>
    </tankstelle>
    <tankstelle>
        <id>92</id>
        <phone type="main">0911 737577</phone>
        <phone type="fax"></phone>
        <number/>
    </tankstelle>
</tankstellen>

Output:

{
  63 => {
          number => "",
          phone  => { fax => "0911 7592228", main => "0911 731586" },
        },
  64 => {
          number => 64,
          phone => { fax => "", main => "0911 732011" }
        },
  91 => {
          number => "",
          phone  => { fax => "0911 732917", main => "0911 732926" },
        },
  92 => {
          number => "",
          phone => { fax => "", main => "0911 737577" }
        },
}
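As one illustration of reshaping the result, the following sketch leaves out empty values altogether instead of storing empty strings. The sub name stations_without_empties and the two-station sample are assumptions made for this example, not part of the program above.

```perl
use strict;
use warnings;

use XML::LibXML;

# Build { id => { number => ..., phone => { type => ... } } },
# leaving out any number or phone whose text is empty.
sub stations_without_empties {
    my ($xml_string) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml_string);

    my %data;
    for my $ts ($doc->findnodes('/tankstellen/tankstelle')) {
        my $id = $ts->findvalue('id');
        $data{$id} = {};    # keep the station even if everything else is empty

        my $number = $ts->findvalue('number');
        $data{$id}{number} = $number if length $number;

        for my $phone ($ts->findnodes('phone')) {
            my $text = $phone->textContent;
            next unless length $text;
            $data{$id}{phone}{ $phone->getAttribute('type') } = $text;
        }
    }
    return \%data;
}

# Shortened copy of the question's data
my $sample = <<'XML';
<tankstellen>
    <tankstelle>
        <id>64</id>
        <phone type="main">0911 732011</phone>
        <phone type="fax"></phone>
        <number>64</number>
    </tankstelle>
    <tankstelle>
        <id>92</id>
        <phone type="main">0911 737577</phone>
        <phone type="fax"></phone>
        <number/>
    </tankstelle>
</tankstellen>
XML

my $data = stations_without_empties($sample);
print exists $data->{92}{number} ? "92 has a number\n" : "92 has no number\n";
```

Whether a missing key or an empty string is the better representation depends on the code that consumes the structure; with missing keys, callers have to test with exists before reading a value.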


Answer 1 (score: 1)

As already mentioned, XML::LibXML is highly recommended.

However, if memory efficiency matters more to you than CPU speed (for large XML documents), there is another option to consider: XML::Reader::PP

use strict;
use warnings;

use XML::Reader::PP;
use Data::Dump;

my $rdr = XML::Reader::PP->new(\*DATA, { mode => 'branches' },
  { root => '/tankstellen/tankstelle', branch => [ 
    'id',
    'phone[@type="main"]',
    'phone[@type="fax"]',
    'number',
  ]});

my %data;

while ($rdr->iterate) {
    my ($id, $ph_main, $ph_fax, $num) = $rdr->value;
    $_ //= '' for ($id, $ph_main, $ph_fax, $num);

    $data{$id}{'number'}        = $num;
    $data{$id}{'phone'}{'main'} = $ph_main;
    $data{$id}{'phone'}{'fax'}  = $ph_fax;
}

dd \%data;

__DATA__
<?xml version="1.0"?>
<tankstellen>
    <tankstelle>
        <id>63</id>
        <phone type="main">0911 731586</phone>
        <phone type="fax">0911 7592228</phone>
        <number/>
    </tankstelle>
    <tankstelle>
        <id>64</id>
        <phone type="main">0911 732011</phone>
        <phone type="fax"></phone>
        <number>64</number>
    </tankstelle>
    <tankstelle>
        <id>91</id>
        <phone type="main">0911 732926</phone>
        <phone type="fax">0911 732917</phone>
        <number/>
    </tankstelle>
    <tankstelle>
        <id>92</id>
        <phone type="main">0911 737577</phone>
        <phone type="fax"></phone>
        <number/>
    </tankstelle>
</tankstellen>

Output:

{
  63 => {
          number => "",
          phone  => { fax => "0911 7592228", main => "0911 731586" },
        },
  64 => { 
          number => 64,
          phone => { fax => "", main => "0911 732011" }
        },
  91 => {
          number => "",
          phone  => { fax => "0911 732917", main => "0911 732926" },
        },
  92 => { 
          number => "",
          phone => { fax => "", main => "0911 737577" }
        },
}
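For comparison only: XML::LibXML ships its own pull parser, XML::LibXML::Reader, which can also walk a large document one element at a time. This is a different technique from the XML::Reader::PP program above, sketched here under the assumption that each tankstelle element is small enough to copy into memory on its own. The sub name read_stations and the one-station sample are made up for the example.

```perl
use strict;
use warnings;

use XML::LibXML::Reader;

# Stream through the document, building the same structure as above
sub read_stations {
    my ($xml_string) = @_;
    my $reader = XML::LibXML::Reader->new(string => $xml_string)
        or die "cannot create reader\n";

    my %data;
    # nextElement returns 1 each time it lands on another <tankstelle>
    while ($reader->nextElement('tankstelle') == 1) {
        # Deep-copy only the current element into a small DOM fragment
        my $ts = $reader->copyCurrentNode(1);

        my $id = $ts->findvalue('id');
        $data{$id}{number} = $ts->findvalue('number');

        for my $phone ($ts->findnodes('phone')) {
            $data{$id}{phone}{ $phone->getAttribute('type') } = $phone->textContent;
        }
    }
    return \%data;
}

# Shortened copy of the question's data
my $stations = read_stations(<<'XML');
<tankstellen>
    <tankstelle>
        <id>64</id>
        <phone type="main">0911 732011</phone>
        <phone type="fax"></phone>
        <number>64</number>
    </tankstelle>
</tankstellen>
XML

print "main phone of 64: $stations->{64}{phone}{main}\n";
```

For real files the reader can be created with location => $filename instead of string, so the document is never held in memory as a whole.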