Question

The XML structure is as follows:

<Entities>
    <Entity>
        <EntityName>.... </EntityName>
        <EntityType>.... </EntityType>
        <Tables>
            <DataTables>
                <DataTable>1</DataTable>
                <DataTable>2</DataTable>
                <DataTable>3</DataTable>
                <DataTable>4</DataTable>
            </DataTables>
            <OtherTables>
                <OtherTable>5</OtherTable>
                <OtherTable>6</OtherTable>
            </OtherTables>
        </Tables>
    </Entity>
.
.
.
</Entities>

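I need to parse the file based on a selected entity name and retrieve all of its tables in the order shown above. How can I do this in Perl, and which module should I use?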

Answer 1

My favorite module for parsing XML in Perl is XML::Twig (tutorial).

Code example:

use XML::Twig;

my $twig = XML::Twig->new(
    twig_handlers => {
        #calls the get_tables method for each Entity element
        Entity    => sub {get_tables($_);},
    },
    pretty_print  => 'indented',                # output will be nicely formatted
    empty_tags    => 'html',                    # outputs <empty_tag />
    keep_encoding => 1,
);

$twig->parsefile('xml-file');    # replace 'xml-file' with the path to your XML file
$twig->flush;

sub get_tables {
    my $entity = shift;

    # Retrieves the DataTable sub-elements of DataTables, in document order
    my @data_tables = $entity->first_child("Tables")->first_child("DataTables")->children("DataTable");
    # Do stuff with the DataTables

    # Retrieves the OtherTable sub-elements of OtherTables, in document order
    my @other_tables = $entity->first_child("Tables")->first_child("OtherTables")->children("OtherTable");
    # Do stuff with the OtherTables

    #Flushes the XML element from memory
    $entity->purge;
}
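
The handler above visits every Entity. To act only on the selected entity, as the question asks, one option is to compare the EntityName text inside the handler. The following is a minimal sketch rather than a definitive implementation: the helper name tables_for and the sample data are made up for illustration, and it assumes every Entity contains both a DataTables and an OtherTables element.

```perl
use strict;
use warnings;
use XML::Twig;

# Sample input (illustrative); same shape as the XML in the question.
my $xml = <<'XML';
<Entities>
    <Entity>
        <EntityName>foo</EntityName>
        <Tables>
            <DataTables>
                <DataTable>1</DataTable>
                <DataTable>2</DataTable>
            </DataTables>
            <OtherTables>
                <OtherTable>3</OtherTable>
            </OtherTables>
        </Tables>
    </Entity>
    <Entity>
        <EntityName>bar</EntityName>
        <Tables>
            <DataTables>
                <DataTable>4</DataTable>
            </DataTables>
            <OtherTables>
                <OtherTable>5</OtherTable>
            </OtherTables>
        </Tables>
    </Entity>
</Entities>
XML

# Hypothetical helper: returns the table values of the named entity,
# DataTable entries first, then OtherTable entries.
sub tables_for {
    my ($xml_text, $wanted) = @_;
    my @found;

    my $twig = XML::Twig->new(
        twig_handlers => {
            Entity => sub {
                my ($t, $entity) = @_;
                if ($entity->first_child_text('EntityName') eq $wanted) {
                    my $tables = $entity->first_child('Tables');
                    my @elts   = (
                        $tables->first_child('DataTables')->children('DataTable'),
                        $tables->first_child('OtherTables')->children('OtherTable'),
                    );
                    push @found, map { $_->text } @elts;
                }
                $t->purge;    # free the elements parsed so far
            },
        },
    );
    $twig->parse($xml_text);
    return @found;
}

print join(', ', tables_for($xml, 'foo')), "\n";
```

If the entity names in the real file carry surrounding whitespace, the comparison would need trimming first.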

Answer 2

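Document order is defined as

There is an ordering, document order, defined on all the nodes in the document corresponding to the order in which the first character of the XML representation of each node occurs in the XML representation of the document after expansion of general entities. Thus, the root node will be the first node. Element nodes occur before their children. Thus, document order orders element nodes in order of the occurrence of their start-tag in the XML (after expansion of entities).

In other words, the order in which things appear in the XML document. The XML::XPath module produces results in document order. For example: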

#! /usr/bin/perl

use warnings;
use strict;

use XML::XPath;

my $entity_template = "/Entities"
                    . "/Entity"
                    .   "[EntityName='!!NAME!!']"
                    ;

my $tables_path = join "|" =>
                  qw( ./Tables/DataTables/DataTable
                      ./Tables/OtherTables/OtherTable );

my $xp = XML::XPath->new(ioref => *DATA);

foreach my $ename (qw/ foo bar /) {
  print "$ename:\n";
  (my $path = $entity_template) =~ s/!!NAME!!/$ename/g;
  foreach my $n ($xp->findnodes($path)) {
    foreach my $t ($xp->findnodes($tables_path, $n)) {
      print $t->toString, "\n";
    }
  }
}

__DATA__

The first expression searches for <Entity> elements that each have an <EntityName> child whose string-value is the selected entity name. From there, we look for <DataTable> or <OtherTable>.

Given the input

<Entities>
    <Entity>
        <EntityName>foo</EntityName>
        <EntityType>type1</EntityType>
        <Tables>
            <DataTables>
                <DataTable>1</DataTable>
                <DataTable>2</DataTable>
            </DataTables>
            <OtherTables>
                <OtherTable>3</OtherTable>
                <OtherTable>4</OtherTable>
            </OtherTables>
        </Tables>
    </Entity>
    <Entity>
        <EntityName>bar</EntityName>
        <EntityType>type2</EntityType>
        <Tables>
            <DataTables>
                <DataTable>5</DataTable>
                <DataTable>6</DataTable>
            </DataTables>
            <OtherTables>
                <OtherTable>7</OtherTable>
                <OtherTable>8</OtherTable>
            </OtherTables>
        </Tables>
    </Entity>
</Entities>

the output is

foo:
<DataTable>1</DataTable>
<DataTable>2</DataTable>
<OtherTable>3</OtherTable>
<OtherTable>4</OtherTable>
bar:
<DataTable>5</DataTable>
<DataTable>6</DataTable>
<OtherTable>7</OtherTable>
<OtherTable>8</OtherTable>

To extract the string values (the "inner text"), change $tables_path to

my $tables_path = ". / Tables / DataTables  / DataTable  / text() |
                   . / Tables / OtherTables / OtherTable / text()";

Yes, that is repetitive, because XML::XPath implements XPath 1.0.

Output:

foo:
1
2
3
4
bar:
5
6
7
8

Answer 3

I prefer XML::LibXML, which lets you (and me) use XPath to select elements.

You may want to look at a script I wrote with it.
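
Since this answer gives no code, here is a minimal sketch of what the XPath approach could look like with XML::LibXML. It is not the script mentioned above. The helper name tables_for and the sample data are made up for illustration, and the sketch relies on the union expression returning nodes in document order.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample input (illustrative); same shape as the XML in the question.
my $xml = <<'XML';
<Entities>
    <Entity>
        <EntityName>foo</EntityName>
        <Tables>
            <DataTables>
                <DataTable>1</DataTable>
                <DataTable>2</DataTable>
            </DataTables>
            <OtherTables>
                <OtherTable>3</OtherTable>
            </OtherTables>
        </Tables>
    </Entity>
    <Entity>
        <EntityName>bar</EntityName>
        <Tables>
            <DataTables>
                <DataTable>4</DataTable>
            </DataTables>
            <OtherTables>
                <OtherTable>5</OtherTable>
            </OtherTables>
        </Tables>
    </Entity>
</Entities>
XML

# Hypothetical helper: returns the table values of the named entity.
sub tables_for {
    my ($xml_text, $wanted) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml_text);
    my @found;

    for my $entity ($doc->findnodes('/Entities/Entity')) {
        # Compare the name in Perl so it never has to be quoted inside XPath.
        next unless $entity->findvalue('./EntityName') eq $wanted;

        my @nodes = $entity->findnodes(
            './Tables/DataTables/DataTable | ./Tables/OtherTables/OtherTable'
        );
        push @found, map { $_->textContent } @nodes;
    }
    return @found;
}

print join(', ', tables_for($xml, 'foo')), "\n";
```

Comparing the entity name in Perl instead of interpolating it into the XPath string sidesteps quoting problems when a name contains an apostrophe.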

Answer 4

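See: XML::Simple

Before using it, keep in mind a few points such as these:

XML::Simple is able to present a simple API because it makes some assumptions on your behalf. These include:

You're not interested in text content consisting only of whitespace
You don't mind that when things get slurped into a hash the order is lost
You don't want fine-grained control of the formatting of generated XML
You would never use a hash key that was not a legal XML element name
You don't need help converting between different encodings

For event-based parsing, use SAX (do not set out to write any new code for XML::Parser's handler API - it is obsolete).

For tree-based parsing, you could choose between the 'Perlish' approach of XML::Twig and more standards-based DOM implementations - preferably one with XPath support.

Source: XML-Simple

For more details about XML in Perl, see Perl-XML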
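
To make the ordering caveat concrete, here is a minimal sketch of reading the question's structure with XML::Simple. The helper name tables_for and the sample data are made up for illustration. Repeated elements with the same name land in an array and keep their order, but DataTables and OtherTables become keys of a hash, so their relative order in the file is not recorded and the code has to impose it explicitly.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Sample input (illustrative); same shape as the XML in the question.
my $xml = <<'XML';
<Entities>
    <Entity>
        <EntityName>foo</EntityName>
        <Tables>
            <DataTables>
                <DataTable>1</DataTable>
                <DataTable>2</DataTable>
            </DataTables>
            <OtherTables>
                <OtherTable>3</OtherTable>
            </OtherTables>
        </Tables>
    </Entity>
    <Entity>
        <EntityName>bar</EntityName>
        <Tables>
            <DataTables>
                <DataTable>4</DataTable>
            </DataTables>
            <OtherTables>
                <OtherTable>5</OtherTable>
            </OtherTables>
        </Tables>
    </Entity>
</Entities>
XML

# Hypothetical helper: returns the table values of the named entity.
sub tables_for {
    my ($xml_text, $wanted) = @_;

    # ForceArray keeps single occurrences as one-element arrays;
    # KeyAttr => [] turns off folding of arrays into hashes.
    my $data = XMLin(
        $xml_text,
        ForceArray => [qw(Entity DataTable OtherTable)],
        KeyAttr    => [],
    );

    my @found;
    for my $entity (@{ $data->{Entity} }) {
        next unless $entity->{EntityName} eq $wanted;
        my $tables = $entity->{Tables};
        # The hash does not remember which container came first,
        # so the order is imposed here by hand.
        push @found, @{ $tables->{DataTables}{DataTable} },
                     @{ $tables->{OtherTables}{OtherTable} };
    }
    return @found;
}

print join(', ', tables_for($xml, 'foo')), "\n";
```

This works only because the container order is known in advance; if the file could interleave different element names, a tree or XPath based module would be a safer choice.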

Parsing an XML file in Perl - preserving sequence
