Parsing a long XML file with Perl

Date: 2016-08-08 14:46:36

Tags: xml perl

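Hi everyone, I'm a new Perl programmer and I'm trying to get some data out of a long XML file. With my current code I can't get both pieces of data at the same time. Could you please check my code and show me how to use a loop, or any other construct, to extract the data I need efficiently?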

<item>
    <datetime>7/28/2016 12:00:00 AM - 12:00:15 AM</datetime>
    <datetime_raw>42579.1668402778</datetime_raw>
    <value channel="Traffic Total (volume)" channelid="1">4,664,204 KByte</value>
    <value_raw channel="Traffic Total (volume)" channelid="1">4776145337.3504</value_raw>
    <value channel="Traffic Total (speed)" channelid="1">517,319 kbit/s</value>
    <value_raw channel="Traffic Total (speed)" channelid="1">64664843.4518</value_raw>
    <value channel="Traffic DL (volume)" channelid="2">3,805,763 KByte</value>
    <value_raw channel="Traffic DL (volume)" channelid="2">3897101197.8596</value_raw>
    <value channel="Traffic DL (speed)" channelid="2">422,107 kbit/s</value>
    <value_raw channel="Traffic DL (speed)" channelid="2">52763352.2591</value_raw>
    <value channel="Traffic UL (volume)" channelid="3">858,442 KByte</value>
    <value_raw channel="Traffic UL (volume)" channelid="3">879044139.4907</value_raw>
    <value channel="Traffic UL (speed)" channelid="3">95,212 kbit/s</value>
    <value_raw channel="Traffic UL (speed)" channelid="3">11901491.1927</value_raw>
    <coverage>100 %</coverage>
    <coverage_raw>0000010000</coverage_raw>
</item>

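I have lots of items like this one, and for each of them I need to extract the datetime together with one specific value, the one with channel="Traffic Total (volume)". Here is an excerpt of my Perl code: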

my $reader = XML::LibXML::Reader->new(string => "$HDF") or die "cannot read file.xml\n";

while ($reader->nextElement( 'item' )) {
    my $item = $reader->readInnerXml;
    while ($reader->nextElement( 'datetime' )) {
        $DT = $reader->readInnerXml;
        print $DT;

        while ($reader->nextElement( 'value' )) {
            my $value = $reader->readInnerXml;
            if ($value eq 'Traffic Total (speed)'){
                $HD = $reader->readInnerXml;
                print $HD;
            }
        }
    }
}
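The excerpt has three problems. The inner nextElement('datetime') and nextElement('value') loops do not stop at the end of the current <item>, so they run on through the rest of the document and only the first datetime is printed. readInnerXml on a <value> returns its content (e.g. "4,664,204 KByte"), not its channel attribute, so the eq comparison never matches. And it tests for "(speed)" while the stated goal is "(volume)". A minimal sketch that keeps XML::LibXML::Reader is shown below, assuming XML::LibXML is installed. It streams from item to item and expands only the current <item> into a small DOM tree that XPath can query. The trimmed sample input, the placeholder <root> element, and the $wanted/@rows names are illustrative assumptions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Trimmed sample from the question; <root> is a placeholder root element.
my $xml = <<'XML';
<root>
  <item>
    <datetime>7/28/2016 12:00:00 AM - 12:00:15 AM</datetime>
    <value channel="Traffic Total (volume)" channelid="1">4,664,204 KByte</value>
    <value channel="Traffic Total (speed)" channelid="1">517,319 kbit/s</value>
  </item>
</root>
XML

my $wanted = 'Traffic Total (volume)';   # hypothetical name for the target channel
my @rows;                                # collected [datetime, value] pairs

# For a file on disk use: XML::LibXML::Reader->new(location => 'file.xml')
my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot read XML\n";

# One outer loop only: jump from one <item> to the next.
while ($reader->nextElement('item')) {
    # Deep-copy just this <item> into a DOM node; the rest stays streamed.
    my $item = $reader->copyCurrentNode(1);
    my $dt   = $item->findvalue('datetime');
    # Match on the channel *attribute*, not on the element's text.
    my $value = $item->findvalue(qq{value[\@channel="$wanted"]});
    push @rows, [ $dt, $value ];
    print "$dt\t$value\n";
}
```

Only one <item> is held as a DOM tree at a time, so memory use should stay roughly constant no matter how many items the file contains.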

Thanks for any comments on this.

1 answer:

Answer 0 (score: 0)

For long XML I find XML::Twig really good: it lets you attach twig_handlers that run while the document is being parsed, so you can process the XML efficiently one subset at a time.

So, assuming you want to iterate over the "item" elements:

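A minimal sketch of that approach, assuming XML::Twig is installed from CPAN. A handler registered for item fires once per fully parsed <item>; it reads the datetime, picks the <value> whose channel attribute matches, and then purges the twig. The trimmed sample input, the placeholder <root> element, and the names process_item, $wanted and @rows are illustrative assumptions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Trimmed sample from the question; <root> is a placeholder root element.
my $xml = <<'XML';
<root>
  <item>
    <datetime>7/28/2016 12:00:00 AM - 12:00:15 AM</datetime>
    <value channel="Traffic Total (volume)" channelid="1">4,664,204 KByte</value>
    <value channel="Traffic Total (speed)" channelid="1">517,319 kbit/s</value>
  </item>
</root>
XML

my $wanted = 'Traffic Total (volume)';   # hypothetical name for the target channel
my @rows;                                # collected [datetime, value] pairs

# Hypothetical handler: called once for every completely parsed <item>.
sub process_item {
    my ($twig, $item) = @_;
    my $dt = $item->first_child_text('datetime');
    # Pick the <value> child whose channel attribute equals $wanted.
    my ($node) = grep { ($_->att('channel') // '') eq $wanted }
                 $item->children('value');
    my $text = $node ? $node->text : undef;
    push @rows, [ $dt, $text ];
    print "$dt\t", $text // 'n/a', "\n";
    # Free everything parsed so far, including this <item>.
    $twig->purge;
    return 1;
}

my $twig = XML::Twig->new(twig_handlers => { item => \&process_item });
$twig->parse($xml);   # for a file on disk: $twig->parsefile('file.xml')
```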

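Because the handler runs as soon as each item has been parsed, it can discard everything parsed "so far" from memory (with $twig->purge), which makes this very efficient for XML that contains lots of similar elements.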
