Perl and XPath: missing entries in database table

Time: 2014-09-06 16:59:45

Tags: mysql xml perl xpath mariadb

I want to extract data from an XML file and import it into a MariaDB / MySQL database. The XML file is:

<?xml version="1.0" encoding="UTF-8"?>
<database>
  <row1s>
    <row1 name="fox" category="mammal">
       <row2s>
         <row2 type="1" size="10"/>
         <row2 type="2" size="8"/>
       </row2s>
    </row1>
    <row1 name="horse" category="mammal">
       <row2s>
         <row2 type="3" size="100"/>
       </row2s>
    </row1>
    <row1 name="bee" category="insect">
       <row2s/>
    </row1>
    <row1 name="wasp" category="insect">
       <row2s/>
    </row1>
  </row1s>
</database>

and the Perl code is:

use strict;
use warnings;
use DBI;

use XML::XPath;
use XML::XPath::XMLParser;

my $xp = XML::XPath->new( filename => "animals4.xml" );
# my $xp = XML::XPath->new( ioref => \*DATA );

my $dbh = DBI->connect( "DBI:mysql:test", "user", "pw", { RaiseError => 1, PrintError => 0 } )
    or die "Fehler beim Verbidungsaufbau zum MariaDB-Server:" . " $DBI::err -< $DBI::errstr \n";

for my $row1 ( $xp->findnodes('//row1s/row1') ) {
    printf "Level --- row1 \"name\" gives: %s\n", $row1->getAttribute("name");

    for my $row2 ( $row1->findnodes('.//row2s/row2') ) {
        printf "Level row2 \"type\" gives: %s\n", $row2->getAttribute("type");
        printf "Level row2 \"size\" gives: %s\n", $row2->getAttribute("size");

        $dbh->do(
            "INSERT INTO animal4 (name, category,type,size) VALUES(?,?,?,?)",
            undef,
            $row1->getAttribute("name"),
            $row1->getAttribute("category"),
            $row2->getAttribute("type"),
            $row2->getAttribute("size")
        ) or die "Error during execution: " . "$DBI::err -> $DBI::errstr (animal $DBI::state)\n";
    }
}

The terminal output is:

Level --- row1 "name" gives: fox
Level row2 "type" gives: 1
Level row2 "size" gives: 10
Level row2 "type" gives: 2
Level row2 "size" gives: 8
Level --- row1 "name" gives: horse
Level row2 "type" gives: 3
Level row2 "size" gives: 100
Level --- row1 "name" gives: bee
Level --- row1 "name" gives: wasp

This is what I expected. But the table contains only the following entries:

name  category  type    size
fox   mammal    1         10
fox   mammal    2          8
horse mammal    3        100

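The bee and the wasp are missing. Can anyone help me solve this? I would like to know why it happens, since the terminal output looks fine.

Thanks for your help.

Here is the code for the table: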

CREATE TABLE test01.animal4 (
name VARCHAR(50) DEFAULT NULL
, category VARCHAR(50) DEFAULT NULL
, type     INTEGER DEFAULT NULL
, size     INTEGER DEFAULT NULL
);

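This is a follow-up to my earlier question, "hierarchy problem".

2 answers:

Answer 0: (score: 2)

You already have an explanation and a fix, but I would suggest the following changes: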

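  • You should prepare the INSERT INTO SQL statement once and then execute it inside the loop. Calling do each time has a much greater overhead, because the statement is prepared again on every call

  • The // (descendant-or-self::node()) XPath construct is expensive, and you should reserve it for cases where you don't know where the elements are in the document, which is very rare. In this case the row1 elements are at /database/row1s/row1, and the row2 elements are at row2s/row2 relative to each row1

  • If you want to use quote characters inside a quoted string, it is cleaner to use a different delimiter. For example, "My name is \"$name\"" is much better written as qq{My name is "$name"}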
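The quoting point in the last item can be checked with a minimal sketch. The value assigned to $name here is only an illustrative assumption; the two strings are the ones quoted above.

```perl
use strict;
use warnings;

my $name = 'fox';    # illustrative value, not taken from the database

# Double-quoted string with escaped inner quotes
my $escaped = "My name is \"$name\"";

# Same string with qq{} as the delimiter, so nothing needs escaping
my $cleaner = qq{My name is "$name"};

print "$cleaner\n";
print $escaped eq $cleaner ? "identical\n" : "different\n";
```

Both forms interpolate $name in the same way; only the delimiter differs.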

Here is a version of your program that may help.

use strict;
use warnings;

use XML::XPath;
use DBI;

my $xp = XML::XPath->new( filename => 'animals4.xml' );

my $dbh = DBI->connect(
   'DBI:mysql:test', 'user', 'pw',
   { RaiseError => 1, PrintError => 0}
) or die "Fehler beim Verbidungsaufbau zum MariaDB-Server: $DBI::err -< $DBI::errstr\n";

my $insert_animal = $dbh->prepare('INSERT INTO animal4 (name, category, type, size) VALUES (?, ?, ?, ?)');

for my $row1 ( $xp->findnodes('/database/row1s/row1') ) {

   my $name     = $row1->getAttribute('name');
   my $category = $row1->getAttribute('category');

   print qq{Level --- row1 "name" gives: $name\n};

   my @row2 = $xp->findnodes('row2s/row2', $row1);

   if ( @row2 ) {
      for my $row2 ( @row2 ) {

         my $type = $row2->getAttribute('type');
         my $size = $row2->getAttribute('size');

         print qq{Level row2 "type" gives: $type\n};
         print qq{Level row2 "size" gives: $size\n};

         $insert_animal->execute($name, $category, $type, $size);
      }
   }
   else {
      $insert_animal->execute($name, $category, undef, undef);
   }
}

Output

Level --- row1 "name" gives: fox
Level row2 "type" gives: 1
Level row2 "size" gives: 10
Level row2 "type" gives: 2
Level row2 "size" gives: 8
Level --- row1 "name" gives: horse
Level row2 "type" gives: 3
Level row2 "size" gives: 100
Level --- row1 "name" gives: bee
Level --- row1 "name" gives: wasp

Answer 1: (score: 1)

From your code, the database write only happens if your second query (for the nodes under $row1) returns results:

for my $row1 ( $xp->findnodes('//row1s/row1') ){
    for my $row2 ( $row1->findnodes('.//row2s/row2') ) {
        $dbh->do("INSERT INTO animal4 (name, category,type,size) VALUES(?,?,?,?)"
        [...]  
        ) or die        ;   
    }
}

If there are no $row2 nodes, the body of the inner loop never runs, so there is no database write.
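The effect can be reproduced without XML or a database. In the sketch below, the @animals structure is a hypothetical stand-in for the parsed document, and pushing onto @written stands in for the INSERT; neither is part of the original program.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the row1 elements and their row2 children
my @animals = (
    { name => 'fox',   row2 => [ { type => 1, size => 10 }, { type => 2, size => 8 } ] },
    { name => 'horse', row2 => [ { type => 3, size => 100 } ] },
    { name => 'bee',   row2 => [] },
    { name => 'wasp',  row2 => [] },
);

my @written;    # records what would have been inserted

for my $animal (@animals) {
    print qq{Level --- row1 "name" gives: $animal->{name}\n};

    # For bee and wasp this list is empty, so the body is skipped entirely
    for my $row2 ( @{ $animal->{row2} } ) {
        push @written, "$animal->{name} $row2->{type} $row2->{size}";
    }
}

print "written: $_\n" for @written;
```

All four names are printed by the outer loop, but only the three fox and horse rows end up in @written, which matches the terminal output and table contents in the question.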

If you want a database write to happen whether or not $row2 nodes exist, you need to move the database write out of the inner for loop, i.e.:

for my $row1 ( $xp->findnodes('//row1s/row1') ){
    # get name and category here
    my $name = $row1->getAttribute('name');
    my $cat = $row1->getAttribute('category');
    my $row2set = $row1->find('row2s/row2'); ## creates a Nodeset object
    if ($row2set->size > 0) {
        ## we found nodes!!
        foreach my $row2 ($row2set->get_nodelist) {
           # get size and type here
           my $type = $row2->getAttribute('type');
           my $size = $row2->getAttribute('size');
           # write to db

        }
    } else {
        ## no row2 nodes found.
        ## write to db - just write the row1 values; type and size will be undefined.

    }
}

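NodeSet documentation: http://search.cpan.org/~msergeant/XML-XPath-1.13/XPath/NodeSet.pm

A quick note on setting variables and scope

Scope refers to where in the Perl code an entity (variable, subroutine, object, etc.) is visible and accessible; setting the scope of entities helps to encapsulate them and prevents data or functionality from being available in every part of the program.

Scope is set using code structures such as subroutines, loops, packages and objects: any block of code delimited by curly braces ({ and }). Standard practice in Perl (and many other languages) is to increase the indentation when entering a block and decrease it when leaving the block; that way, you can work out the scope very easily when reading the code.

Declaring a variable with my limits its scope to the code block in which it is declared; e.g.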

for my $row1 ( $xp->findnodes('//row1s/row1') ){
    # $row1 is available inside this code block

    my $row2set = $row1->find('row2s/row2');
    # $row2set is now available inside this code block

    if ($row2set->size > 0) {
        my $size = $row2set->size;
        # $size is now available inside this code block

        foreach my $row2 ($row2set->get_nodelist) {
            # $row2 is available inside this code block
            # we can also access $row1, $row2set, $size
        }

        # we can access $row1, $row2set, $size
        # $row2 is out of scope, i.e. we cannot access it

        say "The value of row2 is $row2";
        # Perl will complain 'Global symbol "$row2" requires explicit package name'
    }
    # we can access $row1 and $row2set
    # $size and $row2 are out of scope
}
# $row1, $row2set, $size, and $row2 are out of scope

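Back to your code: suppose you decided to set up variables $name, $cat, $type, and $size to capture your data and write it to the database. You have to make sure the variables are scoped correctly, otherwise they will hold inappropriate data. For example: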

use feature 'say';   # enables say(), which is used below

# declare all our variables
my ($name, $cat, $type, $size);
for my $row1 ( $xp->findnodes('//row1s/row1') ){
    # we can set $name and $cat from the data in row1:
    $name = $row1->getAttribute('name');
    $cat = $row1->getAttribute('category');
    my $row2set = $row1->find('row2s/row2');
    if ($row2set->size > 0) {
        foreach my $row2 ($row2set->get_nodelist) {
            # row2 gives us the type and size info
            $type = $row2->getAttribute('type');
            $size = $row2->getAttribute('size');
            # "say" prints a string and adds a "\n" to the end,
            # so it's very handy for debugging
            say "row2s found: name: $name; category: $cat; type: $type; size: $size";
        }
    } else {
        say "row2s empty: name: $name; category: $cat; type: $type; size: $size";
    }
}

This gives us the following output:

row2s found: name: fox; category: mammal; type: 1; size: 10
row2s found: name: fox; category: mammal; type: 2; size: 8
row2s found: name: horse; category: mammal; type: 3; size: 100
row2s empty: name: bee; category: insect; type: 3; size: 100
row2s empty: name: wasp; category: insect; type: 3; size: 100

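This is because $type and $size are scoped to the whole block of code, so they keep their values between iterations of the row1 loop and the inner row2 loop. The bee and the wasp have no values for size and type, so the values from the previous animal (the horse) are used.

There are many different ways to solve this, but the most efficient is probably: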

my $db_insert = $dbh->prepare('INSERT INTO animal4 (name, category, type, size) VALUES (?, ?, ?, ?)');

for my $row1 ( $xp->findnodes('//row1s/row1') ){
    my $row2set = $row1->find('row2s/row2');
    if ($row2set->size > 0) {
        foreach my $row2 ($row2set->get_nodelist) {
            # for debugging
            say "row2s found: name: " . $row1->getAttribute('name') .
            "; category: " . $row1->getAttribute('category') .
            "; type: " . $row2->getAttribute('type') .
            "; size: " . $row2->getAttribute('size');

            $db_insert->execute( $row1->getAttribute('name'),
            $row1->getAttribute('category'),
            $row2->getAttribute('type'),
            $row2->getAttribute('size') );
        }
    } else {
        # for debugging
        say "row2s empty: name: " . $row1->getAttribute('name') .
        "; category: " . $row1->getAttribute('category') .
        "; type: NOT SET" .
        "; size: NOT SET";
        $db_insert->execute( $row1->getAttribute('name'),
        $row1->getAttribute('category'),
        undef,
        undef );
    }
}
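Another of the possible fixes, shown here only as a sketch, is to declare $type and $size with my inside the inner loop, so that each iteration starts with fresh variables. The @animals structure is again a hypothetical stand-in for the XML nodes, and the push onto @rows marks the place where $db_insert->execute would be called in the real program.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the row1 elements and their row2 children
my @animals = (
    { name => 'fox',   category => 'mammal', row2 => [ { type => 1, size => 10 }, { type => 2, size => 8 } ] },
    { name => 'horse', category => 'mammal', row2 => [ { type => 3, size => 100 } ] },
    { name => 'bee',   category => 'insect', row2 => [] },
    { name => 'wasp',  category => 'insect', row2 => [] },
);

my @rows;    # stands in for the database table

for my $animal (@animals) {
    my @row2 = @{ $animal->{row2} };

    # With no row2 children, use one empty placeholder so the loop runs once
    @row2 = ( {} ) unless @row2;

    for my $row2 (@row2) {
        # Declared inside the loop: nothing is carried over from the previous animal
        my $type = $row2->{type};
        my $size = $row2->{size};

        # In the real program: $db_insert->execute($name, $category, $type, $size)
        push @rows, [ $animal->{name}, $animal->{category}, $type, $size ];
    }
}

for my $row (@rows) {
    print join( ', ', map { defined $_ ? $_ : 'NULL' } @$row ), "\n";
}
```

With this layout the bee and wasp rows carry undef for type and size, which DBI passes to the database as NULL.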