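Extracting information from XML with PHP

Date: 2015-06-23 15:05:26

Tags: php mysql xml parsing

I am trying to write a PHP script that extracts information from an XML file and puts it into a database. I have set up a L.A.M.P. stack on CentOS 6.6. The script below works to the extent that it recognizes the total number of entries in the XML file, but it does not extract the information, because each section has child tags. Is there something I can add to my code so that all child tags inside each tag, along with their text, are written to the database?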

#!/usr/bin/php
<?php
// sample XML data
$data = <<<XML
<entry type="CVE" name="CVE-2003-0002" seq="2003-0002"
published="2003-01-17" modified="2015-04-14" severity="Medium"
CVSS_version="2.0 incomplete approximation" CVSS_score="5.0"
CVSS_base_score="5.0" CVSS_impact_subscore="2.9"
CVSS_exploit_subscore="10.0" CVSS_vector="(AV:N/AC:L/Au:N/C:P/I:N/A:N)" />
<desc>
<descript source="cve">Multiple ethernet Network Interface 'Card' (NIC)   device drivers do not pad frames with null bytes, which allows remote attackers to obtain information from previous packets or kernel memory by using malformed packets, as demonstrated by Etherleak.
</descript>
</desc>
<loss_types>
<conf/>
</loss_types>
<vuln_types>
<design/>
</vuln_types>
<range>
<network/>
</range>
<refs>
<ref source="CERT-VN" url="http://www.kb.cert.org/vuls/id/412115" adv="1">VU#412115</ref>
<ref source="BUGTRAQ" url="http://www.securityfocus.com/archive/1/archive/1/535181/100/0/threaded">20150402 NEW : VMSA-2015-0003 VMware product updates address critical information disclosure issue in JRE</ref>
<ref source="REDHAT" url="http://www.redhat.com/support/errata/RHSA-2003-025.html">RHSA-2003:025</ref>
<ref source="CONFIRM" url="http://www.oracle.com/technetwork/topics/security/cpujan2015-1972971.html">http://www.oracle.com/technetwork/topics/security/cpujan2015-1972971.html</ref>
<ref source="MISC" url="http://www.atstake.com/research/advisories/2003/atstake_etherleak_report.pdf">http://www.atstake.com/research/advisories/2003/atstake_etherleak_report.pdf</ref>
<ref source="ATSTAKE" url="http://www.atstake.com/research/advisories/2003/a010603-1.txt" adv="1">A010603-1</ref><ref source="FULLDISC" url="http://seclists.org/fulldisclosure/2015/Apr/5">20150402 NEW : VMSA-2015-0003 VMware product updates address critical information disclosure issue in JRE</ref>
<ref source="MISC" url="http://packetstormsecurity.com/files/131271/VMware-Security-Advisory-2015-0003.html">http://packetstormsecurity.com/files/131271/VMware-Security-Advisory-2015-0003.html</ref><ref source="BUGTRAQ" url="http://marc.theaimsgroup.com/?l=bugtraq&m=104222046632243&w=2" adv="1">20030110 More information regarding Etherleak</ref>
<ref source="VULNWATCH" url="http://archives.neohapsis.com/archives/vulnwatch/2003-q1/0016.html">20030110 More information regarding Etherleak</ref>
<ref source="BUGTRAQ" url="http://www.securityfocus.com/archive/1/archive/1/307564/30/26270/threaded">20030117 Re: More information regarding Etherleak</ref>
<ref source="BUGTRAQ" url="http://www.securityfocus.com/archive/1/archive/1/305335/30/26420/threaded">20030106 Etherleak: Ethernet frame padding information leakage (A010603-1)</ref>
<ref source="REDHAT" url="http://www.redhat.com/support/errata/RHSA-2003-088.html">RHSA-2003:088</ref><ref source="OSVDB" url="http://www.osvdb.org/9962">9962</ref>
<ref source="OVAL" url="http://oval.mitre.org/repository/data/getDef?id=oval:org.mitre.oval:def:2665" sig="1">oval:org.mitre.oval:def:2665</ref>
</refs>
<vuln_soft>
<prod name="freebsd" vendor="freebsd">
<vers num="4.2"/>
<vers num="4.3"/>
<vers num="4.4"/>
<vers num="4.5"/>
<vers num="4.6"/>
<vers num="4.7"/>
</prod>
<prod name="linux_kernel" vendor="linux">
<vers num="2.4.1"/>
<vers num="2.4.10"/>
<vers num="2.4.11"/>
<vers num="2.4.12"/>
<vers num="2.4.13"/>
<vers num="2.4.14"/>
<vers num="2.4.15"/>
<vers num="2.4.16"/>
<vers num="2.4.17"/>
<vers num="2.4.18"/>
<vers num="2.4.19"/>
<vers num="2.4.2"/>
<vers num="2.4.20"/>
<vers num="2.4.3"/>
<vers num="2.4.4"/>
<vers num="2.4.5"/>
<vers num="2.4.6"/>
<vers num="2.4.7"/>
<vers num="2.4.8"/>
<vers num="2.4.9"/>
</prod>
<prod name="windows_2000" vendor="microsoft">
<vers num="" edition=":advanced_server"/> 
<vers num="" edition=":server"/>
<vers num="" edition=":professional"/>
<vers num="" edition=":datacenter_server"/>
<vers num="" edition="sp1:datacenter_server"/>
<vers num="" edition="sp1:advanced_server"/>
<vers num="" edition="sp1:professional"/>
<vers num="" edition="sp1:server"/>
<vers num="" edition="sp2:datacenter_server"/>
<vers num="" edition="sp2:advanced_server"/>
<vers num="" edition="sp2:professional"/>
<vers num="" edition="sp2:server"/>
</prod>
<prod name="windows_2000_terminal_services" vendor="microsoft">
<vers num="" edition="sp1"/>
<vers num="" edition="sp2"/>
</prod>
<prod name="netbsd" vendor="netbsd">
<vers num="1.5"/>
<vers num="1.5.1"/>
<vers num="1.5.2"/>
<vers num="1.5.3"/>
<vers num="1.6"/>
</prod>
</vuln_soft>
</entry>
XML;

// gather XML data

// database connection settings
$host = 'localhost';
$database = 'cve';
$user = 'admin';
$pass = 'admin';
$table = 'vulnerabilities';

try {
// connect to database
$dbh = new PDO('mysql:host=' . $host . ';dbname=' . $database, $user, $pass);

// prepare xml and iterator
$xml = new SimpleXMLIterator($data);
$itr = new RecursiveIteratorIterator($xml);
// loop through XML data
foreach ($itr as $key => $value) {

    // prepare an insert statement
    $statement = $dbh->prepare("INSERT INTO $table (identifier,seq,published,modified,severity,cvss_verison,cvss_score,cvss_base_score,cvss_impact_subscore,cvss_exploit_subscore,cvs_vector,information,loss_types,vuln_types,impact_area,refs,vuln_soft) VALUES (':name',':seq',':published',':modified',':severity',':CVSS_verison',':CVSS_score',':CVSS_base_score',':CVSS_impact_subscore',':CVSS_exploit_subscore',':CVSS_vector',':desc',':loss_types',':vuln_types',':range',':ref',':vuln_soft')");

    // bind your XML data to named parameters for the insert statement
    $statement->bindParam(':name', $value->attributes()->identifier);
    $statement->bindParam(':seq', $value->attributes()->seq);
    $statement->bindParam(':published', $value->attributes()->published);
    $statement->bindParam(':modified', $value->attributes()->modified);
    $statement->bindParam(':severity', $value->attributes()->severity);
    $statement->bindParam(':CVSS_version', $value->attributes()->cvss_verison);
    $statement->bindParam(':CVSS_score', $value->attributes()->cvss_score);
    $statement->bindParam(':CVSS_base_score', $value->attributes()->cvss_base_score);
    $statement->bindParam(':CVSS_impact_subscore', $value->attributes()->cvss_impact_subscore);
    $statement->bindParam(':CVSS_exploit_subscore', $value->attributes()->cvss_exploit_subscore);
    $statement->bindParam(':CVS_vector', $value->attributes()->cvs_vector);
    $statement->bindParam(':desc',$value->attributes()->information);
    $statement->bindParam(':loss_types',$value->attributes()->loss_types);
    $statement->bindParam(':vuln_types',$value->attributes()->vuln_types);
    $statement->bindParam(':range',$value->attributes()->impact_area);
    $statement->bindParam(':refs',$value->attributes()->refs);
    $statement->bindParam(':vuln_soft',$value->attributes()->vuln_soft);


    // insert XML data into database table
    $statement->execute();
}

$dbh = null;
} catch (PDOException $e) {
print "There was an error: " . $e->getMessage() . "\n";
die();
}

?>

I need to collect all the data from the entry tag and put it into the database. Sample XML showing the tag's information:

<entry type="CVE" name="CVE-2003-0001" seq="2003-0001"
 published="2003-01-17" modified="2015-04-14" severity="Medium"
 CVSS_version="2.0 incomplete approximation" CVSS_score="5.0"
 CVSS_base_score="5.0" CVSS_impact_subscore="2.9"
 CVSS_exploit_subscore="10.0" CVSS_vector="(AV:N/AC:L/Au:N/C:P/I:N/A:N)">

I then need to collect all the data inside the entry tag by recording the data and text of its child tags. Sample XML with child tags:

<refs>
<ref source="reference information">Reference information</ref>
<ref source="reference information">Reference information</ref>
<ref source="reference information">Reference information</ref>
<ref source="reference information">Reference information</ref>
</refs>
</entry>

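The current script detailed above returns the following warnings and fatal error:

PHP Warning:  SimpleXMLElement::__construct(): Entity: line 6: parser error : Extra content at the end of the document in /home/ant244/Documents/extract.php on line 112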

PHP Warning:  SimpleXMLElement::__construct(): <desc> in /home/ant244/Documents/extract.php on line 112

PHP Warning:  SimpleXMLElement::__construct(): ^ in /home/ant244/Documents/extract.php on line 112

PHP Fatal error:  Uncaught exception 'Exception' with message 'String could not be parsed as XML' in /home/ant244/Documents/extract.php:112

Stack trace:
#0 /home/ant244/Documents/extract.php(112):  SimpleXMLElement->__construct('<entry type="CV...')

#1 {main}
  thrown in /home/ant244/Documents/extract.php on line 112

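2 Answers:

Answer 0: (score: 0)

Background

If I understand your question correctly, one approach might involve looping through the XML entry data while using the data found as named parameters in a prepared SQL statement. Prepared statements help keep database input reasonably clean (for details, see the section "How can I make my database queries safe from SQL injection?" on the PHP tag wiki page).

Such an approach might resemble the sample code below. The code shows how prepared statements can be used for the database work, and how XML data can be accessed inside a foreach loop using the form $value->attributes()->name (where name matches an individual attribute of the XML entry).

Code example 1 (prepared statements)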

<?php

// sample XML data
$data = <<<XML
<root>
<entry type="CVE" name="CVE-2003-0001" seq="2003-0001"
published="2003-01-17" modified="2015-04-14" severity="Medium"
CVSS_version="2.0 incomplete approximation" CVSS_score="5.0"
CVSS_base_score="5.0" CVSS_impact_subscore="2.9"
CVSS_exploit_subscore="10.0" CVSS_vector="(AV:N/AC:L/Au:N/C:P/I:N/A:N)" />
<entry type="CVE" name="CVE-2003-0002" seq="2003-0002"
published="2003-01-17" modified="2015-04-14" severity="Medium"
CVSS_version="2.0 incomplete approximation" CVSS_score="5.0"
CVSS_base_score="5.0" CVSS_impact_subscore="2.9"
CVSS_exploit_subscore="10.0" CVSS_vector="(AV:N/AC:L/Au:N/C:P/I:N/A:N)" />
</root>
XML;

// gather XML data
$xml = simplexml_load_string($data);

// database connection settings
$host = 'localhost';
$database = 'your_database';
$user = 'your_username';
$pass = 'your_password';
$table = 'your_database_table';

try {
    // connect to database
    $dbh = new PDO('mysql:host=' . $host . ';dbname=' . $database, $user, $pass);

    // loop through XML data
    foreach ($xml->entry as $key => $value) {

        // prepare an insert statement
        $statement = $dbh->prepare("INSERT INTO $table (name, seq) VALUES (:name, :seq)");

        // bind your XML data to named parameters for the insert statement
        $statement->bindParam(':name', $value->attributes()->name);
        $statement->bindParam(':seq', $value->attributes()->seq);

        // insert XML data into database table
        $statement->execute();
    }

    $dbh = null;
} catch (PDOException $e) {
    print "There was an error: " . $e->getMessage();
    die();
}

?>
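
A related sketch, under stated assumptions. The question's INSERT statement wraps its placeholders in single quotes (':name'), so PDO treats them as string literals rather than parameters, and several placeholder names there do not match the names bound later (for example :CVSS_verison versus :CVSS_version, and :ref versus :refs). The sketch below prepares the statement once, with unquoted placeholders, and passes string-cast attribute values to execute(). The in-memory SQLite database and the two-column table are assumptions made only so the example runs without a MySQL server; it needs the pdo_sqlite driver. Setting PDO::ERRMODE_EXCEPTION matters on PHP versions before 8.0, where PDO is silent by default and the catch block would only see connection errors.

```php
<?php
// Assumption: an in-memory SQLite database stands in for MySQL so that the
// sketch runs without a server. Table and column names are hypothetical.
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE vulnerabilities (name TEXT, seq TEXT)');

// shortened sample XML
$data = '<root>'
      . '<entry name="CVE-2003-0001" seq="2003-0001"/>'
      . '<entry name="CVE-2003-0002" seq="2003-0002"/>'
      . '</root>';
$xml = simplexml_load_string($data);

// prepare once, outside the loop; note the placeholders are not quoted
$statement = $dbh->prepare(
    'INSERT INTO vulnerabilities (name, seq) VALUES (:name, :seq)'
);

foreach ($xml->entry as $entry) {
    // cast each attribute to a plain string before handing it to PDO
    $statement->execute(array(
        ':name' => (string) $entry['name'],
        ':seq'  => (string) $entry['seq'],
    ));
}

// read the rows back to confirm what was stored
$count = (int) $dbh->query('SELECT COUNT(*) FROM vulnerabilities')->fetchColumn();
$names = $dbh->query('SELECT name FROM vulnerabilities ORDER BY name')
             ->fetchAll(PDO::FETCH_COLUMN);

echo $count, ' rows: ', implode(', ', $names), "\n";
```

With MySQL, only the DSN passed to the PDO constructor would change; the prepare and execute calls stay the same.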

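However, when it comes to nested tags, using an iterator may be a good idea. For your sample XML (simplified in the code example below), using an iterator might look like this:

Code example 2 (iterator)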

<?php

// sample XML data
$data = <<<XML
<root>
<entry>
<refs>
<ref source="reference_information_1">Reference information 1</ref>
<ref source="reference_information_2">Reference information 2</ref>
</refs>
</entry>
</root>
XML;

// prepare XML data and iterator
$xml = new SimpleXMLIterator($data);
$itr = new RecursiveIteratorIterator($xml);

// iterate over each relevant tag
foreach ($itr as $key => $value) {
  echo $key . ": " . $value . "\n";
  echo "source attribute: " . $value->attributes()->source . "\n";
}

?>

This code produces the following output:

ref: Reference information 1
source attribute: reference_information_1
ref: Reference information 2
source attribute: reference_information_2

Code example 3 (prepared statements + iterator)

<?php

// sample XML data
$data = <<<XML
<root>
<entry type="CVE" name="CVE-2003-0001" seq="2003-0001"
published="2003-01-17" modified="2015-04-14" severity="Medium"
CVSS_version="2.0 incomplete approximation" CVSS_score="5.0"
CVSS_base_score="5.0" CVSS_impact_subscore="2.9"
CVSS_exploit_subscore="10.0" CVSS_vector="(AV:N/AC:L/Au:N/C:P/I:N/A:N)" />
<entry type="CVE" name="CVE-2003-0002" seq="2003-0002"
published="2003-01-17" modified="2015-04-14" severity="Medium"
CVSS_version="2.0 incomplete approximation" CVSS_score="5.0"
CVSS_base_score="5.0" CVSS_impact_subscore="2.9"
CVSS_exploit_subscore="10.0" CVSS_vector="(AV:N/AC:L/Au:N/C:P/I:N/A:N)" />
</root>
XML;

// database connection settings
$host = 'localhost';
$database = 'your_database';
$user = 'your_username';
$pass = 'your_password';
$table = 'your_database_table';

try {
    // connect to database
    $dbh = new PDO('mysql:host=' . $host . ';dbname=' . $database, $user, $pass);

    // prepare XML data and iterator
    $xml = new SimpleXMLIterator($data);
    $itr = new RecursiveIteratorIterator($xml);

    // iterate over each relevant tag
    foreach ($itr as $key => $value) {

        // prepare an insert statement
        $statement = $dbh->prepare("INSERT INTO $table (name, seq) VALUES (:name, :seq)");

        // bind your XML data to named parameters for the insert statement
        $statement->bindParam(':name', $value->attributes()->name);
        $statement->bindParam(':seq', $value->attributes()->seq);

        // insert XML data into database table
        $statement->execute();
    }

    $dbh = null;
} catch (PDOException $e) {
    print "There was an error: " . $e->getMessage();
    die();
}

?>
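
Note that code example 3 relies on each entry being an empty element: RecursiveIteratorIterator visits only leaf nodes by default, so once an entry has child tags the loop receives those children (ref, vers and so on) rather than the entry itself. For entries shaped like the one in the question, a sketch along the following lines may be closer to what is needed. The sample XML is shortened, and the way child data is flattened into strings (comma-separated tag names, one reference per line) is an assumption rather than anything the data requires. array() is used instead of [] because the stock PHP 5.3 on CentOS 6 predates the short array syntax.

```php
<?php
// Shortened, hypothetical sample modelled on the question's data.
$data = <<<XML
<root>
<entry name="CVE-2003-0001" seq="2003-0001" severity="Medium">
<desc><descript source="cve">Example description.</descript></desc>
<loss_types><conf/></loss_types>
<refs>
<ref source="CERT-VN" url="http://www.kb.cert.org/vuls/id/412115">VU#412115</ref>
<ref source="OSVDB" url="http://www.osvdb.org/9962">9962</ref>
</refs>
</entry>
</root>
XML;

$xml  = simplexml_load_string($data);
$rows = array();

foreach ($xml->entry as $entry) {
    $row = array();

    // every attribute of <entry> becomes one scalar value
    foreach ($entry->attributes() as $attrName => $attrValue) {
        $row[$attrName] = (string) $attrValue;
    }

    // text content of a nested tag
    $row['desc'] = trim((string) $entry->desc->descript);

    // empty child tags such as <conf/> carry their meaning in the tag name
    $lossTypes = array();
    foreach ($entry->loss_types->children() as $child) {
        $lossTypes[] = $child->getName();
    }
    $row['loss_types'] = implode(',', $lossTypes);

    // repeated children: one "source: text" line per <ref>
    $refs = array();
    foreach ($entry->refs->ref as $ref) {
        $refs[] = (string) $ref['source'] . ': ' . (string) $ref;
    }
    $row['refs'] = implode("\n", $refs);

    $rows[] = $row;
}

print_r($rows);
```

Each $row could then be passed as named parameters to a prepared statement's execute() call. If references or affected versions need to be queried individually, separate child tables may suit better than flattened strings.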

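Conclusion

Prepared statements and iterators can both offer safe and convenient ways to work with XML and database-related applications. For your program, it may help to combine the ideas from these two code examples (using the iterators from the second code example as the "// loop through XML data" part of the first code example, as shown in the third code example).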
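
As for the parser error quoted in the question ("Extra content at the end of the document"): in the question's sample, the opening entry tag ends with "/>", which closes the element immediately, so the desc tag and everything after it sits outside the root element. Ending that tag with ">" instead should let the document parse. A minimal sketch with a shortened, hypothetical sample shows the difference; libxml_use_internal_errors() is used so the parser messages can be inspected instead of being emitted as warnings.

```php
<?php
// Hypothetical, shortened samples. In $broken the entry tag is self-closed
// with "/>", so <desc> ends up outside the root element.
$broken = '<entry name="CVE-2003-0002" />'
        . '<desc><descript source="cve">text</descript></desc>';
$fixed  = '<entry name="CVE-2003-0002">'
        . '<desc><descript source="cve">text</descript></desc>'
        . '</entry>';

// collect parser errors instead of emitting PHP warnings
libxml_use_internal_errors(true);

$brokenResult = simplexml_load_string($broken); // false on parse failure
$brokenErrors = libxml_get_errors();
libxml_clear_errors();

$fixedResult = simplexml_load_string($fixed);

var_dump($brokenResult === false);
foreach ($brokenErrors as $error) {
    echo 'line ', $error->line, ': ', trim($error->message), "\n";
}
echo (string) $fixedResult->desc->descript, "\n";
```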

Answer 1: (score: 0)

if( get_class( $itrXml ) == 'SimpleXMLIterator' ) { # when the thing is a SimpleXMLIterator
  print $itrXml->__toString( );                     # output its string value
}

This only works when $itrXml is a leaf node with zero child nodes.

If $itrXml is a branch node with some child nodes, it fails with "Error: Call to a member function __toString() on array".
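
One possible way to handle both cases, sketched with a shortened hypothetical sample: iterate with RecursiveIteratorIterator::SELF_FIRST so that branch nodes are visited as well as leaves, and check the child count before reading a node's text. The output format (indented names) is an arbitrary choice for illustration.

```php
<?php
// Shortened, hypothetical sample with one branch node and two leaves.
$data = '<entry><refs>'
      . '<ref source="a">Reference 1</ref>'
      . '<ref source="b">Reference 2</ref>'
      . '</refs></entry>';

$xml = new SimpleXMLIterator($data);

// SELF_FIRST visits each branch node before its children;
// the default mode (LEAVES_ONLY) would skip <refs> entirely
$itr = new RecursiveIteratorIterator($xml, RecursiveIteratorIterator::SELF_FIRST);

$lines = array();
foreach ($itr as $name => $node) {
    $indent     = str_repeat('  ', $itr->getDepth());
    $childCount = count($node->children());

    if ($childCount === 0) {
        // leaf node: safe to read its text content
        $lines[] = $indent . $name . ': ' . trim((string) $node);
    } else {
        // branch node: report the number of children instead of text
        $lines[] = $indent . $name . ' (' . $childCount . ' children)';
    }
}

echo implode("\n", $lines), "\n";
```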