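I want to parse large XML files and store them in a database (MySQL). The XML looks like the sample below; the file is about 200 MB. How do I parse this XML file? How do I get the child elements? It has two parts, 'vuln' and 'vulnerable-configuration'. Thanks!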
<entry id="CVE-2015-0002">
<vuln:vulnerable-configuration id="http://www.nist.gov/">
<cpe-lang:logical-test operator="OR" negate="false">
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_7:-:sp1"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_server_2008:r2:sp1"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_8:-"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_8.1:-"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_server_2012:-:gold"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_server_2012:r2::~~~x64~~"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_rt:-:gold"/>
<cpe-lang:fact-ref name="cpe:/o:microsoft:windows_rt_8.1:-"/>
</cpe-lang:logical-test>
</vuln:vulnerable-configuration>
<vuln:vulnerable-software-list>
<vuln:product>cpe:/o:microsoft:windows_server_2012:-:gold</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_rt:-:gold</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_7:-:sp1</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_rt_8.1:-</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_server_2012:r2::~~~x64~~</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_8:-</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_8.1:-</vuln:product>
<vuln:product>cpe:/o:microsoft:windows_server_2008:r2:sp1</vuln:product>
</vuln:vulnerable-software-list>
<vuln:cve-id>CVE-2015-0002</vuln:cve-id>
<vuln:published-datetime>2015-01-13T17:59:01.253-05:00</vuln:published-datetime>
<vuln:last-modified-datetime>2015-01-14T16:51:14.253-05:00</vuln:last-modified-datetime>
<vuln:cvss>
<cvss:base_metrics>
<cvss:score>7.2</cvss:score>
<cvss:access-vector>LOCAL</cvss:access-vector>
<cvss:access-complexity>LOW</cvss:access-complexity>
<cvss:authentication>NONE</cvss:authentication>
<cvss:confidentiality-impact>COMPLETE</cvss:confidentiality-impact>
<cvss:integrity-impact>COMPLETE</cvss:integrity-impact>
<cvss:availability-impact>COMPLETE</cvss:availability-impact>
<cvss:source>http://nvd.nist.gov</cvss:source>
<cvss:generated-on-datetime>2015-01-14T16:20:33.273-05:00</cvss:generated-on-datetime>
</cvss:base_metrics>
</vuln:cvss>
<vuln:cwe id="CWE-264"/>
<vuln:references xml:lang="en" reference_type="VENDOR_ADVISORY">
<vuln:source>MS</vuln:source>
<vuln:reference href="http://technet.microsoft.com/security/bulletin/MS15-001" xml:lang="en">MS15-001</vuln:reference>
</vuln:references>
<vuln:references xml:lang="en" reference_type="UNKNOWN">
<vuln:source>MISC</vuln:source>
<vuln:reference href="https://code.google.com/p/google-security-research/issues/detail?id=118" xml:lang="en">https://code.google.com/p/google-security-research/issues/detail?id=118</vuln:reference>
</vuln:references>
<vuln:references xml:lang="en" reference_type="UNKNOWN">
<vuln:source>MISC</vuln:source>
<vuln:reference href="http://www.zdnet.com/article/google-discloses-unpatched-windows-vulnerability/" xml:lang="en">http://www.zdnet.com/article/google-discloses-unpatched-windows-vulnerability/</vuln:reference>
</vuln:references>
<vuln:references xml:lang="en" reference_type="UNKNOWN">
<vuln:source>MISC</vuln:source>
<vuln:reference href="http://twitter.com/sambowne/statuses/550384131683520512" xml:lang="en">http://twitter.com/sambowne/statuses/550384131683520512</vuln:reference>
</vuln:references>
<vuln:summary>The AhcVerifyAdminContext function in ahcache.sys in the Application Compatibility component in Microsoft Windows 7 SP1, Windows Server 2008 R2 SP1, Windows 8, Windows 8.1, Windows Server 2012 Gold and R2, and Windows RT Gold and 8.1 does not verify that an impersonation token is associated with an administrative account, which allows local users to gain privileges by running AppCompatCache.exe with a crafted DLL file, aka MSRC ID 20544 or "Microsoft Application Compatibility Infrastructure Elevation of Privilege Vulnerability."</vuln:summary>
</entry>
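Answer 0 (score: 2)
Partial answer.
First, take a look at this link, which answers most of your question: How to import XML with nested nodes (parent/child relationships) into Access?
Import the XML into Access, and apply a transform file to the XML so that each child table gets the key vuln:cve-id to link back to the main table entry.
The code below works for some of the child tables but not all of them; if anyone can point out why it does not work for all of them, please do. It does, however, give you the main table, containing vuln:cve-id, vuln:published-datetime, vuln:last-modified-datetime and vuln:summary, plus cvss:base_metrics with cvss:score, cvss:access-vector, cvss:access-complexity, cvss:source and cvss:generated-on-datetime.
Put the following code in a file named transform.xslt and use it when importing into Access. You need to add the appropriate XSL headers; I could not add them in this post because "You need at least 10 reputation to post more than 2 links" :-(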
<xsl:template match="/">
<dataroot>
<xsl:apply-templates select="@*|node()"/>
</dataroot>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="entry">
<xsl:apply-templates select="@*|node()"/>
</xsl:template>
<xsl:template match="cpe-lang:logical-test">
<cpe-lang:logical-test>
<vuln:cve-id><xsl:value-of select="../../vuln:cve-id"/></vuln:cve-id>
<xsl:apply-templates select="@*|node()"/>
</cpe-lang:logical-test>
</xsl:template>
<xsl:template match="vuln:vulnerable-configuration">
<vuln:vulnerable-configuration>
<!-- vuln:vulnerable-configuration is a direct child of entry, so vuln:cve-id is one level up -->
<vuln:cve-id><xsl:value-of select="../vuln:cve-id"/></vuln:cve-id>
<xsl:apply-templates select="@*|node()"/>
</vuln:vulnerable-configuration>
</xsl:template>
<xsl:template match="vuln:vulnerable-software-list">
<vuln:vulnerable-software-list>
<vuln:cve-id><xsl:value-of select="../vuln:cve-id"/></vuln:cve-id>
<xsl:apply-templates select="@*|node()"/>
</vuln:vulnerable-software-list>
</xsl:template>
<xsl:template match="cvss:base_metrics">
<cvss:base_metrics>
<vuln:cve-id><xsl:value-of select="../../vuln:cve-id"/></vuln:cve-id>
<xsl:apply-templates select="@*|node()"/>
</cvss:base_metrics>
</xsl:template>
<xsl:template match="vuln:references">
<vuln:references>
<vuln:cve-id><xsl:value-of select="../vuln:cve-id"/></vuln:cve-id>
<xsl:apply-templates select="@*|node()"/>
</vuln:references>
</xsl:template>
<xsl:template match="vuln:scanner">
<vuln:scanner>
<vuln:cve-id><xsl:value-of select="../vuln:cve-id"/></vuln:cve-id>
<xsl:apply-templates select="@*|node()"/>
</vuln:scanner>
</xsl:template>
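For readers who are not going through Access, the same keying idea (every child row carries vuln:cve-id so it can be joined back to the main row) can be sketched in Python. This is only an illustration under assumptions: the namespace URIs are placeholders, the `<nvd>` root element and the trimmed sample are made up for the example, and `split_entry` is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): use the URIs declared in your feed.
NS = {
    "vuln": "urn:example:vuln",
    "cvss": "urn:example:cvss",
}

# Trimmed copy of the entry from the question, wrapped in a made-up root.
SAMPLE = """\
<nvd xmlns:vuln="urn:example:vuln" xmlns:cvss="urn:example:cvss">
  <entry id="CVE-2015-0002">
    <vuln:vulnerable-software-list>
      <vuln:product>cpe:/o:microsoft:windows_8:-</vuln:product>
      <vuln:product>cpe:/o:microsoft:windows_8.1:-</vuln:product>
    </vuln:vulnerable-software-list>
    <vuln:cve-id>CVE-2015-0002</vuln:cve-id>
    <vuln:cvss>
      <cvss:base_metrics>
        <cvss:score>7.2</cvss:score>
      </cvss:base_metrics>
    </vuln:cvss>
  </entry>
</nvd>
"""


def split_entry(entry):
    """Return (main_row, product_rows) for one <entry> element."""
    cve_id = entry.findtext("vuln:cve-id", namespaces=NS)
    # One row for the main table.
    main_row = {
        "cve_id": cve_id,
        "score": entry.findtext(
            "vuln:cvss/cvss:base_metrics/cvss:score", namespaces=NS
        ),
    }
    # One row per product for the child table, each keyed by cve_id.
    product_rows = [
        {"cve_id": cve_id, "product": product.text}
        for product in entry.findall(
            "vuln:vulnerable-software-list/vuln:product", NS
        )
    ]
    return main_row, product_rows


root = ET.fromstring(SAMPLE)
main_row, product_rows = split_entry(root.find("entry"))
print(main_row)
print(product_rows)
```

Looking the key up once per entry and copying it into every child row avoids counting `../` steps separately for each nesting depth.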
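Answer 1 (score: 0)
I know this is a bit old, but I briefly worked on something similar to your question. This is some fairly ugly code (written in a few hours), but I think it does what you are asking, apart from exporting to a database. It takes giant XML files (CVE), parses them for specific key/value pairs, and compares them against a network scan.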
https://github.com/bhealy/netScan
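import xml.etree.ElementTree as ET

XMLfile = "cve.xml"  # placeholder path to the CVE feed
tree = ET.parse(XMLfile)  # loads the whole document into memory
root = tree.getroot()
# Positional lookup: child 0 of the root, then its child 1, and so on
stuffYouCareAbout = root[0][1][2][3].text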
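I was able to parse the XML file with etree, which makes finding specific items much easier. Obviously the sample looks at one very specific index, but it should be a good starting point (if this post isn't too late!)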
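Since ET.parse reads the entire file into memory, a 200 MB feed may be easier to handle with ET.iterparse, which hands over elements as they are completed. The following is a minimal sketch, not a tested importer: the namespace URI is a placeholder, the two-entry sample and the `iter_entries` helper are invented for illustration, and sqlite3 stands in for MySQL only so the example is self-contained (a MySQL driver would follow a similar DB-API pattern, though its parameter placeholder style may differ).

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

VULN = "urn:example:vuln"  # placeholder namespace URI (assumption)

# Made-up two-entry feed; a real run would pass the path of the large file.
SAMPLE = b"""\
<nvd xmlns:vuln="urn:example:vuln">
  <entry id="CVE-2015-0001">
    <vuln:cve-id>CVE-2015-0001</vuln:cve-id>
    <vuln:summary>First summary.</vuln:summary>
  </entry>
  <entry id="CVE-2015-0002">
    <vuln:cve-id>CVE-2015-0002</vuln:cve-id>
    <vuln:summary>Second summary.</vuln:summary>
  </entry>
</nvd>
"""


def iter_entries(source):
    """Yield (cve_id, summary) for each <entry>, one entry at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Compare the local name so the check works with or without a
        # default namespace on <entry>.
        if elem.tag.rsplit("}", 1)[-1] == "entry":
            yield (
                elem.findtext(f"{{{VULN}}}cve-id"),
                elem.findtext(f"{{{VULN}}}summary"),
            )
            # Drop the entry's children; the emptied element itself stays
            # attached to the root.
            elem.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cve (cve_id TEXT PRIMARY KEY, summary TEXT)")
# executemany pulls rows from the generator, so entries are parsed lazily.
conn.executemany(
    "INSERT INTO cve VALUES (?, ?)", iter_entries(io.BytesIO(SAMPLE))
)
conn.commit()
rows = conn.execute(
    "SELECT cve_id, summary FROM cve ORDER BY cve_id"
).fetchall()
print(rows)
```

Looking fields up by tag name instead of by position (as in root[0][1][2][3]) keeps the code working when the order or number of child elements varies between entries.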