XML (.xsd) feed validation against a schema

Time: 2013-07-23 19:58:27

Tags: python xml python-2.7 xsd xml-validation

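I have an XML file and an XML schema. I want to validate the file against that schema and check whether it conforms to it. I'm using Python, but I'm open to any language if Python has no good library for this.

What are my best options here? I'm concerned with how quickly I can get this up and running.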

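3 answers:

Answer 0 (score: 25)

Definitely lxml.

Define an XMLParser with the predefined schema, load the file with fromstring(), and catch the error that is raised when a document doesn't validate: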

from lxml import etree

def validate(xmlparser, xmlfilename):
    # A parser bound to a schema raises XMLSyntaxError when the document
    # is malformed or does not conform to the schema.
    try:
        with open(xmlfilename, 'r') as f:
            etree.fromstring(f.read(), xmlparser)
        return True
    except etree.XMLSyntaxError:
        return False

schema_file = 'schema.xsd'
with open(schema_file, 'r') as f:
    schema_root = etree.XML(f.read())

# Compile the schema once and reuse the same parser for every file.
schema = etree.XMLSchema(schema_root)
xmlparser = etree.XMLParser(schema=schema)

filenames = ['input1.xml', 'input2.xml', 'input3.xml']
for filename in filenames:
    if validate(xmlparser, filename):
        print("%s validates" % filename)
    else:
        print("%s doesn't validate" % filename)
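If you also want to know why a file fails, one variant is to parse the document with a plain parser and call XMLSchema.validate(), which returns a boolean and records the reasons in schema.error_log. The following is a minimal sketch, not the answer's own code: the helper names load_schema and validate_with_errors are hypothetical, and the file names are the same placeholders as above. Because etree.parse() reads the file itself as bytes, this variant also avoids the encoding-declaration problem described below.

```python
from lxml import etree


def load_schema(schema_path):
    """Compile the XSD once so it can be reused for many documents."""
    return etree.XMLSchema(etree.parse(schema_path))


def validate_with_errors(schema, xml_path):
    """Return (is_valid, messages) for one XML file (hypothetical helper).

    Well-formedness errors surface as XMLSyntaxError from etree.parse();
    schema violations are read from schema.error_log after validate().
    """
    try:
        doc = etree.parse(xml_path)
    except etree.XMLSyntaxError as exc:
        # The file is not even well-formed XML.
        return False, [str(exc)]
    if schema.validate(doc):
        return True, []
    # Each log entry carries the offending line number and a message.
    return False, ["line %d: %s" % (e.line, e.message) for e in schema.error_log]


if __name__ == "__main__":
    schema = load_schema("schema.xsd")
    for filename in ["input1.xml", "input2.xml", "input3.xml"]:
        ok, errors = validate_with_errors(schema, filename)
        print("%s %s" % (filename, "validates" if ok else "doesn't validate"))
        for msg in errors:
            print("    " + msg)
```

As in the answer, the schema is compiled only once, which matters for speed when many files are checked against the same XSD.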

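A note on encoding

If the schema file starts with an XML declaration that specifies an encoding (e.g. <?xml version="1.0" encoding="UTF-8"?>), the code above produces the following error: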

Traceback (most recent call last):
  File "<input>", line 2, in <module>
    schema_root = etree.XML(f.read())
  File "src/lxml/etree.pyx", line 3192, in lxml.etree.XML
  File "src/lxml/parser.pxi", line 1872, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

One solution is to open the files in binary mode, open(..., 'rb'):

[...]
def validate(xmlparser, xmlfilename):
    try:
        with open(xmlfilename, 'rb') as f:
[...]
with open(schema_file, 'rb') as f:
[...]
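The behaviour behind that traceback can be reproduced in a few lines: lxml rejects a text string (str) that carries an encoding declaration, but accepts the same document as bytes, which is what open(..., 'rb').read() returns. A minimal sketch assuming Python 3 (under Python 2.7, open(..., 'r') already returns a byte string); the helper name parses_ok is hypothetical.

```python
from lxml import etree

XML_TEXT = '<?xml version="1.0" encoding="UTF-8"?>\n<root>text</root>'


def parses_ok(data):
    """Return True if lxml accepts data, False on the ValueError above.

    Hypothetical helper: XMLSyntaxError is not a ValueError subclass,
    so only the encoding-declaration rejection is caught here.
    """
    try:
        etree.fromstring(data)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    # str (unicode) with an encoding declaration: rejected
    print("str:  ", parses_ok(XML_TEXT))
    # the same document as bytes, as read in 'rb' mode: accepted
    print("bytes:", parses_ok(XML_TEXT.encode("utf-8")))
```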

Answer 1 (score: 2)

The Python snippet is fine, but another option is to use xmllint:

xmllint --schema sample.xsd --noout sample.xml
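Since the question is about Python, the same xmllint command can be driven from a script and judged by its exit status. A minimal sketch, assuming the xmllint binary from libxml2 is on PATH and Python 3.5+ (for subprocess.run); the function name xmllint_validates and the sample file names are illustrative.

```python
import subprocess


def xmllint_validates(xsd_path, xml_path):
    """Validate xml_path against xsd_path by shelling out to xmllint.

    Hypothetical helper. --noout stops xmllint from echoing the document.
    Returns (is_valid, output), where output holds xmllint's messages.
    """
    result = subprocess.run(
        ["xmllint", "--schema", xsd_path, "--noout", xml_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the captured output to str
    )
    # A zero exit status means the document validated.
    return result.returncode == 0, result.stdout + result.stderr


if __name__ == "__main__":
    ok, output = xmllint_validates("sample.xsd", "sample.xml")
    print(output.strip())
    print("valid" if ok else "invalid")
```

Starting one process per file adds overhead compared with the in-process lxml approach, which may matter when validating many files.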

Answer 2 (score: 0)

 {:ok,
 %HTTPoison.Response{
   body: "",
   headers: [
     {"Date", "Tue, 22 Jun 2021 11:42:20 GMT"},
     {"Transfer-Encoding", "chunked"},
     {"Connection", "keep-alive"},
     {"Cache-Control", "max-age=3600"},
     {"Expires", "Tue, 22 Jun 2021 12:42:20 GMT"},
     {"Location",
      "https://yts.mx/api/v2/list_movies.json?query_term=tt11296058"},
     {"cf-request-id", "0ad5205cb800004da508b04000000001"},
     {"Expect-CT",
      "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\""},
     {"Report-To",
      "{\"endpoints\":[{\"url\":\"https:\\/\\/a.nel.cloudflare.com\\/report\\/v2?s=O80%2B5KfZ6d3G3Fz0NBGlep%2BetzQAvaUDIvVW09DUB2QMtJpd1XxupK621LhGR8EqiOsOY%2B55BdaHAljyLCEumHyb0rHSqk526jMQ5NxuLUi%2FVdbX\"}],\"group\":\"cf-nel\",\"max_age\":604800}"},
     {"NEL", "{\"report_to\":\"cf-nel\",\"max_age\":604800}"},
     {"Server", "cloudflare"},
     {"CF-RAY", "663536745c654da5-BOM"},
     {"alt-svc",
      "h3-27=\":443\"; ma=86400, h3-28=\":443\"; ma=86400, h3-29=\":443\"; ma=86400, h3=\":443\"; ma=86400"}
   ],
   request: %HTTPoison.Request{
     body: "",
     headers: [],
     method: :get,
     options: [],
     params: %{},
     url: "https://yts.lt/api/v2/list_movies.json?query_term=tt11296058"
   },
   request_url: "https://yts.lt/api/v2/list_movies.json?query_term=tt11296058",
   status_code: 301
 }}