Question

I have the following data that is supposed to be XML:

(sort-by-frequency (list(cons 'c 2)(cons 'b 3)(cons 'a 1)) '(1 2 3))

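So, basically, I have multiple root elements (((A . 1) (C . 2) (B . 3)))...

The point is that I am trying to transform this data into 2 XML documents: one for the valid nodes and another for the invalid nodes.

Valid nodes: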

<?xml version="1.0" encoding="UTF-8"?>
<Product>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</Product>

<Product>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</Product>

<ProductTTTTT>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</Product>

<Product>
    <id>1</id>
    <description>A new product</description>
    <price>123.45</price>
</ProductAAAAAA>

Invalid nodes: <ProductTTTTT> ... </Product> and <Product> ... </ProductAAAAAA>

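I then started thinking about how to achieve this in Java (not web).

If I am not mistaken, validating it with an XSD would invalidate the whole file, so that is not an option.
Using the default JAXB parser (unmarshaller) leads to the same situation, since internally it creates an XSD of my entity.
Using only XPath would (as far as I know) return the whole file; I did not find a way to do something like GET !VALID (just to explain the idea...).
Using XQuery (maybe?)... By the way, how do I use XQuery with JAXB?
XSL(T) would lead to the same result as XPath, since it uses XPath to select content.

So... which approach can I use to reach this goal? (If possible, please provide links or code.)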

Answer 1

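First, you are confusing "valid" with "well-formed". You say you want to find invalid elements, but your examples are not merely invalid, they are ill-formed. That means no XML parser will do anything with them other than throw an error message at you. You cannot use JAXB, XPath, XQuery, XSLT, or anything else of that kind to process something that is not XML.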
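
To make the distinction concrete, here is a minimal sketch using the standard JAXP `DocumentBuilder`. The class name `WellFormedCheck` and the helper `isWellFormed` are made up for this illustration. It only shows that the parser rejects ill-formed input outright, before any question of validity arises:

```java
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import java.io.IOException;
import java.io.StringReader;

class WellFormedCheck {

    // Hypothetical helper: true if the text parses as one well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException ex) {
            return false; // the parser rejected the input: not well-formed
        } catch (ParserConfigurationException | IOException ex) {
            throw new IllegalStateException(ex); // environment problem, not a data problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<Product><id>1</id></Product>"));      // true
        System.out.println(isWellFormed("<ProductTTTTT><id>1</id></Product>")); // false: mismatched tags
        System.out.println(isWellFormed("<Product/><Product/>"));               // false: two root elements
    }
}
```

Note that the last case fails as well: a file with several root elements is not a well-formed XML document, even if each element on its own would be.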

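You say "unfortunately I don't have access to the system that sends this XML format". I am not sure why you call it an XML format: it isn't. I also don't understand why you (and many others on StackOverflow) are prepared to spend your time digging through this kind of garbage rather than telling the sender to get their act together. If you were served a salad with something nasty in it, would you try to pick it out, or would you send it back to be replaced? You should adopt a zero-tolerance approach to bad data; it is the only way senders will learn to improve their quality.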

Answer 2

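If the file contains lines with start and end tags whose names begin with "Product", you could:

Use a file scanner to split the document into pieces at each line that starts with <Product or </Product
Attempt to parse each extracted piece of text as XML using an XML API.
- If it succeeds, add the object to a list of "good", well-formed XML documents
  - Then perform any additional schema validation or validity checks
- If a parse error is thrown, catch it and add that text fragment to a list of "bad" items that need to be cleaned up or otherwise handled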
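
For the optional schema-validation step on the "good" fragments, something along these lines might work. This is only a sketch: the class `FragmentValidator`, the helper `isValid`, and above all the inline `PRODUCT_XSD` schema are assumptions made up from the sample data, not something the sender has published.

```java
import org.xml.sax.SAXException;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import java.io.IOException;
import java.io.StringReader;

class FragmentValidator {

    // Hypothetical schema guessed from the sample data; a real one would come from the format's owner.
    static final String PRODUCT_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Product'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='id' type='xs:int'/>"
      + "<xs:element name='description' type='xs:string'/>"
      + "<xs:element name='price' type='xs:decimal'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true if the fragment is valid against the given schema text.
    static boolean isValid(String xml, String xsd) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException ex) {
            throw new IllegalStateException("the schema itself could not be compiled", ex);
        }
        try {
            Validator validator = schema.newValidator();
            // With no custom error handler, validation errors are thrown as SAXException
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException ex) {
            return false; // well-formedness or validity problem in the fragment
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String ok = "<Product><id>1</id><description>A new product</description><price>123.45</price></Product>";
        String wrongType = "<Product><id>1</id><description>A new product</description><price>abc</price></Product>";
        System.out.println(isValid(ok, PRODUCT_XSD));        // true
        System.out.println(isValid(wrongType, PRODUCT_XSD)); // false: price is not a decimal
    }
}
```

Validating one fragment at a time sidesteps the concern that a single bad element invalidates the whole file, because each fragment gets its own verdict.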

An example to get you started:

package com.stackoverflow.questions.q52012383; // a package name segment cannot start with a digit

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.StringReader;

import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class FileSplitter {

    public static void parseFile(File file, String elementName) 
      throws ParserConfigurationException, IOException {

        List<Document> good = new ArrayList<>();
        List<String> bad = new ArrayList<>();

        String startTag = "<" + elementName;
        String endTag = "</" + elementName;
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder;
        StringBuilder buffer = new StringBuilder();
        String line;
        boolean append = false;

        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                line = scanner.nextLine();

                if (line.startsWith(startTag)) {
                    append = true; //start accumulating content
                } else if (line.startsWith(endTag)) {
                    append = false;
                    buffer.append(line); 
                    //instead of the line above, you could hard-code the ending tag to compensate for bad data:
                    // buffer.append(endTag + ">");

                    try { // to parse as XML
                        builder = factory.newDocumentBuilder();
                        Document document = builder.parse(new InputSource(new StringReader(buffer.toString())));
                        good.add(document); // parsed successfully, add it to the good list
                    } catch (SAXException ex) {
                        bad.add(buffer.toString()); // something is wrong, not well-formed XML
                    } finally {
                        // always reset the buffer, so a bad fragment does not leak into the next one
                        buffer.setLength(0);
                    }
                }

                if (append) { // accumulate content
                    buffer.append(line);
                }
            }
            System.out.println("Good items: " + good.size() + " Bad items: " + bad.size());
            //do stuff with the good/bad results...
        }
    }

    public static void main(String[] args)
      throws ParserConfigurationException, IOException {
        File file = new File("/tmp/test.xml");
        parseFile(file, "Product");
    }

}
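
Since the question asks for the results as documents, here is one possible way to turn the `good` list into a single well-formed document by wrapping the fragments in a new root element. This is a sketch only: the class `GoodItemsWriter`, its helpers, and the root name `Products` are assumptions made for illustration.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

class GoodItemsWriter {

    // Copies the root element of every parsed fragment under one new root element.
    static Document combine(List<Document> good, String rootName) {
        try {
            Document combined = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = combined.createElement(rootName);
            combined.appendChild(root);
            for (Document doc : good) {
                // importNode creates a deep copy owned by the combined document
                root.appendChild(combined.importNode(doc.getDocumentElement(), true));
            }
            return combined;
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Serializes a DOM document to a string, without the XML declaration.
    static String toXml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Demo-only helper that parses a string known to be well-formed.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        List<Document> good = Arrays.asList(
                parse("<Product><id>1</id></Product>"),
                parse("<Product><id>2</id></Product>"));
        System.out.println(toXml(combine(good, "Products")));
    }
}
```

The `bad` items cannot be handled the same way, because they are not XML. They would have to be written out as plain text (or repaired first) rather than imported into a DOM.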

What approach can I use to return valid and invalid XML data from a file in Java?
