Reading the same XML tag in different sections

Date: 2016-07-18 10:01:04

Tags: java xml xml-parsing

Here is the XML file:

<maindata>
        <publication-reference>
          <document-id document-id-type="docdb">
            <country>US</country>
            <doc-number>9820394ASD</doc-number>
            <date>20111101</date>
          </document-id>
          <document-id document-id-type="docmain">
            <doc-number>9820394</doc-number>
            <date>20111101</date>
          </document-id>
        </publication-reference>
</maindata>

I want to extract the <doc-number> value under the document-id whose type is "docmain". Below is my Java code; when I run it, it extracts 9820394ASD instead of 9820394.

public static void main(String[] args) {
        String filePath ="D:/bs.xml";
        File xmlFile = new File(filePath);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder;
        try {
            dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(xmlFile);
            doc.getDocumentElement().normalize();
            System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
            NodeList nodeList = doc.getElementsByTagName("publication-reference");
            List<Biblio> docList = new ArrayList<Biblio>();
            for (int i = 0; i < nodeList.getLength(); i++) {
                docList.add(getdoc(nodeList.item(i)));
            }

        } catch (SAXException | ParserConfigurationException | IOException e1) {
            e1.printStackTrace();
        }
    }
    private static Biblio getdoc(Node node) {
           Biblio bib = new Biblio();
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            bib.setCountry(getTagValue("country",element));
            bib.setDocnumber(getTagValue("doc-number",element));
            bib.setDate(getTagValue("date",element));          
        }
        return bib;
    }

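Please let me know how to check whether the type is docmain or something else, so that the values are extracted only when the type is docmain and the element is skipped otherwise.

I have added the getTagValue method: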

private static String getTagValue(String tag, Element element) {
        NodeList nodeList = element.getElementsByTagName(tag).item(0).getChildNodes();
        Node node = (Node) nodeList.item(0);
        return node.getNodeValue();
    }

3 answers:

Answer 0 (score: 1)

Change your getdoc() method so that it only creates Biblio objects for the `docmain` type.

private static Biblio getdoc(Node node) {
  Biblio bib = null;
  if (node.getNodeType() == Node.ELEMENT_NODE) {
    Element element = (Element) node;
    String type = element.getAttribute("document-id-type");
    if(type != null && type.equals("docmain")) {
      bib = new Biblio();
      bib.setCountry(getTagValue("country",element));
      bib.setDocnumber(getTagValue("doc-number",element));
      bib.setDate(getTagValue("date",element));          
    }
  }
  return bib;
}

Then, in the main method, add the result of getdoc() to the list only if it is not null:

for (int i = 0; i < nodeList.getLength(); i++) {
  Biblio biblio = getdoc(nodeList.item(i));
  if(biblio != null) {
    docList.add(biblio);
  }
}
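Note that the document-id-type attribute sits on the document-id elements, not on publication-reference, so the check above only matches when getdoc() is handed document-id elements. The following is a minimal, self-contained sketch of that idea, assuming the loop iterates over doc.getElementsByTagName("document-id"). The class name DocMainDomSketch and the helper docNumbers are hypothetical, and the XML is the sample from the question embedded as a string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DocMainDomSketch {
    // Sample data copied from the question.
    static final String XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\">"
      + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
      + "</document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date>"
      + "</document-id>"
      + "</publication-reference></maindata>";

    // Hypothetical helper: collects the doc-number text of every
    // document-id element whose document-id-type equals the given type.
    static List<String> docNumbers(String xml, String type) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> result = new ArrayList<>();
        NodeList ids = doc.getElementsByTagName("document-id");
        for (int i = 0; i < ids.getLength(); i++) {
            Element id = (Element) ids.item(i);
            // getAttribute returns "" (not null) when the attribute is absent.
            if (!type.equals(id.getAttribute("document-id-type"))) {
                continue;
            }
            NodeList numbers = id.getElementsByTagName("doc-number");
            if (numbers.getLength() > 0) {
                result.add(numbers.item(0).getTextContent());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(docNumbers(XML, "docmain")); // prints [9820394]
    }
}
```

Guarding on getLength() also avoids the NullPointerException that getTagValue would throw for the docmain entry, which has no country element.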

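Update: OK, that was terrible, sorry. You should really take a look at XPath. I have tried to rewrite it using XPath expressions.

First we need four XPath expressions. One extracts the node list of all document-id elements of type docmain.

The XPath expression is: /maindata/publication-reference/document-id[@document-id-type='docmain'] (with the whole XML document as context).

The predicate in [] ensures that only document-id elements of type docmain are extracted.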

Then one expression for each field within a document-id element (with the document-id element as context):

  • country: country
  • docnumber: doc-number
  • date: date

We use a static initializer:

private static XPathExpression xpathDocId;
private static XPathExpression xpathCountry;
private static XPathExpression xpathDocnumber;
private static XPathExpression xpathDate;

static {
  try {
    XPath xpath = XPathFactory.newInstance().newXPath();
    // Context is the whole document. Find all document-id elements with type docmain
    xpathDocId = xpath.compile("/maindata/publication-reference/document-id[@document-id-type='docmain']");
    // Context is a document-id element.
    xpathCountry = xpath.compile("country");
    xpathDocnumber = xpath.compile("doc-number");
    xpathDate = xpath.compile("date");
  } catch (XPathExpressionException e) {
    e.printStackTrace();
  }
}

Then we rewrite the getdoc method. This method now takes a document-id element as input and creates a Biblio instance from it using the XPath expressions:

private static Biblio getdoc(Node element) throws XPathExpressionException {
  Biblio biblio = new Biblio();
  biblio.setCountry((String) xpathCountry.evaluate(element, XPathConstants.STRING));
  biblio.setDocnumber((String) xpathDocnumber.evaluate(element, XPathConstants.STRING));
  biblio.setDate((String) xpathDate.evaluate(element, XPathConstants.STRING));
  return biblio;
}

Then, in the main() method, use the xpathDocId expression to extract only the desired elements and pass each one to getdoc().
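As an illustration of that last step, here is one way the main() side might look. This is a sketch under assumptions, not the answerer's original code: the class name XPathMainSketch, the helper readDocMain, and the field-based Biblio stand-in are hypothetical (the asker's real Biblio class is not shown), and the XML from the question is embedded as a string so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XPathMainSketch {
    // Hypothetical stand-in for the asker's Biblio class.
    static class Biblio {
        String country;
        String docnumber;
        String date;
    }

    // Sample data copied from the question.
    static final String XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\">"
      + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
      + "</document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date>"
      + "</document-id>"
      + "</publication-reference></maindata>";

    // Selects only the docmain document-id elements and maps each to a Biblio.
    static List<Biblio> readDocMain(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression xpathDocId = xpath.compile(
            "/maindata/publication-reference/document-id[@document-id-type='docmain']");

        // Context is the whole document; the result is a node list.
        NodeList nodes = (NodeList) xpathDocId.evaluate(doc, XPathConstants.NODESET);
        List<Biblio> docList = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node element = nodes.item(i);
            Biblio biblio = new Biblio();
            // Context is one document-id element; a missing child yields "".
            biblio.country = xpath.evaluate("country", element);
            biblio.docnumber = xpath.evaluate("doc-number", element);
            biblio.date = xpath.evaluate("date", element);
            docList.add(biblio);
        }
        return docList;
    }

    public static void main(String[] args) throws Exception {
        for (Biblio b : readDocMain(XML)) {
            System.out.println(b.docnumber + " " + b.date);
        }
    }
}
```

Because the filtering happens in the XPath predicate, no null check is needed when adding to the list. A field that is absent, such as country in the docmain entry, comes back as an empty string rather than throwing.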

Answer 1 (score: 1)

The value can be retrieved with the following XPath, using the DOM and XPath APIs.

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(new File(...) );
    XPathFactory xPathfactory = XPathFactory.newInstance();
    XPath xpath = xPathfactory.newXPath();
    XPathExpression expr = xpath.compile("//document-id[@document-id-type=\"docmain\"]/doc-number/text()");
    String value = expr.evaluate(doc);
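For experimenting without a file on disk, the same expression can be wrapped in a small helper. This is only a sketch: the class name DocNumberXPathSketch and the method docNumber are hypothetical, and the XML from the question is embedded as a string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DocNumberXPathSketch {
    // Sample data copied from the question.
    static final String XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\">"
      + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
      + "</document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date>"
      + "</document-id>"
      + "</publication-reference></maindata>";

    // Hypothetical helper: returns the doc-number for the given
    // document-id-type, or "" when no element matches.
    static String docNumber(String xml, String type) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The type is concatenated in for brevity; this is only safe for
        // trusted values that contain no quote characters.
        XPathExpression expr = xpath.compile(
            "//document-id[@document-id-type='" + type + "']/doc-number/text()");
        return expr.evaluate(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(docNumber(XML, "docmain")); // prints 9820394
    }
}
```

When nothing matches, evaluate returns an empty string rather than null, so callers should test with isEmpty() before using the value.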

Answer 2 (score: 0)

Thanks for the help. Here is the code I ended up with:

String Number = xPath.compile("//publication-reference//document-id[@document-id-type=\"docmain\"]/doc-number").evaluate(xmlDocument);