Question

<xml>
<Office prop1="prop1" prop2="prop2">
    <Version major="1" minor="0"/>
    <Label>MyObjectA</Label>
    <Active>No</Active>
</Office>
<Vehicle prop="prop">
    <Wheels>4</Wheels>
    <Brand>Honda</Brand>
    <Bought>No</Bought>
</Vehicle>
</xml>

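My XML is in this format. I am using a SAX parser to parse the file because the XML file can be very large.

What pattern should I follow to parse the file?

Usually I have been following this approach: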

//PseudoCode
if(start){
    if(type Office)
    {
       create an instance of type Office and populate the attributes of Office in the Office class using a call back
    }
    if(type Vehicle)
    {
       create an instance of type Vehicle and populate the attributes of Vehicle in the Vehicle class using a call back
     }
}

if(end){
     // do cleaning up
}

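With this approach, my parsing functions end up containing all of the start-tag and end-tag handling. Is there a better approach I could follow?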

Answer 1

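I have had good experience with this approach:

Create a lookup table that maps node names to handler functions. You will most likely need to maintain two handlers per node name, one for the start tag and one for the end tag.
Maintain a stack of parent nodes.
Call the handler from the lookup table.
Each handler function can do its job without further checks. If necessary, however, each handler can also determine the current context by looking at the stack of parent nodes. This matters if nodes with the same name occur at different places in the node hierarchy.

Some pseudo-Java code: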

import java.util.HashMap;
import java.util.Map;
import java.util.Stack;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {

// Callback interface assumed by this answer; it is not part of the SAX API
private interface MyCallbackAdapter {
   void execute();
}

private Map<String, MyCallbackAdapter> startLookup = new HashMap<String, MyCallbackAdapter>();
private Map<String, MyCallbackAdapter> endLookup = new HashMap<String, MyCallbackAdapter>();
private Stack<String> nodeStack = new Stack<String>();

public MyHandler() {
   // Initialize the lookup tables
   startLookup.put("Office", new MyCallbackAdapter() {
      public void execute() { myOfficeStart(); }
    });

   endLookup.put("Office", new MyCallbackAdapter() {
      public void execute() { myOfficeEnd(); }
    });
}

@Override
public void startElement(String namespaceURI, String localName,
        String qName, Attributes atts) {
  // qName is used here because localName is empty unless the parser is namespace-aware
  nodeStack.push(qName);

  MyCallbackAdapter callback = startLookup.get(qName);
  if (callback != null)
    callback.execute();
}

@Override
public void endElement(String namespaceURI, String localName, String qName) {
  MyCallbackAdapter callback = endLookup.get(qName);
  if (callback != null)
    callback.execute();

  // Pop after the end handler so it can still see the current node on the stack
  nodeStack.pop();
}

private void myOfficeStart() {
  // Do the stuff necessary for the "Office" start tag
}

private void myOfficeEnd() {
  // Do the stuff necessary for the "Office" end tag
}

//...

}

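General advice: Depending on your requirements, you may need more context information, such as the name of the previous node or whether the current node is empty. If you find yourself adding more and more context information, you might consider switching to a full-fledged DOM parser, unless runtime speed matters more than development speed.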
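As a rough sketch of the parent-stack idea described above, the handler below reports each leaf element together with the name of its parent, so two elements that share a name can be told apart by where they occur. The class name ContextAwareHandler, the events list, and the parse helper are hypothetical names chosen for this illustration; only the SAX types come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records "parent/child=text" for every element that has text.
class ContextAwareHandler extends DefaultHandler {
    private final Deque<String> nodeStack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        nodeStack.push(qName);
        text.setLength(0); // discard whitespace collected between elements
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node, so accumulate
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        nodeStack.pop();                  // remove the element that just ended
        String parent = nodeStack.peek(); // null once the root element has ended
        String value = text.toString().trim();
        if (parent != null && !value.isEmpty()) {
            events.add(parent + "/" + qName + "=" + value);
        }
        text.setLength(0);
    }

    // Convenience helper (an assumption of this sketch): parse a string and return the events.
    static List<String> parse(String xml) {
        try {
            ContextAwareHandler handler = new ContextAwareHandler();
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }
}
```

Here a Label element under Office and a Label element under Vehicle produce different entries, which is the situation where peeking at the stack pays off. The sketch only looks one level up; a handler that needs deeper context could iterate over the whole stack instead.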

Answer 2

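If you want to stick with an explicit SAX approach, DR's answer makes sense. I have used that approach successfully in the past.

However, you may want to look at Commons Digester, which lets you specify which objects to create and populate for subtrees of the XML document. It is a very simple way to build an object hierarchy from XML without having to work with the SAX model directly.

See this ONJava article for details.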

Answer 3

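You could create a lookup table from type to parsing action; then you only need to index into the lookup table to find the appropriate parsing action.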
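One possible way to sketch such a lookup table, assuming Java 8 lambdas are available, is a map from element name to an action that receives the element's attributes. The names LookupTableHandler, actions, and created are hypothetical, and the strings added to the list stand in for real Office and Vehicle objects.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: dispatches start tags through a name-to-action table.
class LookupTableHandler extends DefaultHandler {
    private final Map<String, Consumer<Attributes>> actions = new HashMap<>();
    final List<String> created = new ArrayList<>();

    LookupTableHandler() {
        // One entry per element type; a real action would build an Office or Vehicle object
        actions.put("Office", atts -> created.add("Office(prop1=" + atts.getValue("prop1") + ")"));
        actions.put("Vehicle", atts -> created.add("Vehicle(prop=" + atts.getValue("prop") + ")"));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // A single table lookup replaces the chain of if (type ...) checks
        Consumer<Attributes> action = actions.get(qName);
        if (action != null) {
            action.accept(atts);
        }
    }

    // Convenience helper (an assumption of this sketch): parse a string and return what was created.
    static List<String> parse(String xml) {
        try {
            LookupTableHandler handler = new LookupTableHandler();
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.created;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }
}
```

Elements without an entry in the table are simply ignored, so supporting a new type means adding one entry rather than another branch in startElement.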

Answer 4

You need a lexical analyzer; the Interpreter Pattern is an ideal pattern for writing a lexical analyzer.

What pattern should I use with a SAX parser?
