Parsing a tree structure into relational-style data storage

Asked: 2011-08-24 04:29:18

Tags: java algorithm parsing serialization

Could anyone help me implement this, or at least point me to an algorithm for it?

What I am trying to do is parse a hierarchical (tree-structured) file into relational storage. I explain further below, with an example.

Here is a sample source file. It is a simple, unrealistic example made up for this question.

<title text="title1">
    <comment id="comment1">
        <data> this is part of comment one</data>
        <data> this is some more of comment one</data>
    </comment>
    <comment id="comment2">
        <data> this is part of comment two</data>
        <data> this is some more of comment two</data>
        <data> this is even some more of comment two</data>
    </comment>
</title>

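The main thing to note here is that both the number of <comment> elements and the number of <data> elements inside each comment can be arbitrary. Given the file above, I would like to transform it into something like this: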

title     |   comment     |      data
------------------------------------------------------------------------
title1       comment1            this is part of comment one
title1       comment1            this is some more of comment one
title1       comment2            this is part of comment two
title1       comment2            this is some more of comment two
title1       comment2            this is even some more of comment two

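To achieve this, assume the relational schema can be specified as follows, using XPath expressions that can be evaluated against the source file.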

attribute1: title   =  /title/@text
attribute2: comment =  /title/comment/@id
attribute3: data    =  /title/comment/data/text()

Proposed data structures:

  • ResultSet is a List<Map<String,String>> (where each map represents one row)
  • Schema is a Map<String,String> (mapping attribute name -> path expression)
  • The source file is some DOM Document
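
To make the proposed structures concrete, here is a minimal Java sketch of how such a schema might be evaluated against a DOM Document. It is one possible approach, not a definitive one, and it rests on assumptions not stated in the question: every schema path ends in an attribute or text() step, and the element parts of all paths lie on a single chain, so the deepest path (/title/comment/data here) decides how many rows there are. The class name XPathRows and its helper methods are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathRows {
    // The sample document from the question, written on one logical line.
    static final String SAMPLE =
        "<title text=\"title1\">"
        + "<comment id=\"comment1\"><data> this is part of comment one</data>"
        + "<data> this is some more of comment one</data></comment>"
        + "<comment id=\"comment2\"><data> this is part of comment two</data>"
        + "<data> this is some more of comment two</data>"
        + "<data> this is even some more of comment two</data></comment>"
        + "</title>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // "/title/comment/@id" -> "/title/comment"
    static String elementPart(String path) { return path.substring(0, path.lastIndexOf('/')); }

    // "/title/comment/@id" -> "@id"
    static String lastStep(String path) { return path.substring(path.lastIndexOf('/') + 1); }

    // "/title/comment" -> 2
    static int depth(String elementPath) { return elementPath.split("/").length - 1; }

    static List<Map<String, String>> toRows(Document doc, Map<String, String> schema) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Assumption: the deepest element path drives the rows (one row per matching node).
        String rowPath = "";
        for (String path : schema.values()) {
            if (depth(elementPart(path)) > depth(rowPath)) rowPath = elementPart(path);
        }

        NodeList rowNodes = (NodeList) xpath.evaluate(rowPath, doc, XPathConstants.NODESET);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < rowNodes.getLength(); i++) {
            Map<String, String> row = new LinkedHashMap<>();
            for (Map.Entry<String, String> column : schema.entrySet()) {
                // Walk up from the row node to the ancestor that owns this column.
                Node context = rowNodes.item(i);
                int levelsUp = depth(rowPath) - depth(elementPart(column.getValue()));
                for (int up = 0; up < levelsUp; up++) context = context.getParentNode();
                // Evaluate only the final step (@attr or text()) relative to that ancestor.
                row.put(column.getKey(), xpath.evaluate(lastStep(column.getValue()), context).trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("title", "/title/@text");
        schema.put("comment", "/title/comment/@id");
        schema.put("data", "/title/comment/data/text()");
        for (Map<String, String> row : toRows(parse(SAMPLE), schema)) System.out.println(row);
    }
}
```

Note that the title path is written as /title/@text here because the sample element carries a text attribute rather than a title attribute. Walking up from each <data> node keeps each row tied to its own comment, which evaluating the three absolute paths independently would not do.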

1 Answer:

Answer 0: (score: 0)

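I'm not sure whether you are asking how to implement the XML parser itself or, given a parse tree for the XML, how to flatten that tree into rows. I'm guessing you are looking at the latter right now (there are plenty of good XML parsers, and I doubt parsing is the bottleneck), so that is what I'll answer here. If you really are interested in the details of XML parsing, let me know and I can update the answer.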

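I believe the way to think about this is as a recursive descent over the tree. The idea is as follows: the name of each entry is the concatenation of the names of all the nodes on the path down from the root, followed by the node's own name. Given this, you can run a recursive DFS over the tree using something like this: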

FlattenXML(XMLDocument x) {
    for each top-level XML node t, whose name is n:
        RecFlattenTree(t, "/" + n);
}

RecFlattenTree(Tree t, String prefix) {
    // prefix is the full path of t, including t's own name
    if t is a leaf with data d:
       update the master table by adding (prefix, d) to the list of entries
    else
       for each child c of t, whose name is x:
           RecFlattenTree(c, prefix + "/" + x)
}
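
For reference, the same recursion might look roughly like this in Java on top of the standard DOM API. This is only a sketch under assumptions of mine: a node's "name" is taken to be the value of its first attribute when it has one (title1, comment1) and its tag name otherwise (data), and a "leaf" is an element with no child elements. FlattenXml, label and recFlatten are hypothetical names, not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FlattenXml {
    static final String SAMPLE =
        "<title text=\"title1\">"
        + "<comment id=\"comment1\"><data> this is part of comment one</data>"
        + "<data> this is some more of comment one</data></comment>"
        + "<comment id=\"comment2\"><data> this is part of comment two</data>"
        + "<data> this is some more of comment two</data>"
        + "<data> this is even some more of comment two</data></comment>"
        + "</title>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Assumption: name a node by its first attribute value if present, else by its tag name.
    static String label(Element e) {
        return e.getAttributes().getLength() > 0
                ? e.getAttributes().item(0).getNodeValue()
                : e.getTagName();
    }

    /** Returns one {path, value} pair per leaf element, in document order. */
    static List<String[]> flatten(Document doc) {
        List<String[]> table = new ArrayList<>();
        Element root = doc.getDocumentElement();
        recFlatten(root, "/" + label(root), table);
        return table;
    }

    // prefix is the full path of e, including e's own name.
    static void recFlatten(Element e, String prefix, List<String[]> table) {
        boolean hasChildElement = false;
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node c = children.item(i);
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                Element child = (Element) c;
                recFlatten(child, prefix + "/" + label(child), table);
            }
        }
        // A leaf is an element with no child elements; its text is the data.
        if (!hasChildElement) {
            table.add(new String[] { prefix, e.getTextContent().trim() });
        }
    }

    public static void main(String[] args) throws Exception {
        for (String[] entry : flatten(parse(SAMPLE))) {
            System.out.println(entry[0] + ", value = \"" + entry[1] + "\"");
        }
    }
}
```

Run against the sample document, this should yield five path/value pairs, two under /title1/comment1/data and three under /title1/comment2/data.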

For example, if you trace this through the XML document at the top, it might go like this:

RecFlattenTree(title1, "/title1")
    RecFlattenTree(comment1, "/title1/comment1")
        RecFlattenTree(data node 1, "/title1/comment1/data")
             Add /title1/comment1/data, value = "this is part of comment one"
        RecFlattenTree(data node 2, "/title1/comment1/data")
             Add /title1/comment1/data, value = "this is some more of comment one"
    RecFlattenTree(comment2, "/title1/comment2")
        RecFlattenTree(data node 1, "/title1/comment2/data")
             Add /title1/comment2/data, value = "this is part of comment two"
        RecFlattenTree(data node 2, "/title1/comment2/data")
             Add /title1/comment2/data, value = "this is some more of comment two"
        RecFlattenTree(data node 3, "/title1/comment2/data")
             Add /title1/comment2/data, value = "this is even some more of comment two"

This ultimately produces the list

/title1/comment1/data, value = "this is part of comment one"
/title1/comment1/data, value = "this is some more of comment one"
/title1/comment2/data, value = "this is part of comment two"
/title1/comment2/data, value = "this is some more of comment two"
/title1/comment2/data, value = "this is even some more of comment two"

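This carries the same information as the table you want: the title and comment are the components of each path, and the data is the value.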

Hope this helps! If I've misunderstood your question, please let me know!