How can I define multiple conditions when joining data from multiple XML files with XQuery?

Time: 2012-09-17 09:38:09

Tags: xml parsing xquery

I have two XML files that contain information about a document. I need to create a DOT graph from the information in these files.

layout.xml

<layout>
<segmentation>
    <layout-unit id="lay-1.01" xref="u-1.01 u-1.02 u-1.03"/>
    <layout-unit id="lay-1.02" xref="u-1.04 u-1.05 u-1.06 u-1.07 u-1.08"/>
    <layout-unit id="lay-1.03" xref="u-1.09"/>
    <layout-unit id="lay-1.04" xref="u-1.10 u-1.11 u-1.12"/>
    <layout-unit id="lay-1.05" xref="u-1.13 u-1.14 u-1.15 u-1.16"/>
</segmentation>
</layout>

rhetoric.xml

<rhetoric>
<segmentation>
 <segment id="s-1.01" xref="u-1.01"/>
 <segment id="s-1.02" xref="u-1.02"/>
 <segment id="s-1.03" xref="u-1.03"/>
 <segment id="s-1.04" xref="u-1.04"/>
 <segment id="s-1.05" xref="u-1.05"/>
 <segment id="s-1.06" xref="u-1.06"/>
 <segment id="s-1.07" xref="u-1.07"/>
 <segment id="s-1.08" xref="u-1.08"/>
 <segment id="s-1.09" xref="u-1.09"/>
 <segment id="s-1.10" xref="u-1.10"/>
 <segment id="s-1.11" xref="u-1.11"/>
 <segment id="s-1.12" xref="u-1.12"/>
 <segment id="s-1.13" xref="u-1.13"/>
 <mini-segment id="s-1.14" xref="u-1.14"/>
 <mini-segment id="s-1.15" xref="u-1.15"/>
 <mini-segment id="s-1.16" xref="u-1.16"/>
</segmentation>
<rst-structure root="s-1.01">
    <span id="span-1.01" nucleus="s-1.01" satellites="span-1.02" relation="elaboration"><title xref="s-1.09"></title></span>
    <span id="span-1.02" nucleus="s-1.02" satellites="s-1.03" relation="elaboration"/>
    <span id="span-1.03" nucleus="s-1.01" satellites="span-1.04" relation="enablement"/>
    <span id="span-1.04" nucleus="s-1.04" satellites="span-1.05" relation="enablement"/>
    <multi-span id="span-1.05" nuclei="span-1.08 span-1.06" relation="sequence"/>
    <span id="span-1.06" nucleus="s-1.06" satellites="span-1.07" relation="elaboration"></span>
    <multi-span id="span-1.07" nuclei="s-1.07 s-1.08" relation="restatement"></multi-span>
    <span id="span-1.08" nucleus="s-1.05" satellites="s-1.10 span-1.09" relation="elaboration"/>
    <span id="span-1.09" nucleus="s-1.11" satellites="span-1.10" relation="nonvolitional-result"/>
    <span id="span-1.10" nucleus="s-1.12" satellites="span-1.11" relation="elaboration"/>
</rst-structure>
<mini-structure>
    <mini-span id="span-1.11" attribute="s-1.14 s-1.15 s-1.16" attribuend="s-1.13" relation="class-ascription"/>
</mini-structure>
</rhetoric>

To create the DOT graph, I have an XQuery script that takes the data in rhetoric.xml, converts it to DOT, and groups the graph into subgraphs according to layout.xml.

The resulting graph looks like this:

[Image: DOT graph]

I use the @xref attribute to select the related data in the two files, as follows:

declare function local:add-subgraphs($rhetoric, $layout) {
for $layout-unit-id in $layout/segmentation/layout-unit/@id
let $layout-unit-xrefs := tokenize($layout/segmentation/layout-unit[@id = $layout-unit-id]/@xref, " ")

let $rst-id := $rhetoric/segmentation/segment/@id
let $segment := $rhetoric/segmentation/segment[@xref = $layout-unit-xrefs and @id = $rst-id]/@id

I then start populating the DOT graph by walking through the different elements under rhetoric/rst-structure:

let $add-edges-nucleus := for $span-id in $rhetoric/rst-structure/span[@nucleus = $segment]/@id
let $nucleus := tokenize($rhetoric/rst-structure/span[@id = $span-id]/@nucleus, " ")
return concat('"', $nucleus, '" ', $arrow, ' "', $span-id, '"', ';', $newline)

As you can see, the $segment variable is used to determine which spans belong to a given subgraph.
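The selection described so far can be sketched as a self-contained query. This is a minimal sketch, not the asker's full script: the inline `$layout` and `$rhetoric` values are trimmed stand-ins for layout.xml and rhetoric.xml, `$unit` and `$spans` are hypothetical names, and the output format is only for illustration. It relies on `=` being a general comparison, which is true when the attribute value equals any item of the sequence on the other side.

```xquery
xquery version "1.0";

(: Inline stand-ins for layout.xml and rhetoric.xml (assumption: trimmed to a few entries). :)
let $layout := <layout><segmentation>
    <layout-unit id="lay-1.01" xref="u-1.01 u-1.02 u-1.03"/>
    <layout-unit id="lay-1.03" xref="u-1.09"/>
  </segmentation></layout>
let $rhetoric := <rhetoric>
    <segmentation>
      <segment id="s-1.01" xref="u-1.01"/>
      <segment id="s-1.02" xref="u-1.02"/>
      <segment id="s-1.09" xref="u-1.09"/>
    </segmentation>
    <rst-structure root="s-1.01">
      <span id="span-1.01" nucleus="s-1.01" satellites="span-1.02" relation="elaboration"/>
      <span id="span-1.02" nucleus="s-1.02" satellites="s-1.03" relation="elaboration"/>
    </rst-structure>
  </rhetoric>
for $unit in $layout/segmentation/layout-unit
(: split the space-separated @xref list into individual unit ids :)
let $xrefs := tokenize($unit/@xref, " ")
(: segments whose @xref equals ANY of the unit's xrefs :)
let $segment := $rhetoric/segmentation/segment[@xref = $xrefs]/@id
(: spans whose single-valued @nucleus equals ANY of those segment ids :)
let $spans := $rhetoric/rst-structure/span[@nucleus = $segment]/@id
return concat($unit/@id, ": segments [", string-join($segment, " "),
              "] spans [", string-join($spans, " "), "]")
```

With this sample data, lay-1.01 should pick up segments s-1.01 and s-1.02 and spans span-1.01 and span-1.02, while lay-1.03 gets segment s-1.09 and no spans. The comparison works here only because each @nucleus holds a single id.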

The problem arises with this element in rhetoric.xml:

<multi-span id="span-1.07" nuclei="s-1.07 s-1.08" relation="restatement"></multi-span>

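In this case I cannot use the $segment variable to select the spans to include in the subgraph, because the structure of this element differs from that of the span elements.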

For example, consider segments s-1.07 and s-1.08: they should be included in lay-1.02, but in the graph above they remain outside the subgraph.

Any ideas on how to define additional conditions that handle multi-span elements, so that they end up under the correct subgraph?

1 answer:

Answer 0 (score: 0)

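If I understand correctly, you are trying to select the nucleus or nuclei attribute from the span or multi-span element that matches a given span-id.

The following expression does exactly that: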

$rhetoric/rst-structure/(span|multi-span)[@id = $span-id]/(@nucleus|@nuclei)

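| is the XQuery union operator. Given your XML, this matches only span or multi-span elements (and likewise only the nucleus or nuclei attributes).

Alternatively, you can select any element with a matching span-id: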
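As a rough illustration of the union expression, the sketch below runs it against two elements copied from rhetoric.xml. The inline `$rhetoric` value and the output format are assumptions made for the example; the path expression itself is the one given in the answer.

```xquery
xquery version "1.0";

(: Inline stand-in for rhetoric.xml (assumption: only two elements kept). :)
let $rhetoric :=
  <rhetoric>
    <rst-structure root="s-1.01">
      <span id="span-1.06" nucleus="s-1.06" satellites="span-1.07" relation="elaboration"/>
      <multi-span id="span-1.07" nuclei="s-1.07 s-1.08" relation="restatement"/>
    </rst-structure>
  </rhetoric>
for $span-id in ("span-1.06", "span-1.07")
(: the union yields whichever of the two attributes the matched element carries :)
let $attr := $rhetoric/rst-structure/(span | multi-span)[@id = $span-id]/(@nucleus | @nuclei)
(: nuclei is a space-separated list, so split it before using the ids :)
return concat($span-id, " -> ", string-join(tokenize($attr, " "), ", "))
```

For span-1.06 this should produce the single id s-1.06, and for span-1.07 the two ids s-1.07 and s-1.08, because the multi-valued nuclei attribute is tokenized before use.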

$rhetoric/rst-structure/*[@id = $span-id]/(@nucleus|@nuclei)
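The answer stops at selecting the attributes. One way the selection might be fed back into the subgraph condition is sketched below, on the assumption that a span or multi-span belongs to a layout unit when any of its nucleus or nuclei ids is one of that unit's segments. The inline data, the names `$unit`, `$s` and `$n`, and the literal "->" arrow are illustrative stand-ins, not the asker's actual code.

```xquery
xquery version "1.0";

(: Inline stand-ins for layout.xml and rhetoric.xml (assumption: trimmed to lay-1.02). :)
let $layout := <layout><segmentation>
    <layout-unit id="lay-1.02" xref="u-1.04 u-1.05 u-1.06 u-1.07 u-1.08"/>
  </segmentation></layout>
let $rhetoric := <rhetoric>
    <segmentation>
      <segment id="s-1.06" xref="u-1.06"/>
      <segment id="s-1.07" xref="u-1.07"/>
      <segment id="s-1.08" xref="u-1.08"/>
    </segmentation>
    <rst-structure root="s-1.01">
      <span id="span-1.06" nucleus="s-1.06" satellites="span-1.07" relation="elaboration"/>
      <multi-span id="span-1.07" nuclei="s-1.07 s-1.08" relation="restatement"/>
    </rst-structure>
  </rhetoric>
for $unit in $layout/segmentation/layout-unit
let $xrefs := tokenize($unit/@xref, " ")
let $segment := $rhetoric/segmentation/segment[@xref = $xrefs]/@id
(: keep an element if ANY token of its nucleus/nuclei is one of the unit's segments;
   assumes no element carries both attributes at once :)
for $s in $rhetoric/rst-structure/*[tokenize((@nucleus | @nuclei), " ") = $segment]
(: emit one DOT edge per nucleus id :)
for $n in tokenize($s/(@nucleus | @nuclei), " ")
return concat('"', $n, '" -> "', $s/@id, '";')
```

Tokenizing inside the predicate is what lets the multi-valued nuclei attribute take part in the comparison. With this sample data, s-1.07 and s-1.08 should both be linked to span-1.07 within lay-1.02.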