Question

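I am looking for an XSLT/XQuery function based on XPath 2.0. I am sure this functionality must already exist, but my keyword searches have not turned up an answer.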

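Take a sequence of n strings, each of which can be converted with tokenize() into a sequence of strings of arbitrary length. For example, given this input (or see the edit by michael.hor257k below for a simpler version):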

<alignment>
  <text>really the man and a hand</text>
  <text>the hand and a man</text>
  <text>really a hand and the man</text>
  <text>hand man a the</text>
  <text>the man really is a hand</text>
  <text>man and a hand</text>
</alignment>

...which, via for $i in //text return replace($i,'\s+',':')

is bound to the variable $array:

$array=("really:the:man:and:a:hand",
        "the:hand:and:a:man",
        "really:a:hand:and:the:man",
        "hand:man:a:the",
        "the:man:really:is:a:hand",
        "man:and:a:hand")

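We want to create a custom XPath 2.0-based function, exfn:comb-array($arg1 as item()+, $arg2 as xs:string), that, given $array and its separator, returns the minimal combined array. That is, a new sequence of n strings, with empty values and separators inserted, such that (1) when tokenized, every new string has the same length (len), (2) that length is the smallest possible, (3) the internal order of each original string is preserved, and (4) at every position across the strings, the value is either empty or the same string. In other words, for $i in $arraynew return distinct-values(subsequence(tokenize($i,':'),x,1)) yields only the empty string or one and the same string value for each x, where 1 <= x <= len.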
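
Conditions (1), (3) and (4) can be checked mechanically. The following is a minimal sketch in Python (used only for illustration, since the question itself asks for XPath 2.0). The name check_alignment is hypothetical, and condition (2), minimality, is deliberately not checked. It is applied to the input and expected output quoted in this question; Python's str.split is assumed to treat adjacent separators the way tokenize() does here, yielding empty strings.

```python
from typing import List


def check_alignment(original: List[str], aligned: List[str], sep: str = ":") -> bool:
    """Check conditions (1), (3) and (4). Minimality (2) is NOT checked."""
    if len(original) != len(aligned):
        return False
    rows = [s.split(sep) for s in aligned]
    # (1) every tokenized row has the same length
    if len({len(r) for r in rows}) != 1:
        return False
    # (3) dropping the empty tokens gives back each original string, in order
    for src, row in zip(original, rows):
        if [t for t in row if t != ""] != src.split(sep):
            return False
    # (4) each column holds at most one distinct non-empty token
    for column in zip(*rows):
        if len({t for t in column if t != ""}) > 1:
            return False
    return True


# Input and expected output as given in the question.
array = [
    "really:the:man:and:a:hand",
    "the:hand:and:a:man",
    "really:a:hand:and:the:man",
    "hand:man:a:the",
    "the:man:really:is:a:hand",
    "man:and:a:hand",
]
arraynew = [
    "really::::::the:man::::and:a:hand:",
    "::::::the::hand:::and:a::man",
    "really:::a:hand:and:the:man:::::::",
    ":hand:man:a:::the::::::::",
    "::::::the:man::really:is::a:hand:",
    ":::::::man::::and:a:hand:",
]

print(check_alignment(array, arraynew))
```

A check like this only confirms that a candidate result is a valid alignment; it says nothing about whether a shorter one exists.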

For the illustration above, we want this output (or see the edit below):

$arraynew=("really::::::the:man::::and:a:hand:",
"::::::the::hand:::and:a::man",
"really:::a:hand:and:the:man:::::::",
":hand:man:a:::the::::::::",
"::::::the:man::really:is::a:hand:",
":::::::man::::and:a:hand:")

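This is a simplified form of a more general problem: how to align multiple versions of a heavily edited text so that the document order of each version is preserved while the resulting combined version (the array) has the smallest possible length.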
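
One way to obtain a valid (but not necessarily minimal) alignment is to fold the versions into a common supersequence, two at a time, and then lay each version out against it. The sketch below, in Python for illustration only, assumes that approach. It is not taken from any answer here, the names scs and align are hypothetical, and the pairwise folding is a heuristic, so condition (2) may not be met.

```python
from typing import List


def scs(a: List[str], b: List[str]) -> List[str]:
    """Shortest common supersequence of two token lists, via an LCS table."""
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    out: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:            # shared token: emit it once
            out.append(a[i])
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            out.append(a[i])        # skipping a[i] keeps the remaining LCS
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def align(strings: List[str], sep: str = ":") -> List[str]:
    """Pad each string with empty tokens so that all rows line up by column."""
    rows = [s.split(sep) for s in strings]
    sup: List[str] = []
    for row in rows:                # heuristic: fold the rows in input order
        sup = scs(sup, row)
    aligned = []
    for row in rows:
        cells, k = [], 0
        for tok in sup:             # leftmost embedding of row into sup
            if k < len(row) and row[k] == tok:
                cells.append(tok)
                k += 1
            else:
                cells.append("")
        aligned.append(sep.join(cells))
    return aligned


array = [
    "really:the:man:and:a:hand",
    "the:hand:and:a:man",
    "really:a:hand:and:the:man",
    "hand:man:a:the",
    "the:man:really:is:a:hand",
    "man:and:a:hand",
]
for line in align(array):
    print(line)
```

Because every row is a subsequence of the folded supersequence, each column contains either that supersequence's token or an empty value. The result depends on the order in which the versions are folded, and trying several orders may shorten it.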

Edit by michael.hor257k

I took the liberty of converting your example into a form that is easier to follow:

INPUT:

A, B, C, D, E, F,
B, F, D, E, C,
A, E, F, D, B, C,
F, C, E, B,
B, C, A, G, E, F,
C, D, E, F

OUTPUT:

A, -, -, -, -, -, B, C, -, -, -, D, E, F, -,
-, -, -, -, -, -, B, -, F, -, -, D, E, -, C,
A, -, -, E, F, D, B, C, -, -, -, -, -, -, -,
-, F, C, E, -, -, B, -, -, -, -, -, -, -, -,
-, -, -, -, -, -, B, C, -, A, G, -, E, F, -,
-, -, -, -, -, -, -, C, -, -, -, D, E, F, -

(I still don't get it.)

Answer 1

I have developed an XSLT-based answer to this problem. It is now available as tan:collate-sequences() in the core component of the TAN function library.

XPath to align sequences of sequences
