Flattening child elements nested among text nodes

Posted: 2015-01-03 17:02:55

Tags: xml xslt xslt-2.0

There are lots of flattening questions here, but none that deal with this level of complexity.

I have an XML document similar to:
<document>
<div class='target-one'>
    maybe some text node, maybe not...1
    <randomElement>
        maybe some text node, maybe not...2
    </randomElement>

    <div class='target-one'>
        <randomElement>
            maybe some text node, maybe not...3
        </randomElement>
    </div>
    maybe some text node, maybe not...4
    <randomElement>
        maybe some text node, maybe not...5
    </randomElement>

    <div class='target-two'>
        maybe some text node, maybe not...6
        <randomElement>
            maybe some text node, maybe not...7
        </randomElement>
    </div>
    maybe some text node, maybe not...8
    <randomElement>
        maybe some text node, maybe not...9
    </randomElement>
</div>
<div class='target-two'>
    maybe some text node, maybe not...10
    <randomElement>
        maybe some text node, maybe not...11
    </randomElement>

    <div class='target-one'>
        <randomElement>
            maybe some text node, maybe not...12
        </randomElement>
    </div>
    maybe some text node, maybe not...13
    <randomElement>
        maybe some text node, maybe not...14
    </randomElement>

    <div class='target-two'>
        maybe some text node, maybe not...15
        <randomElement>
            maybe some text node, maybe not...16
        </randomElement>
    </div>
    maybe some text node, maybe not...17
    <randomElement>
        maybe some text node, maybe not...18
    </randomElement>
</div>

</document>

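So there is a list of target elements, which can be nested in any order. I want to flatten them where they are nested: the randomElements and text nodes should be wrapped in additional parent elements, and the target children should become target siblings. What I mean is that the output should look like this: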

<document>
<div class='target-one'>
    maybe some text node, maybe not...1
    <randomElement>
        maybe some text node, maybe not...2
    </randomElement>
</div>
<div class='target-one'>
    <randomElement>
        maybe some text node, maybe not...3
    </randomElement>
</div>
<div class='target-one'>
    maybe some text node, maybe not...4
    <randomElement>
        maybe some text node, maybe not...5
    </randomElement>
</div>
<div class='target-two'>
    maybe some text node, maybe not...6
    <randomElement>
        maybe some text node, maybe not...7
    </randomElement>
</div>
<div class='target-one'>
    maybe some text node, maybe not...8
    <randomElement>
        maybe some text node, maybe not...9
    </randomElement>
</div>
<div class='target-two'>
    maybe some text node, maybe not...10
    <randomElement>
        maybe some text node, maybe not...11
    </randomElement>
</div>
<div class='target-one'>
    <randomElement>
        maybe some text node, maybe not...12
    </randomElement>
</div>
<div class='target-two'>
    maybe some text node, maybe not...13
    <randomElement>
        maybe some text node, maybe not...14
    </randomElement>
</div>
<div class='target-two'>
    maybe some text node, maybe not...15
    <randomElement>
        maybe some text node, maybe not...16
    </randomElement>
</div>
<div class='target-two'>
    maybe some text node, maybe not...17
    <randomElement>
        maybe some text node, maybe not...18
    </randomElement>
</div>

</document>

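So I end up with more parent divs, but all the text and other nodes are in the right place. Note that randomElement could itself be a div that does not have one of the target classes...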

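This is for reformatting paginated ebooks in an online library, so there can be a large number of elements before we actually hit a problem div. We therefore need some way to select all the elements and text nodes between the problem child divs as one group. Wrapping each of them in its own div is no good: we would end up with every p, em or span on its own page.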

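Meanwhile, most parents have no problem children at all. As long as the solution passes those through, I can clean up any empty divs with another pass, but I do need this to work at least at a basic level, including for text that has no child elements.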
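The grouping described above can be sketched in XSLT 2.0 with `xsl:for-each-group` and `group-adjacent`: split each target's children into alternating runs of "nested target" and "everything else", recurse into the nested targets, and wrap each remaining run in a copy of the target. This is a minimal, untested sketch; it assumes an XSLT 2.0 processor such as Saxon, assumes a target is a `div` whose `class` is exactly `target-one` or `target-two`, and the variable name `$target` is only illustrative.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- identity: copy everything no other template handles -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- assumption: a target is a div whose class is exactly target-one or target-two -->
  <xsl:template match="div[@class = ('target-one', 'target-two')]">
    <xsl:variable name="target" select="."/>
    <!-- adjacent children with the same key form one group:
         true = run of nested targets, false = run of anything else -->
    <xsl:for-each-group select="node()"
        group-adjacent="boolean(self::div[@class = ('target-one', 'target-two')])">
      <xsl:choose>
        <xsl:when test="current-grouping-key()">
          <!-- nested targets: process them, so they are emitted as siblings -->
          <xsl:apply-templates select="current-group()"/>
        </xsl:when>
        <xsl:when test="current-group()[self::* or normalize-space()]">
          <!-- a run of other nodes: wrap it in a copy of the enclosing target;
               runs made only of whitespace text never reach this branch -->
          <xsl:element name="{name($target)}" namespace="{namespace-uri($target)}">
            <xsl:copy-of select="$target/@*"/>
            <xsl:apply-templates select="current-group()"/>
          </xsl:element>
        </xsl:when>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Traced by hand against the sample input, this yields the same sequence of wrapper divs as the desired output. Whitespace-only runs are skipped instead of being removed with `xsl:strip-space`, so spaces between inline elements such as `em` or `span` are kept.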

This is my first question on StackOverflow, because I just can't work out the recursion this needs.

Thanks!

Edit, based on user52889's answer. This never worked, but I'm leaving it here for readability:

The XSL I'm running in Saxon:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="html"
        indent="yes"
        encoding="utf-8"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
<xsl:template match="/"> 
    <xsl:apply-templates />  
</xsl:template>
<xsl:template match="div[matches(@class,'target-one|target-two','i')]">
    <xsl:for-each select="node()">
        <xsl:choose>
            <xsl:when test="self::*[matches(@class,'target-one|target-two','i')]">
                <xsl:apply-templates select="."/>
            </xsl:when>
            <xsl:when test="preceding-sibling::node()[0][not(self::*[matches(@class,'target-one|target-two','i')])]">
                <!-- do nothing, it will be handled by the next case -->
            </xsl:when>
            <xsl:otherwise>
                <!--
      create a copy of the element matched by the template, with its attrs
      add to it the current node and all nodes which follow it, up to the next SIGNIFICANT node
      or, put another way, all following siblings which either
      a) do not have a preceding signficant node, or
      b) whose nearest preceding singificant node is the same as the nearest preceding significant node of the current node, i.e. its following sibling node is the current node.
    -->
                <xsl:element name="{../name()}">
                    <xsl:apply-templates select="../@*"/>
                    <xsl:apply-templates select="following-sibling::node()[
          not(preceding-sibling::*[matches(@class,'target-one|target-two','i')])
          or 
          count(preceding-sibling::*[matches(@class,'target-one|target-two','i')][0]/following-sibling::node()[0] | current()) = 1
        ]" />
                </xsl:element>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

The current output of this stylesheet still contains nested children, as well as duplicates:

<document>
<div class="target-one">
    <randomElement>
        maybe some text node, maybe not...2

    </randomElement>
    <div class="target-one"></div>
    maybe some text node, maybe not...4

    <randomElement>
        maybe some text node, maybe not...5

    </randomElement>
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...7

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...8

    <randomElement>
        maybe some text node, maybe not...9

    </randomElement>
</div>
<div class="target-one">
    <div class="target-one"></div>
    maybe some text node, maybe not...4

    <randomElement>
        maybe some text node, maybe not...5

    </randomElement>
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...7

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...8

    <randomElement>
        maybe some text node, maybe not...9

    </randomElement>
</div>
<div class="target-one"></div>
<div class="target-one">
    <randomElement>
        maybe some text node, maybe not...5

    </randomElement>
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...7

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...8

    <randomElement>
        maybe some text node, maybe not...9

    </randomElement>
</div>
<div class="target-one">
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...7

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...8

    <randomElement>
        maybe some text node, maybe not...9

    </randomElement>
</div>
<div class="target-two">
    <randomElement>
        maybe some text node, maybe not...7

    </randomElement>
</div>
<div class="target-two"></div>
<div class="target-one">
    <randomElement>
        maybe some text node, maybe not...9

    </randomElement>
</div>
<div class="target-one"></div>
<div class="target-two">
    <randomElement>
        maybe some text node, maybe not...11

    </randomElement>
    <div class="target-one"></div>
    maybe some text node, maybe not...13

    <randomElement>
        maybe some text node, maybe not...14

    </randomElement>
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...16

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...17

    <randomElement>
        maybe some text node, maybe not...18

    </randomElement>
</div>
<div class="target-two">
    <div class="target-one"></div>
    maybe some text node, maybe not...13

    <randomElement>
        maybe some text node, maybe not...14

    </randomElement>
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...16

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...17

    <randomElement>
        maybe some text node, maybe not...18

    </randomElement>
</div>
<div class="target-one"></div>
<div class="target-two">
    <randomElement>
        maybe some text node, maybe not...14

    </randomElement>
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...16

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...17

    <randomElement>
        maybe some text node, maybe not...18

    </randomElement>
</div>
<div class="target-two">
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...16

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...17

    <randomElement>
        maybe some text node, maybe not...18

    </randomElement>
</div>
<div class="target-two">
    <randomElement>
        maybe some text node, maybe not...16

    </randomElement>
</div>
<div class="target-two"></div>
<div class="target-two">
    <randomElement>
        maybe some text node, maybe not...18

    </randomElement>
</div>
<div class="target-two"></div>
</document>

3 answers:

Answer 0 (score: 2)

Treating it as a grouping problem, I came up with:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:param name="prefix" select="'target-'"/>

<xsl:output indent="yes"/>

<xsl:template match="document">
  <xsl:copy>
    <xsl:for-each-group select="descendant::text()[normalize-space()]"
      group-adjacent="generate-id(ancestor::div[starts-with(@class, $prefix)][1])">
      <xsl:apply-templates select="ancestor::div[starts-with(@class, $prefix)][1]" mode="g">
        <xsl:with-param name="group" select="current-group()"/>
      </xsl:apply-templates>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

<xsl:template match="*" mode="g">
  <xsl:param name="group"/>
  <xsl:if test=". intersect $group/ancestor::*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()" mode="g">
        <xsl:with-param name="group" select="$group"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:if>
</xsl:template>

<xsl:template match="text()" mode="g">
  <xsl:param name="group"/>
  <xsl:if test=". intersect $group">
    <xsl:copy/>
  </xsl:if>
</xsl:template>

</xsl:stylesheet>

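This basically groups the non-whitespace text node descendants by their nearest ancestor div with the class you're looking for, then recreates the subtree contained in that ancestor, keeping only the text nodes of the group.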
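Because only non-whitespace text nodes are grouped, an element with no text of its own, such as an `img` or `br` on a page, is not an ancestor of anything in a group and is dropped. A possible variant, assuming you want such empty elements kept, groups leaf nodes instead and tests `ancestor-or-self` in mode `g`. It is an untested sketch built on the answer above, not part of it:

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:param name="prefix" select="'target-'"/>

<xsl:output indent="yes"/>

<xsl:template match="document">
  <xsl:copy>
    <!-- changed: group leaf content, i.e. non-whitespace text plus empty
         elements such as img or br, but never an (empty) target div itself -->
    <xsl:for-each-group
      select="descendant::node()[not(node())]
                [self::* or self::text()[normalize-space()]]
                [not(self::div[starts-with(@class, $prefix)])]"
      group-adjacent="generate-id(ancestor::div[starts-with(@class, $prefix)][1])">
      <xsl:apply-templates select="ancestor::div[starts-with(@class, $prefix)][1]" mode="g">
        <xsl:with-param name="group" select="current-group()"/>
      </xsl:apply-templates>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

<!-- changed: ancestor-or-self, so empty elements in the group copy themselves -->
<xsl:template match="*" mode="g">
  <xsl:param name="group"/>
  <xsl:if test=". intersect $group/ancestor-or-self::*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()" mode="g">
        <xsl:with-param name="group" select="$group"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:if>
</xsl:template>

<xsl:template match="text()" mode="g">
  <xsl:param name="group"/>
  <xsl:if test=". intersect $group">
    <xsl:copy/>
  </xsl:if>
</xsl:template>

</xsl:stylesheet>
```

Whitespace-only text nodes are still dropped in both versions, so a space between two inline elements (`<em>a</em> <em>b</em>`) is lost.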

Answer 1 (score: 1)

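It's difficult to work out the rules from a single example. The following stylesheet produces the desired result for it; perhaps that is what you are looking for. If not, please edit your question and explain the logic behind the requested transformation.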

XSLT 2.0(或1.0)

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="/document">
    <document>
        <xsl:for-each select="//randomElement">
            <div class='{../@class}'>
                <xsl:copy-of select=". | preceding-sibling::text()[1]"/>
            </div>
        </xsl:for-each>
    </document>
</xsl:template>

</xsl:stylesheet>

Answer 2 (score: 0)

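It sounds like you want something like the following, where SIGNIFICANT is an expression that matches all, and only, the elements you want to become new list items (e.g. div[substring(@class,1,6)='target'])...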

<xsl:template match="SIGNIFICANT">
  <xsl:for-each select="node()">
    <xsl:choose>
      <xsl:when test="self::SIGNIFICANT">
        <!-- a nested significant element: process it, so it is emitted as a sibling -->
        <xsl:apply-templates select="."/>
      </xsl:when>
      <xsl:when test="preceding-sibling::node()[1][not(self::SIGNIFICANT)]">
        <!-- do nothing: this node was already copied with the run that started
             at an earlier sibling. Note [1], not [0]: XPath positions start at 1,
             so [0] never selects anything -->
      </xsl:when>
      <xsl:otherwise>
        <!--
          the current node starts a run: create a copy of the element matched by
          the template, with its attributes, and put into it the current node and
          all following siblings up to (not including) the next SIGNIFICANT node,
          i.e. every non-significant following sibling with the same number of
          preceding SIGNIFICANT siblings as the current node.
          Relies on an identity template (as in the question) to copy the nodes.
        -->
        <xsl:element name="{name(..)}">
          <xsl:copy-of select="../@*"/>
          <xsl:apply-templates select=". | following-sibling::node()[
              not(self::SIGNIFICANT)
              and count(preceding-sibling::SIGNIFICANT)
                = count(current()/preceding-sibling::SIGNIFICANT)
            ]"/>
        </xsl:element>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
</xsl:template>

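Note: this means that top-level divs with no child nodes will be removed entirely. If you don't want that behaviour, you can add a simple wrapper with choose/when.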
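Instead of a choose/when wrapper, a separate higher-priority template can keep those empty elements. A sketch, using the example predicate `div[substring(@class,1,6)='target']` in place of SIGNIFICANT; the explicit `priority="1"` is an assumption chosen only to beat the default priority of 0.5 that any pattern with a predicate receives:

```xslt
<!-- significant elements with no child nodes: copy them unchanged instead of
     letting the main template's empty for-each drop them; priority="1" wins
     over the main template's default priority of 0.5 -->
<xsl:template match="div[substring(@class,1,6)='target'][not(node())]" priority="1">
  <xsl:copy-of select="."/>
</xsl:template>
```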

Also note: for very long lists there is probably a more efficient recursive way to do this.
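One such approach is sibling recursion: walk the children one at a time with `following-sibling::node()[1]`, so no node has to re-count its preceding significant siblings, and the total work stays roughly linear in the number of children. The sketch below is XSLT 1.0-compatible and untested; the mode names `start` and `fill` are hypothetical, SIGNIFICANT is again replaced by the example predicate `div[substring(@class,1,6)='target']`, and `xsl:strip-space` is assumed acceptable (it avoids whitespace-only wrappers, but also removes spaces between inline elements):

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity template -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- a significant element: start walking its children from the first one -->
  <xsl:template match="div[substring(@class,1,6)='target']">
    <xsl:apply-templates select="node()[1]" mode="start"/>
  </xsl:template>

  <!-- hypothetical mode "start", nested significant child: emit it, move on -->
  <xsl:template match="div[substring(@class,1,6)='target']" mode="start">
    <xsl:apply-templates select="."/>
    <xsl:apply-templates select="following-sibling::node()[1]" mode="start"/>
  </xsl:template>

  <!-- mode "start", first node of a run: wrap the run in a copy of the parent,
       then jump to the next significant sibling -->
  <xsl:template match="node()" mode="start">
    <xsl:element name="{name(..)}">
      <xsl:copy-of select="../@*"/>
      <xsl:apply-templates select="." mode="fill"/>
    </xsl:element>
    <xsl:apply-templates
        select="following-sibling::div[substring(@class,1,6)='target'][1]"
        mode="start"/>
  </xsl:template>

  <!-- hypothetical mode "fill": copy this node, then continue with the next
       sibling unless it is significant -->
  <xsl:template match="node()" mode="fill">
    <xsl:apply-templates select="."/>
    <xsl:apply-templates
        select="following-sibling::node()[1][not(self::div[substring(@class,1,6)='target'])]"
        mode="fill"/>
  </xsl:template>

</xsl:stylesheet>
```

Recursion depth in mode `fill` grows with the length of a run; whether a processor optimises that tail call away is implementation-dependent.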