Splitting a P element into multiple P elements based on BR elements

Time: 2015-07-16 10:52:14

Tags: xml xslt xslt-2.0

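I am trying to split a single P element that contains multiple SPAN and BR elements into separate P elements, breaking at each BR.

Here is a sample of the input XML structure: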

  <P>
     <SPAN CLASS="BYLINE">by john doe</SPAN>
     <SPAN CLASS="TEXT">
        <BR/>
     </SPAN>
     <SPAN CLASS="EMAIL">john.doe@email.com</SPAN>
     <SPAN CLASS="TEXT">
        <BR/>
     </SPAN>
     <SPAN CLASS="TEXT">Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. </SPAN>
     <SPAN CLASS="BOLD">This sentence is bold. </SPAN>
     <SPAN CLASS="TEXT">It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. </SPAN>
     <SPAN CLASS="ITALIC">This sentence is in italics. </SPAN>
     <SPAN CLASS="TEXT">It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
        <BR/>
     </SPAN>
     <SPAN CLASS="BOLD">BOLD SUBTITLE HERE</SPAN>
     <SPAN CLASS="TEXT">
        <BR/>Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.</SPAN>
     <SPAN CLASS="ITALIC">
        <BR/>ITALIC SUB-TITLE</SPAN>
     <SPAN CLASS="TEXT">
        <BR/>Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source.<BR/>
     </SPAN>
  </P>

The output XML I would like to see is:

  <P>
    <SPAN CLASS="BYLINE">by john doe</SPAN>
  </P>
  <P>
    <SPAN CLASS="EMAIL">john.doe@email.com</SPAN>
  </P>
  <P>
     <SPAN CLASS="TEXT">Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. </SPAN>
     <SPAN CLASS="BOLD">This sentence is bold. </SPAN>
     <SPAN CLASS="TEXT">It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. </SPAN>
     <SPAN CLASS="ITALIC">This sentence is in italics. </SPAN>
     <SPAN CLASS="TEXT">It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</SPAN>
  </P>
  <P>
     <SPAN CLASS="BOLD">BOLD SUBTITLE HERE</SPAN>
  </P>
  <P>
     <SPAN CLASS="TEXT">Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.</SPAN>
  </P>
  <P>
     <SPAN CLASS="ITALIC">ITALIC SUB-TITLE</SPAN>     
  </P>
  <P>
     <SPAN CLASS="TEXT">Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source.</SPAN>
  </P>
  <P>
     <SPAN CLASS="TEXT"></SPAN>
  </P>    

Is this possible? I tried using xsl:key and grouping, but could not get it to work.

Any suggestions are much appreciated. Thanks.

1 answer:

Answer 0 (score: 0)

If you are using XSLT 2.0, it looks like you could use xsl:for-each-group together with group-ending-with:

<xsl:for-each-group select="SPAN" group-ending-with="*[BR]">

You can then use the current-group() function to get all the SPAN elements that you want to wrap in a P:

<P>
    <xsl:apply-templates select="current-group()" />
</P>  
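
To see which SPAN elements land in which group, the grouping rule can be emulated outside XSLT. The following is a minimal Python sketch, not the XSLT processor's implementation: group_ending_with is a hypothetical helper that mimics the group-ending-with attribute, and SAMPLE is a shortened version of the input from the question.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the input from the question.
SAMPLE = """<P>
  <SPAN CLASS="BYLINE">by john doe</SPAN>
  <SPAN CLASS="TEXT"><BR/></SPAN>
  <SPAN CLASS="EMAIL">john.doe@email.com</SPAN>
  <SPAN CLASS="TEXT"><BR/></SPAN>
  <SPAN CLASS="BOLD">BOLD SUBTITLE HERE</SPAN>
  <SPAN CLASS="TEXT"><BR/>Contrary to popular belief.</SPAN>
</P>"""


def group_ending_with(items, ends_group):
    """Close the current group after every item for which ends_group is true."""
    groups, current = [], []
    for item in items:
        current.append(item)
        if ends_group(item):
            groups.append(current)
            current = []
    if current:  # trailing items without a closing match still form a group
        groups.append(current)
    return groups


spans = ET.fromstring(SAMPLE).findall("SPAN")
# Equivalent in spirit to group-ending-with="*[BR]": a SPAN with a BR child ends a group.
groups = group_ending_with(spans, lambda s: s.find("BR") is not None)
for group in groups:
    print([s.get("CLASS") for s in group])
```

Each printed list corresponds to one output P. Because a group always ends after the whole matching SPAN, a SPAN that starts with a BR stays in the same group as the SPAN before it.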

You will also need templates that stop BR elements, and SPAN elements containing only a BR, from being output.

Try this XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" indent="yes" />

    <!-- Split each P: a SPAN that contains a BR closes the current group. -->
    <xsl:template match="P">
        <xsl:for-each-group select="SPAN" group-ending-with="*[BR]">
            <P>
                <xsl:apply-templates select="current-group()" />
            </P>
        </xsl:for-each-group>
    </xsl:template>

    <!-- Drop SPAN elements that hold a BR and no text. -->
    <xsl:template match="SPAN[BR][not(normalize-space())]" />

    <!-- Drop the BR elements themselves. -->
    <xsl:template match="BR" />

    <!-- Identity template: copy everything else unchanged. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

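This does not quite produce your required output: <SPAN CLASS="BOLD">BOLD SUBTITLE HERE</SPAN> is combined with the following SPAN rather than getting its own P element. I could not work out the logic for why that case should be different.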
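
One possible reading of that case, offered as an assumption rather than something the asker confirmed: in the sample input the BR that follows BOLD SUBTITLE HERE is the first child of the next SPAN, so the intended break falls before that SPAN's text, whereas group-ending-with ends a group only after the whole matching SPAN. The Python sketch below (not XSLT; split_on_br and SAMPLE are hypothetical names, and the input is shortened) breaks at the exact position of each BR instead. It assumes SPAN elements contain only text and BR, strips surrounding whitespace, and drops empty segments, so it does not reproduce the trailing empty SPAN shown in the desired output.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the input from the question.
SAMPLE = """<P>
  <SPAN CLASS="BYLINE">by john doe</SPAN>
  <SPAN CLASS="TEXT"><BR/></SPAN>
  <SPAN CLASS="BOLD">BOLD SUBTITLE HERE</SPAN>
  <SPAN CLASS="TEXT"><BR/>Contrary to popular belief.</SPAN>
  <SPAN CLASS="ITALIC"><BR/>ITALIC SUB-TITLE</SPAN>
  <SPAN CLASS="TEXT"><BR/>Richard McClintock.<BR/></SPAN>
</P>"""


def split_on_br(p):
    """Return a list of paragraphs; each paragraph is a list of (CLASS, text) pairs."""
    paragraphs, current = [], []

    def add(cls, text):
        # Keep only segments with visible text; surrounding whitespace is stripped.
        if text and text.strip():
            current.append((cls, text.strip()))

    def close():
        # Finish the current paragraph, skipping empty ones.
        nonlocal current
        if current:
            paragraphs.append(current)
            current = []

    for span in p.findall("SPAN"):
        cls = span.get("CLASS")
        add(cls, span.text)       # text before the first child element
        for child in span:
            if child.tag == "BR":
                close()           # the paragraph ends exactly where the BR sits
            add(cls, child.tail)  # text that follows the child element
    close()
    return paragraphs


paragraphs = split_on_br(ET.fromstring(SAMPLE))
for paragraph in paragraphs:
    print(paragraph)
```

With this rule BOLD SUBTITLE HERE ends up in its own paragraph, because the BR at the start of the following SPAN closes the paragraph before that SPAN's text is added.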

For more interesting ways of using xsl:for-each-group in XSLT 2.0, see http://www.xml.com/pub/a/2003/11/05/tr.html