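XML to CSV with nested and repeatable nodes

Time: 2013-01-30 13:29:48

Tags: xml xslt xslt-1.0

I am working with Dublin Core encoded metadata files that I want to convert to CSV. I am aiming for the output below: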

identifier1|||identifier2|||identifier3|||identifier4,datestamp1|||datestamp2|||2010-04-27T01:10:31Z,setspec1,title1|||title2,subject1|||subject2,baseURL|||xxxxx|||xxxxx

Note that repeatable elements are separated by three pipe symbols (|||), while elements are separated by a comma (,).
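To make the target format concrete, here is a minimal Python sketch; it is not part of the stylesheet. The helper name `to_csv_row` and the shortened sample values are assumptions chosen purely for illustration. It joins the repeated values of each element with `|||` and then joins the elements with a comma.

```python
# Illustrative sketch only: to_csv_row is a hypothetical helper, not part of the XSLT.
def to_csv_row(groups, value_sep="|||", field_sep=","):
    """Join each group's repeated values with value_sep, then join groups with field_sep."""
    return field_sep.join(value_sep.join(values) for values in groups)


# Sample values borrowed from the desired output above (shortened).
row = to_csv_row([
    ["identifier1", "identifier2"],
    ["setspec1"],
    ["title1", "title2"],
])
print(row)  # identifier1|||identifier2,setspec1,title1|||title2
```

The sketch does no CSV quoting, so a value that itself contains a comma would need extra handling.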

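I have managed to get as far as the stylesheet below, but I am struggling with the following:

(1) How do I define a generic template that separates nodes with a comma?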

<xsl:template match="GENERIC MATCH">
  <xsl:apply-templates select="current()" />
  <xsl:if test="position() = last()">,</xsl:if>
</xsl:template>

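Taking the input file below as an example, I essentially want GENERIC MATCH to dynamically handle the level 2 nodes (header, metadata and about) and separate the results with commas.

(2) How do I determine whether an element is the last child node, so that I can conditionally include the comma?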

<xsl:output method="text" omit-xml-declaration="yes"/>

<xsl:template match="/">
  <xsl:apply-templates select="record" />
</xsl:template>
<xsl:template match="record">
  <xsl:apply-templates select="//metadata/oai_dc:dc/dc:title|//metadata/oai_dc:dc/dc:subject" />
  <xsl:if test="not(metadata/oai_dc:dc/node()/position()=last())">#####</xsl:if>
</xsl:template>

<xsl:template match="dc:title">
  <xsl:value-of select="." />
  <xsl:if test="not(position() = last())">||</xsl:if>
</xsl:template>

<xsl:template match="dc:subject">
  <xsl:value-of select="." />
  <xsl:if test="not(position() = last())">||</xsl:if>
</xsl:template>

Input file

<?xml version="1.0"?>
<record>
  <header>
    <identifier>identifier1</identifier>
    <datestamp>datastamp1</datestamp>
    <setSpec>setspec1</setSpec>
  </header>
  <metadata>
    <oai_dc:dc>
      <dc:title>title1</dc:title>
      <dc:title>title2</dc:title>
      <dc:creator>creator1</dc:creator>
      <dc:subject>subject1</dc:subject>
      <dc:subject>subject2</dc:subject>
    </oai_dc:dc>
  </metadata>
  <about>
    <provenance>
      <originDescription altered="false" harvestDate="2011-08-11T03:47:51Z">
        <baseURL>baseURL1</baseURL>
        <identifier>identifier3</identifier>
        <datestamp>datestamp2</datestamp>
        <metadataNamespace>xxxxx</metadataNamespace>
        <originDescription altered="false" harvestDate="2010-10-10T06:15:53Z">
          <baseURL>xxxxx</baseURL>
          <identifier>identifier4</identifier>
          <datestamp>2010-04-27T01:10:31Z</datestamp>
          <metadataNamespace>xxxxx</metadataNamespace>
        </originDescription>
      </originDescription>
    </provenance>
  </about>
</record>

I am using XSLT 1.0 with xsltproc.

1 Answer:

Answer 0 (score: 1)

How about this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" indent="yes" omit-xml-declaration="yes"/>
  <!-- A key on all leaf nodes -->
  <!-- *[not(*)] matches any element that is a leaf node
       i.e. it has no child elements. Here, the elements' names are being
       used as the key value. -->
  <xsl:key name="kNodeType" match="*[not(*)]" use="local-name()"/>

  <xsl:template match="/">
    <!-- Use Muenchian grouping to apply the "group" template to the first of
         each leaf node with a distinct name. -->
    <xsl:apply-templates
      select="//*[not(*)][generate-id() = 
                          generate-id(key('kNodeType', local-name())[1])]"
      mode="group" />
  </xsl:template>

  <!-- This template will be used to match only the first item in each group,
       due to the grouping expression used in the previous template. -->
  <xsl:template match="*" mode="group">
    <!-- Skip the comma for the first group, output it for all others -->
    <xsl:if test="position() > 1">,</xsl:if>
    <!-- Apply the "item" template to all items in the same group as this element
         (i.e. those with the same name) -->
    <xsl:apply-templates select="key('kNodeType', local-name())" mode="item" />
  </xsl:template>

  <xsl:template match="*" mode="item">
    <!-- Skip the delimiter for the first item in each group;
         output it for all others -->
    <xsl:if test="position() > 1">|||</xsl:if>
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>

When run on the sample input, this produces:

identifier1|||identifier3|||identifier4,datastamp1|||datestamp2|||2010-04-27T01:10:31Z,setspec1,title1|||title2,creator1,subject1|||subject2,baseURL1|||xxxxx,xxxxx|||xxxxx
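
As a cross-check of the grouping logic, here is a rough Python sketch that uses only the standard library. It is not the answer's method, just the same idea restated: collect the leaf elements, group them by local name in order of first appearance, then join. The trimmed input and the two namespace declarations are assumptions, since the sample input does not show where the `dc` and `oai_dc` prefixes are declared; the names `local_name` and `leaves_to_row` are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed sample input. The xmlns declarations are an assumption: the original
# sample omits them, but a parser needs the prefixes to be bound.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
  <header>
    <identifier>identifier1</identifier>
    <setSpec>setspec1</setSpec>
  </header>
  <metadata>
    <oai_dc:dc>
      <dc:title>title1</dc:title>
      <dc:title>title2</dc:title>
      <dc:subject>subject1</dc:subject>
    </oai_dc:dc>
  </metadata>
  <about>
    <identifier>identifier3</identifier>
  </about>
</record>"""


def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix, similar to XPath local-name()."""
    return tag.rsplit("}", 1)[-1]


def leaves_to_row(xml_text):
    """Group leaf elements by local name (first-seen order) and join them."""
    groups = {}  # dicts keep insertion order in Python 3.7+
    for elem in ET.fromstring(xml_text).iter():
        if len(elem) == 0:  # no child elements: a leaf, like *[not(*)]
            groups.setdefault(local_name(elem.tag), []).append(elem.text or "")
    return ",".join("|||".join(values) for values in groups.values())


print(leaves_to_row(SAMPLE))
# identifier1|||identifier3,setspec1,title1|||title2,subject1
```

As in the stylesheet, grouping by local name means that `identifier` elements from `header` and from deeper levels end up in the same field.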