XSLT transformation

Time: 2017-06-09 16:09:19

Tags: xml xslt xslt-1.0 grouping

I have the following XML:

<?xml version="1.0" encoding="utf-8"?>
<NewDataSet>
    <GUID>
        <Active>true</Active>
        <ContractName>Contract Name</ContractName>
        <ContractNumber>Auto</ContractNumber>
        <DateOfBirth>16/01/1988</DateOfBirth>
        <FirstName>Fred</FirstName>
        <Notes>some notes</Notes>
        <PlaceOfResidence>United Kingdom</PlaceOfResidence>
        <RowNumber>1</RowNumber>
        <TableName>PersonDetails</TableName>
    </GUID>
    <GUID>
        <Active>true</Active>
        <ContractName>Contract Name</ContractName>
        <ContractNumber>Auto</ContractNumber>
        <DateOfBirth>01/01/1960</DateOfBirth>
        <FirstName>Harold</FirstName>
        <Notes>some notes</Notes>
        <PlaceOfResidence>United Kingdom</PlaceOfResidence>
        <RowNumber>2</RowNumber>
        <TableName>PersonDetails</TableName>
    </GUID>
    <GUID>
        <Active>true</Active>
        <ContractName>Contract Name</ContractName>
        <ContractNumber>Auto</ContractNumber>
        <DateOfBirth>05/05/1955</DateOfBirth>
        <FirstName>Mary</FirstName>
        <Notes>some notes</Notes>
        <PlaceOfResidence>United States</PlaceOfResidence>
        <RowNumber>3</RowNumber>
        <TableName>PersonDetails</TableName>
    </GUID>
    <GUID>
        <ContractName>Contract Name</ContractName>
        <ContractNumber>Auto</ContractNumber>
        <CoverType>Property</CoverType>
        <DateAdded>01/06/2017</DateAdded>
        <Notes>some notes</Notes>
        <RowNumber>1</RowNumber>
        <TableName>Covers</TableName>
    </GUID>
    <GUID>
        <ContractName>Contract Name</ContractName>
        <ContractNumber>Auto</ContractNumber>
        <CoverType>Motor</CoverType>
        <DateAdded>01/06/2017</DateAdded>
        <Notes>some notes</Notes>
        <RowNumber>2</RowNumber>
        <TableName>Covers</TableName>
    </GUID>
    <GUID>
        <ContractName>Contract Name</ContractName>
        <ContractNumber>Auto</ContractNumber>
        <CoverType>Liability</CoverType>
        <DateAdded>01/06/2017</DateAdded>
        <Notes>some notes</Notes>
        <RowNumber>3</RowNumber>
        <TableName>Covers</TableName>
    </GUID>
</NewDataSet>

I need to transform it into the following:

<data>
    <ContractName>Contract Name</ContractName>
    <ContractNumber>Auto</ContractNumber>
    <Table>
        <TableRow RowNumber="1" TableName="PersonDetails">
            <FirstName>Fred</FirstName>
            <PlaceOfResidence>United Kingdom</PlaceOfResidence>
            <DateOfBirth>16/01/1988</DateOfBirth>
            <Active>true</Active>
        </TableRow>
        <TableRow RowNumber="2" TableName="PersonDetails">
            <FirstName>Harold</FirstName>
            <PlaceOfResidence>United Kingdom</PlaceOfResidence>
            <DateOfBirth>01/01/1960</DateOfBirth>
            <Active>true</Active>
        </TableRow>
        <TableRow RowNumber="3" TableName="PersonDetails">
            <FirstName>Mary</FirstName>
            <PlaceOfResidence>United States</PlaceOfResidence>
            <DateOfBirth>05/05/1955</DateOfBirth>
            <Active>true</Active>
        </TableRow>
    </Table>
    <Table>
        <TableRow RowNumber="1" TableName="Covers">
            <CoverType>Property</CoverType>
            <DateAdded>01/06/2017</DateAdded>
        </TableRow>
        <TableRow RowNumber="2" TableName="Covers">
            <CoverType>Motor</CoverType>
            <DateAdded>01/06/2017</DateAdded>
        </TableRow>
        <TableRow RowNumber="3" TableName="Covers">
            <CoverType>Liability</CoverType>
            <DateAdded>01/06/2017</DateAdded>
        </TableRow>
    </Table>
    <Notes>some notes</Notes>
</data>

I can only use XSLT 1.0.

So far, I have:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="utf-16"/>
<xsl:template match="@* | node()">
    <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="*[(*)]">
    <xsl:apply-templates/>
</xsl:template>

<xsl:template match="/">
<data>
    <xsl:apply-templates select="@* | node()"/>
</data>
</xsl:template>
</xsl:stylesheet>

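This removes the <NewDataSet> and <GUID> tags and replaces them with <data>.

However, I'm not sure how to generate the two table groupings and also automatically* identify the repeated values: ContractName, ContractNumber, and Notes.

*Other repeated values may appear later.

Any help or pointers would be greatly appreciated.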

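1 answer:

Answer 0 (score: 0)

Part of the challenge here is that you have a flat list of GUID elements that you need to group by their TableName values. This looks like a classic case for Muenchian grouping, a technique described in detail on Jeni Tennison's site: http://www.jenitennison.com/xslt/grouping/muenchian.html

Here is some XSL code that produces output nearly identical to your desired XML format; the only difference is the order of the elements within <TableRow>. However, a number of aspects of this transformation are unclear. I've pointed these out in the code comments.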

<xsl:output method="xml" version="1.0" encoding="utf-16" indent="yes"/>

<!-- Avoids newline whitespace where source elements are excluded from output -->
<xsl:strip-space elements="*"/>

<!-- The input XML appears to be a flat-ish list of `GUID` elements,
    each of which represents a single row of data in any of various tables. 
    In reorganizing this data, we want to group by `GUID/TableName` values.
    Muenchian grouping is probably the best way to do this in XSLT 1.0:
    see more at http://www.jenitennison.com/xslt/grouping/muenchian.html.
    This requires a key, so before we try to process the `GUID`s, we set 
    up the key. -->
<xsl:key name="table" match="GUID" use="TableName"/>

<!-- Begin at the beginning: the root and topmost element. -->
<xsl:template match="/NewDataSet">
    <data>
        <!-- I see in your desired output XML that you've put
            `ContractName` and `ContractNumber` just under the
            top-level `data` element.  This appears to assume
            that ALL of these have the same value for ALL of
            the individual `GUID` recordsets.
            In your input XML, these two are elements under `GUID`,
            so these data fields are included in the individual data
            records.  NOTE: If there is *any* chance that the values
            in these fields might differ between records, these
            should be kept within the table rows, and *not* moved
            to the same level as the output `Table` elements. -->
        <!-- This just naively copies these two elements from the first 
            `GUID` that has them.
            Again, if these values have any chance of differing between 
            `GUID`s, this whole approach is flawed. -->
        <xsl:copy-of select="GUID[ContractName][1]/ContractName"/>
        <xsl:copy-of select="GUID[ContractNumber][1]/ContractNumber"/>

        <!-- We want to process `GUID`s after grouping by `TableName` values.
            This `for-each` is part of the Muenchian grouping technique.  See
            Jeni Tennison's page (linked above) for a detailed explanation. -->
        <xsl:for-each select="GUID[count(. | key('table', TableName)[1]) = 1]">
            <!-- If you wanted to sort alphabetically by TableName, you'd use:
                <xsl:sort select="TableName" /> -->
            <Table>
                <!-- Now, within each `table`, we want to process all those
                    `GUID`s with this same corresponding `TableName`. -->
                <xsl:for-each select="key('table', TableName)">
                    <!-- We select "this", since we want to process the matching
                        `GUID`, not just its children. -->
                    <xsl:apply-templates select="."/>
                </xsl:for-each>
            </Table>
        </xsl:for-each>

        <!-- This just copies the `Notes` element from the first `GUID` that has a
            `Notes` child.
            Similar to `ContractName` and `ContractNumber`, this naively assumes that
            all `Notes` elements have identical content. This approach is flawed if
            there is *any* possibility of different values. -->
        <xsl:copy-of select="GUID[Notes][1]/Notes"/>
    </data>

</xsl:template>

<!-- List up the elements we don't want to copy verbatim into each table row.
    The list is a delimited string because plain XSLT 1.0 cannot apply a path
    expression to a variable that holds a result tree fragment. -->
<xsl:variable name="nocopy"
    select="'|ContractName|ContractNumber|Notes|RowNumber|TableName|'"/>

<xsl:template match="GUID">
    <TableRow RowNumber="{RowNumber}" TableName="{TableName}">
        <!-- Copy over child data, but _only_ if its name is not in `$nocopy` -->
        <xsl:copy-of select="*[not(contains($nocopy, concat('|', name(), '|')))]"/>
    </TableRow>
</xsl:template>
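
If the order of the children inside each TableRow matters, one option is to copy them one at a time in the order wanted, because a single copy-of over `*` always follows document order. The following is a minimal sketch of a complete stylesheet along those lines, not a definitive solution. It assumes that the field names in the sample input are the only ones that occur, so any field that appears later would have to be added by hand, and it keeps the same naive assumption that ContractName, ContractNumber, and Notes are identical in every GUID.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Key for Muenchian grouping of GUID rows by table name -->
    <xsl:key name="table" match="GUID" use="TableName"/>

    <xsl:template match="/NewDataSet">
        <data>
            <!-- Assumption: these values are the same in every GUID -->
            <xsl:copy-of select="GUID[1]/ContractName"/>
            <xsl:copy-of select="GUID[1]/ContractNumber"/>
            <!-- One Table per distinct TableName -->
            <xsl:for-each select="GUID[count(. | key('table', TableName)[1]) = 1]">
                <Table>
                    <xsl:apply-templates select="key('table', TableName)"/>
                </Table>
            </xsl:for-each>
            <xsl:copy-of select="GUID[1]/Notes"/>
        </data>
    </xsl:template>

    <xsl:template match="GUID">
        <TableRow RowNumber="{RowNumber}" TableName="{TableName}">
            <!-- PersonDetails fields, in the order shown in the question.
                A copy-of that selects nothing produces no output, so the
                instructions for the other table are harmless here. -->
            <xsl:copy-of select="FirstName"/>
            <xsl:copy-of select="PlaceOfResidence"/>
            <xsl:copy-of select="DateOfBirth"/>
            <xsl:copy-of select="Active"/>
            <!-- Covers fields -->
            <xsl:copy-of select="CoverType"/>
            <xsl:copy-of select="DateAdded"/>
        </TableRow>
    </xsl:template>
</xsl:stylesheet>
```

The trade-off is that the field list is hard-coded, which conflicts with the wish to pick up new fields automatically; it is only worth it if the consumer of the output really depends on element order.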

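Update

Rereading the text of your post (and not just your code :)), I see that you are asking how to identify the repeated elements. However, your desired XML output already seems to assume that the ContractName, ContractNumber, and Notes elements must be identical in every GUID structure.

This is confusing. Your desired output already assumes the answer to your question.

Are you asking, "How do I identify the GUID child elements that are common to all GUID structures, create a single top-level copy of them, and remove them from the output GUID structures?"

Update 2

In XSL it is easy to determine whether a given element exists anywhere in a set of XML structures.

Determining whether a given element exists in every one of a set of XML structures is not so easy. It is possible, though ugly. :)

Replace the stylesheet from the first version of this answer with the following.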
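
To make the "anywhere" versus "every" distinction concrete, here is a small sketch that prints three tests as text. It is only an illustration and uses Notes as the example element. The "every" tests rely on a double negation, because XPath 1.0 has no universal quantifier.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/NewDataSet">
        <!-- "Anywhere": true if at least one GUID has a Notes child -->
        <xsl:value-of select="boolean(GUID/Notes)"/>
        <xsl:text>&#10;</xsl:text>
        <!-- "Every": true only if there is no GUID that lacks a Notes child -->
        <xsl:value-of select="not(GUID[not(Notes)])"/>
        <xsl:text>&#10;</xsl:text>
        <!-- "Every, with the same value": true only if there is no GUID whose
            Notes differs from (or is missing compared with) the first GUID's -->
        <xsl:value-of select="not(GUID[not(Notes = ../GUID[1]/Notes)])"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

For the sample input in the question, all three lines should read true, since every GUID carries a Notes child with the same text.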

<xsl:output method="xml" version="1.0" encoding="utf-16" indent="yes"/>

<!-- Avoids newline whitespace where source elements are excluded from output -->
<xsl:strip-space elements="*"/>

<!-- The input XML appears to be a flat-ish list of `GUID` elements,
each of which represents a single row of data in any of various tables. 
In reorganizing this data, we want to group by `GUID/TableName` values.
Muenchian grouping is probably the best way to do this in XSLT 1.0:
see more at http://www.jenitennison.com/xslt/grouping/muenchian.html.
This requires a key, so before we try to process the `GUID`s, we set 
up the key. -->
<xsl:key name="table" match="GUID" use="TableName"/>

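The $kids variable is the key piece in determining which GUID children are common to all GUID structures. There may be more elegant and efficient ways to do this in XSL 1.0; running this on the small dataset takes 0.8 seconds in Oxygen XML (using the Saxon-HE 9.6.0.7 processor).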

<!-- Build list of unique GUID children that appear in all GUID structures -->
<xsl:variable name="kids">
    <xsl:for-each select="/NewDataSet/GUID/*">
        <xsl:variable name="this" select="."/>
        <!-- Count the GUID children that share this child's name and value.
            Counting source nodes directly avoids a path expression on a
            result tree fragment, which plain XSLT 1.0 does not allow. -->
        <!-- If that count equals the number of GUIDs, output only the first
            such child (there are dupes otherwise). -->
        <xsl:if test="count(/NewDataSet/GUID/*[name() = name($this) and . = $this])
                      = count(/NewDataSet/GUID)
                      and not(preceding::*[name() = name($this) and . = $this])">
            <xsl:copy-of select="$this"/>
        </xsl:if>
    </xsl:for-each>
</xsl:variable>

This part is roughly the same, except for the bit about $kids.

<!-- Begin at the beginning: the root and topmost element. -->
<xsl:template match="/NewDataSet">
    <data>
        <!-- We'll put common elements at the top of the `data` structure. -->
        <xsl:copy-of select="$kids"/>

        <!-- We want to process `GUID`s after grouping by `TableName` values.
        This `for-each` is part of the Muenchian grouping technique.  See
        Jeni Tennison's page (linked above) for a detailed explanation. -->
        <xsl:for-each select="GUID[count(. | key('table', TableName)[1]) = 1]">
            <!-- If you wanted to sort alphabetically by TableName, you'd use:
            <xsl:sort select="TableName" /> -->
            <Table>
                <!-- Now, within each `table`, we want to process all those
                `GUID`s with this same corresponding `TableName`. -->
                <xsl:for-each select="key('table', TableName)">
                    <!-- We select "this", since we want to process the matching
                    `GUID`, not just its children. -->
                    <xsl:apply-templates select="."/>
                </xsl:for-each>
            </Table>
        </xsl:for-each>
    </data>

</xsl:template>

The GUID template now leaves out RowNumber, TableName, and any child that is common to every GUID (the same elements collected in $kids). It repeats the test rather than reading the names back out of $kids, because plain XSLT 1.0 does not allow path expressions on a result tree fragment.

<xsl:template match="GUID">
    <TableRow RowNumber="{RowNumber}" TableName="{TableName}">
        <!-- Copy over child data, but _only_ if it is not RowNumber, TableName,
            or a child that appears with the same value in every GUID -->
        <xsl:for-each select="*[not(self::RowNumber or self::TableName)]">
            <xsl:variable name="this" select="."/>
            <xsl:if test="count(/NewDataSet/GUID/*[name() = name($this) and . = $this])
                          != count(/NewDataSet/GUID)">
                <xsl:copy-of select="."/>
            </xsl:if>
        </xsl:for-each>
    </TableRow>
</xsl:template>

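The output now generated is functionally the same as the desired output XML. The only differences are in element order: the TableRow children come out in a different order, and Notes sits at the top of data, alongside ContractName and ContractNumber, rather than at the bottom of data.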
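
A more compact way to detect the common children, and to put Notes at the bottom as the question requests, is to index every GUID child by name and value with a second key. This is a sketch of an alternative, not the approach used above. The key name `field` and the variable `rows` are names chosen here for illustration, and the stylesheet assumes that each GUID has at most one child of any given name. It uses only XSLT 1.0 features and no extension functions.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Groups GUID rows by table name (Muenchian grouping) -->
    <xsl:key name="table" match="GUID" use="TableName"/>
    <!-- Indexes every GUID child by "name|value" -->
    <xsl:key name="field" match="GUID/*" use="concat(name(), '|', .)"/>

    <!-- Number of GUID rows; a child is common when its key matches this many nodes -->
    <xsl:variable name="rows" select="count(/NewDataSet/GUID)"/>

    <xsl:template match="/NewDataSet">
        <data>
            <!-- Common children (other than Notes and the row metadata) go first -->
            <xsl:copy-of select="GUID[1]/*[not(self::Notes or self::RowNumber or self::TableName)]
                                          [count(key('field', concat(name(), '|', .))) = $rows]"/>
            <!-- One Table per distinct TableName -->
            <xsl:for-each select="GUID[count(. | key('table', TableName)[1]) = 1]">
                <Table>
                    <xsl:apply-templates select="key('table', TableName)"/>
                </Table>
            </xsl:for-each>
            <!-- Notes goes last, but only if it is common to every row -->
            <xsl:copy-of select="GUID[1]/Notes[count(key('field', concat('Notes|', .))) = $rows]"/>
        </data>
    </xsl:template>

    <xsl:template match="GUID">
        <TableRow RowNumber="{RowNumber}" TableName="{TableName}">
            <!-- Keep only the children that are not common to every row -->
            <xsl:copy-of select="*[not(self::RowNumber or self::TableName)]
                                  [count(key('field', concat(name(), '|', .))) != $rows]"/>
        </TableRow>
    </xsl:template>
</xsl:stylesheet>
```

Because a key lookup replaces the nested loop over all GUID children, this should also scale better on larger inputs, although that has not been measured here. The TableRow children still appear in source document order.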

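A note on the output XML data format

It seems a little odd to include the table's name as an attribute on every table row. It would make more sense as an attribute of the Table element itself.

Likewise, having a RowNumber attribute on every row seems redundant. That information can be gathered simply by looking at the position() of each TableRow within its parent Table.

That said, you know your requirements. This is just a matter of getting things working. :)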
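
As a rough illustration of that alternative layout, the sketch below puts TableName on the Table element once and, in case a row number is still wanted in the output, derives it from position() instead of copying the source RowNumber. It is a simplified example under stated assumptions: it leaves ContractName, ContractNumber, and Notes inside each row rather than hoisting them, and it assumes the GUIDs of each table already appear in the desired row order.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Key for Muenchian grouping of GUID rows by table name -->
    <xsl:key name="table" match="GUID" use="TableName"/>

    <xsl:template match="/NewDataSet">
        <data>
            <xsl:for-each select="GUID[count(. | key('table', TableName)[1]) = 1]">
                <!-- The table name is stated once, on the Table element -->
                <Table TableName="{TableName}">
                    <xsl:for-each select="key('table', TableName)">
                        <!-- position() is the 1-based index of this row within its group -->
                        <TableRow RowNumber="{position()}">
                            <xsl:copy-of select="*[not(self::RowNumber or self::TableName)]"/>
                        </TableRow>
                    </xsl:for-each>
                </Table>
            </xsl:for-each>
        </data>
    </xsl:template>
</xsl:stylesheet>
```

A consumer of this format could equally drop the RowNumber attribute and count preceding TableRow siblings when it needs the index.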