XML shredding via XSLT in Java

Time: 2011-12-17 22:35:05

Tags: java xml xslt jdom flatten

I need to transform large XML files that have a nested (hierarchical) structure of the form
<Root>
   Flat XML
   Hierarchical XML (multiple blocks, some repetitive)
   Flat XML
</Root>

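into a flatter ("shredded") form, with one block for each of the repeated nested blocks.

The data has many different tags and hierarchy variations (especially in the number of tags of the shredded XML before and after the hierarchical XML), so ideally no assumptions should be made about tag and attribute names or about the hierarchy level.

A top-level view for just 4 levels of hierarchy would look like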

<Level 1>
   ...
   <Level 2>
      ...
      <Level 3>
        ...
        <Level 4>A</Level 4>
        <Level 4>B</Level 4>
        ...
      </Level 3>
      ...
   </Level 2>
   ...
</Level 1>

The desired output is then

<Level 1>
  ...
  <Level 2>
    ...
      <Level 3>
        ...
        <Level 4>A</Level 4>
        ...
      </Level 3>
    ...
  </Level 2>
  ...
</Level 1>

<Level 1>
  ...
  <Level 2>
    ...
      <Level 3>
        ...
        <Level 4>B</Level 4>
        ...
      </Level 3>
    ...
  </Level 2>
  ...
</Level 1>

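In other words, if there are Li different components at each level i, a total of Product(Li) different components is produced (just 2 above, since the only differentiating factor is Level 4, so L1*L2*L3*L4 = 2).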

From what I have seen, XSLT may be the way to go, but any other solution (e.g. StAX or even JDOM) would do.

A more detailed example, using fictitious information, is

<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <Jobs>3</Jobs>
      <JobDetails>
        <Job title = "Senior Developer">
          <StartDate>01/10/2001</StartDate>
          <Months>38</Months>
        </Job>
        <Job title = "Senior Developer">
          <StartDate>01/12/2004</StartDate>
          <Months>6</Months>
        </Job>
        <Job title = "Senior Developer">
          <StartDate>01/06/2005</StartDate>
          <Months>10</Months>
        </Job>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <EmploymentHistory>
    <Employment country="UK">
      <Comment>List of previous jobs in the UK</Comment>
      <Jobs>2</Jobs>
      <JobDetails>
        <Job title = "Junior Developer">
          <StartDate>01/05/1999</StartDate>
          <Months>25</Months>
        </Job>
        <Job title = "Junior Developer">
          <StartDate>01/07/2001</StartDate>
          <Months>3</Months>
        </Job>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <Available>true</Available>
  <Experience unit="years">6</Experience>
</Employee>

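The above data should be shredded into 5 blocks (i.e. one for each distinct <Job> block), each of which keeps all the other tags identical and contains exactly one <Job> element. So, given the 5 distinct <Job> blocks in the example above, the transformed ("shredded") XML would be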

<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <Jobs>3</Jobs>
      <JobDetails>
        <Job title = "Senior Developer">
          <StartDate>01/10/2001</StartDate>
          <Months>38</Months>
        </Job>
      </JobDetails>
      <Available>true</Available>
     <Experience unit="years">6</Experience>
    </Employment>
  </EmploymentHistory>
</Employee>

<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <Jobs>3</Jobs>
      <JobDetails>
        <Job title = "Senior Developer">
          <StartDate>01/12/2004</StartDate>
          <Months>6</Months>
        </Job>
      </JobDetails>
      <Available>true</Available>
     <Experience unit="years">6</Experience>
    </Employment>
  </EmploymentHistory>
</Employee>

<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <Jobs>3</Jobs>
      <JobDetails>
        <Job title = "Senior Developer">
          <StartDate>01/06/2005</StartDate>
          <Months>10</Months>
        </Job>
      </JobDetails>
      <Available>true</Available>
     <Experience unit="years">6</Experience>
    </Employment>
  </EmploymentHistory>
</Employee>

<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="UK">
      <Comment>List of previous jobs in the UK</Comment>
      <Jobs>2</Jobs>
      <JobDetails>
        <Job title = "Junior Developer">
          <StartDate>01/05/1999</StartDate>
          <Months>25</Months>
        </Job>
      </JobDetails>
      <Available>true</Available>
     <Experience unit="years">6</Experience>
    </Employment>
  </EmploymentHistory>
</Employee>

<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="UK">
      <Comment>List of previous jobs in the UK</Comment>
      <Jobs>2</Jobs>
      <JobDetails>
        <Job title = "Junior Developer">
          <StartDate>01/07/2001</StartDate>
          <Months>3</Months>
        </Job>
      </JobDetails>
      <Available>true</Available>
     <Experience unit="years">6</Experience>
    </Employment>
  </EmploymentHistory>
</Employee>
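
As a quick sanity check on the expected number of shredded blocks, the leaf elements can be counted up front. The following is only a sketch using the JDK's javax.xml.xpath API; the class name ShredCount, the method expectedBlocks and the leaf expression passed to it are illustrative assumptions, not part of the original post.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: counts the leaf elements that each yield one shredded block. */
final class ShredCount {
    private ShredCount() {}

    /**
     * Evaluates count(leafXPath) against the given XML text.
     * leafXPath is an assumed XPath 1.0 node-set expression such as "//Job".
     */
    static int expectedBlocks(String xml, String leafXPath) {
        try {
            Double count = (Double) XPathFactory.newInstance().newXPath().evaluate(
                    "count(" + leafXPath + ")",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NUMBER);
            return count.intValue();
        } catch (XPathExpressionException e) {
            // Wrapped so that callers do not have to handle a checked exception.
            throw new IllegalArgumentException("Invalid XPath or XML input", e);
        }
    }
}
```

For the Employee document above, ShredCount.expectedBlocks(xml, "//Job") would be expected to report the 5 <Job> elements, matching the 5 blocks listed.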

2 Answers:

Answer 0 (score: 4)

Here is a generic solution, as requested:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="pLeafNodes" select="//Level-4"/>

 <xsl:template match="/">
  <t>
    <xsl:call-template name="StructRepro"/>
  </t>
 </xsl:template>

 <xsl:template name="StructRepro">
   <xsl:param name="pLeaves" select="$pLeafNodes"/>

   <xsl:for-each select="$pLeaves">
     <xsl:apply-templates mode="build" select="/*">
      <xsl:with-param name="pChild" select="."/>
      <xsl:with-param name="pLeaves" select="$pLeaves"/>
     </xsl:apply-templates>
   </xsl:for-each>
 </xsl:template>

  <xsl:template mode="build" match="node()|@*">
      <xsl:param name="pChild"/>
      <xsl:param name="pLeaves"/>

     <xsl:copy>
       <xsl:apply-templates mode="build" select="@*"/>

       <xsl:variable name="vLeafChild" select=
         "*[count(.|$pChild) = count($pChild)]"/>

       <xsl:choose>
        <xsl:when test="$vLeafChild">
         <xsl:apply-templates mode="build"
             select="$vLeafChild
                    |
                      node()[not(count(.|$pLeaves) = count($pLeaves))]">
             <xsl:with-param name="pChild" select="$pChild"/>
             <xsl:with-param name="pLeaves" select="$pLeaves"/>
         </xsl:apply-templates>
        </xsl:when>
        <xsl:otherwise>
         <xsl:apply-templates mode="build" select=
         "node()[not(.//*[count(.|$pLeaves) = count($pLeaves)])
                or
                 .//*[count(.|$pChild) = count($pChild)]
                ]
         ">

             <xsl:with-param name="pChild" select="$pChild"/>
             <xsl:with-param name="pLeaves" select="$pLeaves"/>
         </xsl:apply-templates>
        </xsl:otherwise>
       </xsl:choose>
     </xsl:copy>
 </xsl:template>
 <xsl:template match="text()"/>
</xsl:stylesheet>

When applied to the provided simplified (and generic) XML document

<Level-1>
   ...
   <Level-2>
      ...
      <Level-3>
        ...
        <Level-4>A</Level-4>
        <Level-4>B</Level-4>
        ...
      </Level-3>
      ...
   </Level-2>
   ...
</Level-1>

it produces the wanted, correct result

<Level-1>
   ...
   <Level-2>
      ...
      <Level-3>
         <Level-4>A</Level-4>
      </Level-3>
      ...
   </Level-2>
   ...
</Level-1>
<Level-1>
   ...
   <Level-2>
      ...
      <Level-3>
         <Level-4>B</Level-4>
      </Level-3>
      ...
   </Level-2>
   ...
</Level-1>

Now, if we change

 <xsl:param name="pLeafNodes" select="//Level-4"/>

to

 <xsl:param name="pLeafNodes" select="//Job"/>

and apply the transformation to the Employee XML document

<Employee name="A Name">
    <Address>123 A Street</Address>
    <Age>28</Age>
    <EmploymentHistory>
        <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <Jobs>3</Jobs>
            <JobDetails>
                <Job title = "Senior Developer">
                    <StartDate>01/10/2001</StartDate>
                    <Months>38</Months>
                </Job>
                <Job title = "Senior Developer">
                    <StartDate>01/12/2004</StartDate>
                    <Months>6</Months>
                </Job>
                <Job title = "Senior Developer">
                    <StartDate>01/06/2005</StartDate>
                    <Months>10</Months>
                </Job>
            </JobDetails>
        </Employment>
    </EmploymentHistory>
    <EmploymentHistory>
        <Employment country="UK">
            <Comment>List of previous jobs in the UK</Comment>
            <Jobs>2</Jobs>
            <JobDetails>
                <Job title = "Junior Developer">
                    <StartDate>01/05/1999</StartDate>
                    <Months>25</Months>
                </Job>
                <Job title = "Junior Developer">
                    <StartDate>01/07/2001</StartDate>
                    <Months>3</Months>
                </Job>
            </JobDetails>
        </Employment>
    </EmploymentHistory>
    <Available>true</Available>
    <Experience unit="years">6</Experience>
</Employee>

we again get the wanted, correct result

<t>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
         <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <Jobs>3</Jobs>
            <JobDetails>
               <Job title="Senior Developer">
                  <StartDate>01/10/2001</StartDate>
                  <Months>38</Months>
               </Job>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
   </Employee>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
         <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <Jobs>3</Jobs>
            <JobDetails>
               <Job title="Senior Developer">
                  <StartDate>01/12/2004</StartDate>
                  <Months>6</Months>
               </Job>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
   </Employee>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
         <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <Jobs>3</Jobs>
            <JobDetails>
               <Job title="Senior Developer">
                  <StartDate>01/06/2005</StartDate>
                  <Months>10</Months>
               </Job>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
   </Employee>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
         <Employment country="UK">
            <Comment>List of previous jobs in the UK</Comment>
            <Jobs>2</Jobs>
            <JobDetails>
               <Job title="Junior Developer">
                  <StartDate>01/05/1999</StartDate>
                  <Months>25</Months>
               </Job>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
   </Employee>
   <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
         <Employment country="UK">
            <Comment>List of previous jobs in the UK</Comment>
            <Jobs>2</Jobs>
            <JobDetails>
               <Job title="Junior Developer">
                  <StartDate>01/07/2001</StartDate>
                  <Months>3</Months>
               </Job>
            </JobDetails>
         </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
   </Employee>
</t>

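Explanation: The processing is done in a named template (StructRepro) and is controlled by a single external parameter named pLeafNodes, which must contain the node-set of all nodes whose "upward structure" is to be reproduced in the result.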
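
Since the question asks about driving this from Java, here is a minimal sketch of applying an XSLT 1.0 stylesheet with the JDK's built-in JAXP API (javax.xml.transform). The class name XsltRunner and the small demo stylesheet are assumptions made for illustration; the demo only isolates the node-set membership test count(.|$set) = count($set) that the stylesheet above relies on, and it is not the answer's full stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper (not from the original answer) that runs an XSLT 1.0 stylesheet via JAXP. */
final class XsltRunner {
    private XsltRunner() {}

    /** Demo stylesheet: prints how many elements are members of the node-set $pLeafNodes. */
    static final String MEMBERSHIP_DEMO_XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:param name='pLeafNodes' select='//Job'/>"
        + "<xsl:template match='/'>"
        // count(.|$set) = count($set) is true exactly when the context node is already in $set,
        // because adding a member to a node-set via union does not change its size.
        + "<xsl:value-of select='count(//*[count(.|$pLeafNodes) = count($pLeafNodes)])'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet text to the XML text and returns the serialized result. */
    static String transform(String xslt, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so that callers do not have to handle a checked exception.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

The same transform method could be given the full stylesheet and the Employee document as strings. For the large files mentioned in the question, StreamSource and StreamResult wrapped around files would probably be a better fit than in-memory strings.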

Answer 1 (score: 3)

Given the following XML:

<?xml version="1.0" encoding="utf-8" ?>
<Employee name="A Name">
  <Address>123 A Street</Address>
  <Age>28</Age>
  <EmploymentHistory>
    <Employment country="US">
      <Comment>List of previous jobs in the US</Comment>
      <Jobs>3</Jobs>
      <JobDetails>
        <Job title = "Developer">
          <StartDate>01/10/2001</StartDate>
          <Months>38</Months>
        </Job>
        <Job title = "Developer">
          <StartDate>01/12/2004</StartDate>
          <Months>6</Months>
        </Job>
        <Job title = "Developer">
          <StartDate>01/06/2005</StartDate>
          <Months>10</Months>
        </Job>
      </JobDetails>
      </Employment>
      <Employment country="UK">
        <Comment>List of previous jobs in the UK</Comment>
        <Jobs>2</Jobs>
        <JobDetails>
          <Job title = "Developer">
            <StartDate>01/05/1999</StartDate>
            <Months>25</Months>
          </Job>
          <Job title = "Developer">
            <StartDate>01/07/2001</StartDate>
            <Months>3</Months>
          </Job>
        </JobDetails>
        </Employment>
  </EmploymentHistory>
  <Available>true</Available>
  <Experience unit="years">6</Experience>
</Employee>

the following XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
      <Output>
        <xsl:apply-templates select="//Employee/EmploymentHistory/Employment/JobDetails/Job" />
      </Output>
    </xsl:template>

  <xsl:template match="//Employee/EmploymentHistory/Employment/JobDetails/Job">
    <Employee>
      <xsl:attribute name="name">
        <xsl:value-of select="ancestor::Employee/@name"/>
      </xsl:attribute>
      <Address>
        <xsl:value-of select="ancestor::Employee/Address"/>
      </Address>
      <Age>
        <xsl:value-of select="ancestor::Employee/Age"/>
      </Age>
      <EmploymentHistory>
        <Employment>
          <xsl:attribute name="country">
            <xsl:value-of select="ancestor::Employment/@country"/>
          </xsl:attribute>
          <Comment>
            <xsl:value-of select="ancestor::Employment/Comment"/>
          </Comment>
          <Jobs>
            <xsl:value-of select="ancestor::Employment/Jobs"/>
          </Jobs>
          <JobDetails>
            <xsl:copy-of select="."/>
          </JobDetails>
          <Available>
            <xsl:value-of select="ancestor::Employee/Available"/>
          </Available>
          <Experience>
            <xsl:attribute name="unit">
              <xsl:value-of select="ancestor::Employee/Experience/@unit"/>
            </xsl:attribute>
            <xsl:value-of select="ancestor::Employee/Experience"/>
          </Experience>
        </Employment>
      </EmploymentHistory>
    </Employee>

  </xsl:template>


</xsl:stylesheet>

produces the following output:

<?xml version="1.0" encoding="utf-8"?>
<Output>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <Age>28</Age>
    <EmploymentHistory>
      <Employment country="US">
        <Comment>List of previous jobs in the US</Comment>
        <Jobs>3</Jobs>
        <JobDetails>
          <Job title="Developer">
          <StartDate>01/10/2001</StartDate>
          <Months>38</Months>
        </Job>
        </JobDetails>
        <Available>true</Available>
        <Experience unit="years">6</Experience>
      </Employment>
    </EmploymentHistory>
  </Employee>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <Age>28</Age>
    <EmploymentHistory>
      <Employment country="US">
        <Comment>List of previous jobs in the US</Comment>
        <Jobs>3</Jobs>
        <JobDetails>
          <Job title="Developer">
          <StartDate>01/12/2004</StartDate>
          <Months>6</Months>
        </Job>
        </JobDetails>
        <Available>true</Available>
        <Experience unit="years">6</Experience>
      </Employment>
    </EmploymentHistory>
  </Employee>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <Age>28</Age>
    <EmploymentHistory>
      <Employment country="US">
        <Comment>List of previous jobs in the US</Comment>
        <Jobs>3</Jobs>
        <JobDetails>
          <Job title="Developer">
          <StartDate>01/06/2005</StartDate>
          <Months>10</Months>
        </Job>
        </JobDetails>
        <Available>true</Available>
        <Experience unit="years">6</Experience>
      </Employment>
    </EmploymentHistory>
  </Employee>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <Age>28</Age>
    <EmploymentHistory>
      <Employment country="UK">
        <Comment>List of previous jobs in the UK</Comment>
        <Jobs>2</Jobs>
        <JobDetails>
          <Job title="Developer">
            <StartDate>01/05/1999</StartDate>
            <Months>25</Months>
          </Job>
        </JobDetails>
        <Available>true</Available>
        <Experience unit="years">6</Experience>
      </Employment>
    </EmploymentHistory>
  </Employee>
  <Employee name="A Name">
    <Address>123 A Street</Address>
    <Age>28</Age>
    <EmploymentHistory>
      <Employment country="UK">
        <Comment>List of previous jobs in the UK</Comment>
        <Jobs>2</Jobs>
        <JobDetails>
          <Job title="Developer">
            <StartDate>01/07/2001</StartDate>
            <Months>3</Months>
          </Job>
        </JobDetails>
        <Available>true</Available>
        <Experience unit="years">6</Experience>
      </Employment>
    </EmploymentHistory>
  </Employee>
</Output>

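Note that I added an Output root element to make sure the document is well-formed.

Is this what you wanted?

You could also use xsl:copy to copy the higher-level elements, but I would need to think about that a bit more. With the XSLT above you get more control, but you also have to redefine the elements...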
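
The idea of copying the higher-level elements generically, rather than redefining each one, can also be sketched outside XSLT, which the question explicitly allows. The following is a rough sketch with the JDK's DOM API, not a tested production solution. The class name DomShredder and its shred method are hypothetical, it loads the whole document into memory (so it may not suit very large files), and, like Answer 0, it drops sibling branches that only lead to other leaves.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical DOM-based shredder: one output document per element named leafName. */
final class DomShredder {
    private DomShredder() {}

    static List<Document> shred(String xml, String leafName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.parse(new InputSource(new StringReader(xml)));
            int leafCount = source.getElementsByTagName(leafName).getLength();
            List<Document> result = new ArrayList<>();
            for (int i = 0; i < leafCount; i++) {
                // Work on a fresh deep copy of the source tree for every leaf.
                Document copy = builder.newDocument();
                copy.appendChild(copy.importNode(source.getDocumentElement(), true));

                // Snapshot the live NodeList, since nodes are removed below.
                NodeList live = copy.getElementsByTagName(leafName);
                List<Node> leaves = new ArrayList<>();
                for (int j = 0; j < live.getLength(); j++) {
                    leaves.add(live.item(j));
                }

                // The i-th leaf and all of its ancestors must survive.
                Set<Node> keep = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
                for (Node n = leaves.get(i); n != null; n = n.getParentNode()) {
                    keep.add(n);
                }

                // For every other leaf, remove the highest ancestor-or-self
                // that is not on the kept path.
                for (Node leaf : leaves) {
                    if (keep.contains(leaf)) {
                        continue;
                    }
                    Node top = leaf;
                    while (top.getParentNode() != null && !keep.contains(top.getParentNode())) {
                        top = top.getParentNode();
                    }
                    if (top.getParentNode() != null) { // null means the branch is already detached
                        top.getParentNode().removeChild(top);
                    }
                }
                result.add(copy);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not shred document", e);
        }
    }
}
```

Calling DomShredder.shred(xml, "Job") on the Employee document would be expected to return five documents, each containing a single <Job> together with its ancestors and the unrelated sibling elements such as <Address> and <Available>.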