Serializing XML based on character count during an XSL transformation

Asked: 2012-05-19 06:49:45

Tags: xslt

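I have an XML document (A.xml) that is being transformed into another XML document (B.xml). B.xml is just a replica of A.xml with a unique @id added to each element taken from A.xml. This part is already done.

Now I want to implement a mechanism that keeps track of the character count of each text node in B.xml (within a temporary tree) and, based on a maximum character count, splits B.xml and serializes it in one or more parts.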

Source XML document (A.xml):

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <!--
    Rules for splitting:
    1. «head/text()» is common for all splits.
    2. split files can have 600 characters max each.
    3. «title» elements could not be the last element of the any result document.
    -->
    <head><!-- 8 characters -->Kinesics</head>
    <section>
        <para><!-- 37 characters -->From Wikipedia, the free encyclopedia</para>
        <para><!-- 204 characters [space normalized]-->Kinesics is the interpretation of body
            language such as facial expressions and gestures — or, more formally, non-verbal
            behavior related to movement, either of any part of the body or the body as a
            whole. </para>
        <section>
            <title><!-- 19 characters -->Birdwhistell's work</title>
            <para><!-- 432 characters [space normalized]-->The term was first used (in 1952) by Ray
                Birdwhistell, an anthropologist who wished to study how people communicate through
                posture, gesture, stance, and movement. Part of Birdwhistell's work involved making
                film of people in social situations and analyzing them to show different levels of
                communication not clearly seen otherwise. The study was joined by several other
                anthropologists, including Margaret Mead and Gregory Bateson.</para>
            <para><!-- 453 characters [space normalized]--> Drawing heavily on descriptive
                linguistics, Birdwhistell argued that all movements of the body have meaning (i.e.
                are not accidental), and that these non-verbal forms of language (or paralanguage)
                have a grammar that can be analyzed in similar terms to spoken language. Thus, a
                "kineme" is "similar to a phoneme because it consists of a group of movements which
                are not identical, but which may be used interchangeably without affecting social
                meaning".</para>
        </section>
        <section>
            <title><!-- 19 characters -->Modern applications</title>
            <para><!-- 390 characters [space normalized]-->Kinesics are an important part of
                non-verbal communication behavior. The movement of the body, or separate parts,
                conveys many specific meanings and the interpretations may be culture bound. As many
                movements are carried out at a subconscious or at least a low-awareness level,
                kinesic movements carry a significant risk of being misinterpreted in an
                intercultural communications situation.</para>
        </section>
    </section>
</root>

XSL file

<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs" version="2.0">
    <xsl:output method="xml" encoding="UTF-8" indent="no"/>
    <!--update 1-->
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
        <xsl:variable name="root-replica">
            <xsl:call-template name="create-root-replica">
                <xsl:with-param name="context" select="*"/>
            </xsl:call-template>
        </xsl:variable>
        <xsl:copy-of select="$root-replica"/>
        <!--
        <xsl:call-template name="split-n-serialize">
            <xsl:with-param name="context" select="$root-replica"/>
        </xsl:call-template>
        -->
    </xsl:template>
    <xsl:template name="split-n-serialize">
        <xsl:param name="context"/>
        <xsl:for-each select="$context">
            <xsl:result-document encoding="utf-8" href="{concat('split_',position(),'.xml')}"
                method="xml" indent="no">
                <xsl:sequence select="."/>
            </xsl:result-document>
        </xsl:for-each>
    </xsl:template>
    <xsl:template name="create-root-replica">
        <xsl:param name="context"/>
        <root>
            <head>
                <xsl:value-of select="$context/head"/>
            </head>
            <xsl:apply-templates select="$context/*[not(self::head)]"/>
        </root>
    </xsl:template>
    <xsl:template match="element()">
        <xsl:element name="{local-name()}">
            <xsl:attribute name="id">
                <xsl:value-of select="generate-id()"/>
            </xsl:attribute>
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:template>
    <!--update 2-->
    <xsl:template match="text()">
        <xsl:value-of select="normalize-space(.)"/>
    </xsl:template>
</xsl:transform>

My input XML contains 1562 characters (assuming \s+ counts as a single space), and I want to split it into 4 partial documents using the rules given in the source XML.
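The character counts quoted in the source comments can be cross-checked outside XSLT. The following is only a minimal Python sketch, not part of the stylesheet: the function names `normalized_length` and `count_characters` are made up for illustration, and whitespace runs are collapsed to a single space to mimic `normalize-space()`. The sample is a shortened stand-in for A.xml.

```python
import xml.etree.ElementTree as ET


def normalized_length(element):
    """Length of the element's text with each whitespace run collapsed to one space."""
    text = "".join(element.itertext())
    return len(" ".join(text.split()))


def count_characters(xml_text, leaf_tags=("head", "title", "para")):
    """Return (tag, normalized length) for every leaf element, in document order.

    The default parser drops comments, so the "N characters" comments in the
    source do not contribute to the count.
    """
    root = ET.fromstring(xml_text)
    return [(el.tag, normalized_length(el)) for el in root.iter() if el.tag in leaf_tags]


# Shortened stand-in for A.xml (hypothetical sample, not the full document).
sample = """<root>
    <head><!-- 8 characters -->Kinesics</head>
    <section>
        <para><!-- 37 characters -->From Wikipedia, the
            free   encyclopedia</para>
    </section>
</root>"""

counts = count_characters(sample)
print(counts)                      # [('head', 8), ('para', 37)]
print(sum(n for _, n in counts))   # 45
```

Run over the full A.xml, the same sum should come out at the 1562 stated above, provided the per-element comments in the source are accurate.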

Does anyone know how to do this? Any ideas or comments are much appreciated.

Update 3

Details of the split files and the splitting procedure:

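  1. The content of the «head» element should be part of every split XML file.

  2. A file can be split in the middle of a section, but not in the middle of a paragraph.

  3. A «title» element must not be the last element of a split file.

  4. Each split file can contain at most 600 characters (excluding start and end tags).

  5. Sample output files (indentation is used for better readability)

    Character counts per file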

    1st File
           8
          37
         204  =  249
    2nd File
           8
          19
         432  =  459
    3rd File
           8
         453  =  461
    4th File
           8
          19
         390  =  417
    

    1st file

    <?xml version="1.0" encoding="UTF-8"?>
    <root>
        <head>Kinesics</head>
        <section id="d1e6">
            <para id="d1e7">From Wikipedia, the free encyclopedia</para>
            <para id="d1e10">Kinesics is the interpretation of body language such as facial expressions and gestures — or, more formally, non-verbal behavior related to movement, either of any part of the body or the body as a whole.</para>
        </section>
    </root>
    

    2nd file

    <?xml version="1.0" encoding="UTF-8"?>
    <root>
        <head>Kinesics</head>
        <section id="d1e6">
            <section id="d1e13">
                <title id="d1e14">Birdwhistell's work</title>
                <para id="d1e17">The term was first used (in 1952) by Ray Birdwhistell, an anthropologist who wished to study how people communicate through posture, gesture, stance, and movement. Part of Birdwhistell's work involved making film of people in social situations and analyzing them to show different levels of communication not clearly seen otherwise. The study was joined by several other anthropologists, including Margaret Mead and Gregory Bateson.</para>
            </section>
        </section>
    </root>
    

    3rd file

    <?xml version="1.0" encoding="UTF-8"?>
    <root>
        <head>Kinesics</head>
        <section id="d1e6">
            <section id="d1e13">
                <para id="d1e20">Drawing heavily on descriptive linguistics, Birdwhistell argued that all movements of the body have meaning (i.e. are not accidental), and that these non-verbal forms of language (or paralanguage) have a grammar that can be analyzed in similar terms to spoken language. Thus, a "kineme" is "similar to a phoneme because it consists of a group of movements which are not identical, but which may be used interchangeably without affecting social meaning".</para>
            </section>
        </section>
    </root>
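The splitting rules can be read as a greedy packing problem, which is easy to check outside XSLT before writing the stylesheet logic. The following Python sketch is one possible reading of the rules, not the asker's implementation: the function name `split_blocks` and the (kind, length) tuples are made up for illustration, and each «title» is assumed to travel with the element that follows it so that it never ends a file.

```python
def split_blocks(head_len, items, max_chars=600):
    """Greedily pack items into files of at most max_chars characters.

    items is a list of (kind, length) pairs in document order, where kind is
    "title" or "para". head_len is counted once in every file.
    """
    # Bind each title to the item that follows it, so a title never ends a file.
    blocks, pending = [], []
    for item in items:
        pending.append(item)
        if item[0] != "title":
            blocks.append(pending)
            pending = []
    if pending:
        # Assumption: a trailing title with nothing after it becomes its own block.
        blocks.append(pending)

    files, current, used = [], [], head_len
    for block in blocks:
        size = sum(length for _, length in block)
        # Start a new file when the block does not fit. A block larger than the
        # limit still gets a file of its own, since paragraphs are never split.
        if current and used + size > max_chars:
            files.append(current)
            current, used = [], head_len
        current.extend(block)
        used += size
    if current:
        files.append(current)
    return files


# Lengths taken from the comments in the source XML; head is 8 characters.
items = [("para", 37), ("para", 204), ("title", 19), ("para", 432),
         ("para", 453), ("title", 19), ("para", 390)]
files = split_blocks(8, items)
totals = [8 + sum(n for _, n in f) for f in files]
print(totals)  # [249, 459, 461, 417]
```

With the lengths from the source comments this reproduces the four totals listed under the character counts per file, which suggests the expected output is consistent with a simple first-fit pass in document order.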
    

1 answer:

Answer 0 (score: 0)

You can use string-length() to get the "character count", and then use xsl:result-document to split the result tree into parts.

Do you need further help with the coding?