XSLT2不区分大小写的正则表达式

时间:2012-03-09 06:18:35

标签: xslt

美好的一天!

这是我第一次在这里发帖提问,请耐心等待。

我遇到了如何匹配不区分大小写的问题我试图添加标志但不确定这是否是正确的位置插入标志,因为我的代码似乎不区分大小写。

提前感谢您分享您如何解决此问题的想法。

- 埃尔默

我对这篇文章进行了编辑,以添加更多代码,并为任何想要进行测试的源XML文件示例。

基于索引条目列表,我需要搜索从源XML到搜索的每个项目,并且必须不区分大小写。例如,如果将搜索单词“缩写”,则所有单词“缩写(缩写)或缩写”必须在单词缩写或缩写旁边有锚元素。

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl" xmlns:saxon="http://saxon.sf.net/"
xmlns:ati="http://www.asiatype.com/Functions" exclude-result-prefixes="#all">

<xsl:output method="xml" encoding="UTF-8" indent="no"/>

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!-- create variable for EALL_Index-->
<xsl:variable name="index" as="element() +">
    <xsl:for-each select="for $doc in document('EALL_Index_test.xml') return $doc/descendant::item">
        <xsl:sort select="string-length(normalize-space(text()[1]))" order="descending"/>
        <list>
            <entry>
                <xsl:variable name="itemString">
                    <xsl:analyze-string select="text()[1]" regex="(\[)|(\])|(\()|(\))">
                        <xsl:matching-substring>
                            <xsl:value-of select="replace(regex-group(1),'\[','\\[')"/>
                            <xsl:value-of select="replace(regex-group(2),'\]','\\]')"/>
                            <xsl:value-of select="replace(regex-group(3),'\(','\\(')"/>
                            <xsl:value-of select="replace(regex-group(4),'\)','\\)')"/>
                        </xsl:matching-substring>
                        <xsl:non-matching-substring>
                            <xsl:value-of select="."/>
                        </xsl:non-matching-substring>
                    </xsl:analyze-string>
                </xsl:variable>
                <xsl:value-of select="normalize-space($itemString)"/>
            </entry>
            <xsl:for-each select="link">
                <link target="{@targets}" id="{@n}">
                </link>
            </xsl:for-each>
        </list>
    </xsl:for-each>
</xsl:variable>

<xsl:template match="text()[ancestor::*[self::simplearticle or self::complexarticle]]">
    <xsl:variable name="id" as="xs:string" select="current()/ancestor::*[self::simplearticle or self::complexarticle]/@id"/>
    <xsl:variable name="srchString" as="xs:string" select="."/>
     <xsl:choose>
         <xsl:when test="some $term in $index[link/@target=$id]/entry satisfies matches($srchString, concat('(^|\W)(',$term, '[s]?)($|\W)'))">
             <xsl:message select="concat('(^|\W)(',string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')"/>
             <xsl:analyze-string select="$srchString" regex="{concat('(^|\W)(',string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')}">
                <xsl:matching-substring>
                    <xsl:value-of select="regex-group(1)"/>
                    <xsl:for-each select="$index[matches(regex-group(2), concat('(^|\W)(',entry, '[s]?)($|\W)'))]/link[@target=$id]">
                        <xsl:if test="matches($srchString, concat('(^|\W)(',entry, '[s]?)($|\W)'))">
                            <anchor target="{@target}" id="{@id}"/>
                        </xsl:if>
                    </xsl:for-each>
                    <xsl:value-of select="regex-group(2)"/>
                    <xsl:value-of select="regex-group(3)"/>
                </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
         </xsl:analyze-string>
    </xsl:when>
    <xsl:otherwise>
        <xsl:sequence select="."/>
    </xsl:otherwise>
    </xsl:choose>
</xsl:template>

索引条目的源XML:

<div2 id="EALL-index">
<head>EALL3 Index</head>
<art><simplearticle id="SIM-INDEX" entry="" volume="" page="">
<pseudoarticle>
<articleentry><mainentry>Index of Terms</mainentry></articleentry></pseudoarticle><list>
<item>Abbreviation <link type="eall" targets="SIM-0002" n="idx-abbreviation-01">Abbreviations</link>, <link type="eall" targets="SIM-0021" n="idx-abbreviation-02">Compounds</link>, <link type="eall" targets="COM-vol3-0192" n="idx-abbreviation-03">Lexicography: Monolingual Dictionaries</link>, <link type="eall" targets="COM-vol3-0276" n="idx-abbreviation-04">Punctuation</link></item>
</list></simplearticle></art></div2>

搜索的源XML:

<?xml version="1.0" encoding="UTF-8"?>
<art>
<simplearticle id="SIM-0002" entry="Abbreviations" volume="1" page="1:1a" sortcode="abbreviations">
<pseudoarticle>
<articleentry>
<mainentry>Abbreviations</mainentry>
</articleentry>
<p>Generally speaking, there are four main categories of abbreviations encountered in Arabic texts:</p>
<p>
<list>
<label>i.</label>
<item>i. Suspensions: abbreviation by truncation of the letters at the end of the word, e.g.&#160;&#1575;&#1604;&#1605;&#1589;&#1600; = <hi rend="italic">al-mu&#7779;annif</hi>. Perhaps the most interesting here is the case of suspensions that look like, or were considered by some to be, numerals. To this category belong the signs that resemble the numerals &#1778; and &#1635;, but which may represent the unpointed <hi rend="italic">t&#257;&#702;</hi> and <hi rend="italic">&#353;&#299;n</hi> (for <hi rend="italic">tam&#257;m</hi> and <hi rend="italic">&#353;ar&#7717;</hi>) when used in conjunction with marginal glosses.</item>
<label>ii.</label>
<item>ii. Contractions: abbreviating by means of omitting some letters in the middle of the word, but not the beginning or the ending, e.g. &#1602;&#1607; (<hi rend="italic">qawlu-hu</hi>).</item>
<label>iii.</label>
<item>iii. Sigla: using one letter to represent the whole word, e.g. &#1605; (<hi rend="italic">matn</hi>).</item>
<label>iv.</label>
<item>iv. Abbreviation symbols: symbols in the form of logographs used for whole words. A typical abbreviation symbol is the horizontal stroke (sometimes hooked at the end) which represents the word <hi rend="italic">sana</hi> &#8216;year&#8217;. Another example is the &#8216;two teeth stroke&#8217; (which looks like two unpointed <hi rend="italic">b&#257;</hi>'s), which represents the word &#1602;&#1601; &#8216;stop&#8217;, or the suspension &#1601;&#1578;&#1600; (for <hi rend="italic">fa-ta&#702;ammal-hu/h&#257;</hi> &#8216;reflect on it&#8217;), used in manuscripts for notabilia or side-heads.</item>
</list>
</p>
<p>Closely connected with these abbreviations is a contraction of a group of words into one &#8216;portmanteau&#8217; word (<hi rend="italic">na&#7717;t</hi>; <xref target="SIM-0021">compounds</xref>), for instance <hi rend="italic">basmala</hi> (<hi rend="italic">bi-sm All&#257;h</hi>) <hi rend="italic">&#7717;amdala</hi> (<hi rend="italic">al-&#7717;amdu li-ll&#257;h</hi>) and <hi rend="italic">&#7779;alwala</hi> (<hi rend="italic">&#7779;all&#257; ll&#257;h &#703;alayhi</hi>). To all intents and purposes, the word <hi rend="italic">na&#7717;t</hi> corresponds to an acronym, i.e. a word formed from the abbreviation of, in most cases, the initial letters of each word in the construct. Most of these constructs are textual and pious formulae. Apart from <hi rend="italic">basmala, &#7717;amdala</hi>, and <hi rend="italic">&#7779;alwala</hi>, we encounter <hi rend="italic">&#7789;albaqa</hi> (<hi rend="italic">&#7789;&#257;la ll&#257;h baq&#257;&#702;a-hu</hi>), <hi rend="italic">&#7717;awqala</hi> or <hi rend="italic">&#7717;awlaqa</hi> (<hi rend="italic">l&#257; &#7717;awla wa-l&#257; quwwata &#702;ill&#257; bi-ll&#257;h</hi>), <hi rend="italic">&#7779;al&#703;ama</hi> (a synonym of <hi rend="italic">&#7779;alwala</hi>), <hi rend="italic">&#7717;asbala</hi> (<hi rend="italic">&#7717;asbun&#257; all&#257;h</hi>), <hi rend="italic">ma&#353;&#702;ala</hi> (<hi rend="italic">m&#257; &#353;&#257;&#702;a ll&#257;h</hi>), <hi rend="italic">sab&#7717;ala</hi> (<hi rend="italic">sub&#7717;&#257;na ll&#257;h</hi>), and <hi rend="italic">&#7717;ay&#703;ala</hi> (<hi rend="italic">&#7717;ayya &#703;al&#257; &#7779;-&#7779;al&#257;t</hi>) (as-Samarr&#257;'&#299; 1987; Gacek 2001).</p>
<p>Abbreviations, especially contractions and sigla, may be (and often are) accompanied by a horizontal stroke (<hi rend="italic">tilde</hi>) placed above them. This mark may resemble the <hi rend="italic">madda</hi> but has nothing to do with the latter's function in Arabic script. Suspensions, on the other hand, were indicated by a long downward stroke, a mark that is very likely to have been borrowed from Greek and Latin paleographic practice.</p>
<p>The use of abbreviations was quite popular among Muslim scholars, although originally some of the abbreviations, such as those relating to the prayer for the Prophet (<hi rend="italic">ta&#7779;liya, &#7779;alwala</hi>), were disapproved of. In the manuscript age, abbreviations were extensively used, not only in the body of the text but also in marginalia, ownership statements, and in the primitive critical apparatus (Ben Cheneb 1920; Ma&#7717;f&#363;&#7827; 1964).</p>
<p>Medieval scholars could not always agree on the meaning of some of the abbreviations used in manuscripts. The letter &#1581;, for instance, which is used to separate one <hi rend="italic">&#702;isn&#257;d</hi> from another, was thought by some to stand for <hi rend="italic">&#7717;&#257;&#702;il</hi> or <hi rend="italic">&#7717;ayl&#363;la</hi> &#8216;separation&#8217; and by others for <hi rend="italic">&#7717;ad&#299;&#7791;</hi> and even <hi rend="italic">&#7779;a&#7717;&#7717;a</hi>. Some scholars even thought that the letter <hi rend="italic">&#7717;&#257;&#702;</hi> should be pointed (&#1582; &#8211; <hi rend="italic">x&#257;&#702; mu&#703;jama</hi>) to stand for <hi rend="italic">&#702;isn&#257;d &#702;&#257;xar</hi> &#8216;another <hi rend="italic">&#702;isn&#257;d</hi>&#8217;. The contemporary scholar may face a similar dilemma (see e.g. Ali&#269; 1976).</p>
<p>Abbreviations in manuscripts are often unpointed and appear sometimes in the form of word-symbols (logographs). Here, the context, whether textual or geographical, is of great importance. Thus, for instance, what appears to be the letter &#1591; may in fact be a &#1592;, and what appears to be an <hi rend="italic">&#703;ayn</hi> or <hi rend="italic">ayn</hi>, in its initial (&#1593;&#1600;) or isolated form (&#1593;), may actually be an unpointed <hi rend="italic">n&#363;n</hi> or <hi rend="italic">x&#257;&#702;</hi> (for <hi rend="italic">nusxa &#702;uxr&#257;</hi> &#8216;another copy&#8217;). Similarly, the same word or abbreviation can have two different functions and/or meanings. For example, the words <hi rend="italic">&#7717;&#257;&#353;iya</hi> and <hi rend="italic">f&#257;&#702;ida</hi> can stand for a gloss or a side-head (&#8216;nota bene&#8217;), while the &#1589; or &#1589;&#1600; can be an abbreviation of <hi rend="italic">&#7779;a&#7717;&#7717;a</hi> (when used for an omission/ insertion or evident correction) or <hi rend="italic">&#702;a&#7779;l</hi> (the body of the text), or it can stand for <hi rend="italic">&#7693;abba</hi> &#8216;door-bolt&#8217;, a mark indicating an uncertain reading and having, for all intents and purposes, the function of a question mark or &#8216;sic&#8217;. Also, the abbreviation &#1606; may stand for <hi rend="italic">bay&#257;n</hi> &#8216;explanation&#8217; or <hi rend="italic">nusxa &#702;uxr&#257;</hi>; the latter is often found in manuscripts of Persian/ Indian provenance.</p>
<p>The earliest use of abbreviations in the Arabic language is probably connected with its orthography and possibly the &#8216;mysterious letters&#8217; (<hi rend="italic">al-&#7717;ur&#363;f al-muqa&#7789;&#7789;a&#703;a</hi>) at the beginning of some chapters of the <hi rend="italic">Qur&#702;&#257;n</hi> (Bellamy 1973). In terms of orthography, for instance, the initial form of <hi rend="italic">j&#299;m</hi> (&#1580;&#1600;) or <hi rend="italic">m&#299;m</hi> (&#1605;&#1600;) was regarded by some scholars as an abbreviation of <hi rend="italic">jazma</hi>. Furthermore, the unpointed initial form of <hi rend="italic">&#353;&#299;n</hi> (&#1587;&#1600;) was used for <hi rend="italic">ta&#353;d&#299;d</hi> (or <hi rend="italic">&#353;adda</hi>), and the initial form of <hi rend="italic">&#7779;&#257;d</hi> (&#1589;&#1600;) was thought to represent <hi rend="italic">wa&#7779;la</hi> (or <hi rend="italic">&#7779;ila</hi>) (Wright 1967:13&#8211;14, 19; Gacek 2001:23).</p>
<p>Most of the abbreviations are found in the body of the text. They were introduced in order to speed up the process of transcription and their usage varied according to the subject or type of a given work. Abbreviations can be found in almost all types of works, but especially in compositions on the recitation of the <hi rend="italic">Qur&#702;&#257;n</hi>, compilation and criticism of <hi rend="italic">&#7716;ad&#299;&#7791;</hi>, philosophy, lexicography, poetry, genealogy, biography, and astronomy. The lists of these are often included in prefaces and frequently concern either the names of authors or titles of compositions. In addition, we find didactic poems that were composed specifically in order to help memorize given sets of abbreviations (see, e.g., &#703;Alaw&#257;n 1972). They are especially common in works on <hi rend="italic">&#7716;ad&#299;&#7791;</hi> and jurisprudence (both Sunni and Shi&#703;i) (al-M&#257;maq&#257;n&#299; 1992; a&#7827;-Zufayr&#299; 2002), and although some abbreviations were standardized, most were specific to a given work. Among the commonly used abbreviations for major <hi rend="italic">&#7716;ad&#299;&#7789;</hi> compilations are: &#1582; (al-Bux&#257;r&#299;), &#1605; (Muslim or M&#257;lik), &#1583; (&#702;Ab&#363; D&#257;&#702;&#363;d), &#1578; (at-Tirmi&#7695;&#299;), &#1603; (M&#257;lik), &#1607; (&#702;Ab&#363; &#7694;arr or Ibn M&#257;ja), &#1606; or &#1587; (an-Nas&#257;&#702;&#299;), and the like (Gacek 1989:56).</p>
</pseudoarticle>
</simplearticle>
</art>

示例输出:

<art>
<simplearticle id="SIM-0002" entry="Abbreviations" volume="1" sortcode="abbreviations" page="1:1a">
    <pseudoarticle>
        <articleentry>
            <mainentry><anchor target="SIM-0002" id="idx-abbreviation-01"/>Abbreviations</mainentry>
        </articleentry>
        <p>Generally speaking, there are four main categories of <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations encountered in Arabic texts:</p>
        <p>
            <list type="simple" TEIform="list">
                <label TEIform="label">i.</label>
                <item TEIform="item">i. Suspensions: <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation by truncation of the letters at the end of the word, e.g. المصـ = <hi rend="italic">al-muṣannif</hi>. Perhaps the most interesting here is the case of suspensions that look like, or were considered by some to be, numerals. To this category belong the signs that resemble the numerals ۲ and ٣, but which may represent the unpointed <hi rend="italic">tāʾ</hi> and <hi rend="italic">šīn</hi> (for <hi rend="italic">tamām</hi> and <hi rend="italic">šarḥ</hi>) when used in conjunction with marginal glosses.</item>
                <label TEIform="label">ii.</label>
                <item TEIform="item">ii. Contractions: abbreviating by means of omitting some letters in the middle of the word, but not the beginning or the ending, e.g. قه (<hi rend="italic">qawlu-hu</hi>).</item>
                <label TEIform="label">iii.</label>
                <item TEIform="item">iii. Sigla: using one letter to represent the whole word, e.g. م (<hi rend="italic">matn</hi>).</item>
                <label TEIform="label">iv.</label>
                <item TEIform="item">iv. Abbreviation symbols: symbols in the form of logographs used for whole words. A typical <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation symbol is the horizontal stroke (sometimes hooked at the end) which represents the word <hi rend="italic">sana</hi> ‘year’. Another example is the ‘two teeth stroke’ (which looks like two unpointed <hi rend="italic">bā</hi>'s), which represents the word قف ‘stop’, or the suspension فتـ (for <hi rend="italic">fa-taʾammal-hu/hā</hi> ‘reflect on it’), used in manuscripts for notabilia or side-heads.</item>
            </list>
        </p>
        <p>Closely connected with these <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations is a contraction of a group of words into one ‘portmanteau’ word (<hi rend="italic">naḥt</hi>; <xref target="SIM-0021">compounds</xref>), for instance <hi rend="italic">basmala</hi> (<hi rend="italic">bi-sm Allāh</hi>) <hi rend="italic">ḥamdala</hi> (<hi rend="italic">al-ḥamdu li-llāh</hi>) and <hi rend="italic">ṣalwala</hi> (<hi rend="italic">ṣallā llāh ʿalayhi</hi>). To all intents and purposes, the word <hi rend="italic">naḥt</hi> corresponds to an <anchor target="SIM-0002" id="idx-acronym-01"/>acronym, i.e. a word formed from the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation of, in most cases, the initial letters of each word in the construct. Most of these constructs are textual and pious formulae. Apart from <hi rend="italic">basmala, ḥamdala</hi>, and <hi rend="italic">ṣalwala</hi>, we encounter <hi rend="italic">ṭalbaqa</hi> (<hi rend="italic">ṭāla llāh baqāʾa-hu</hi>), <hi rend="italic">ḥawqala</hi> or <hi rend="italic">ḥawlaqa</hi> (<hi rend="italic">lā ḥawla wa-lā quwwata ʾillā bi-llāh</hi>), <hi rend="italic">ṣalʿama</hi> (a synonym of <hi rend="italic">ṣalwala</hi>), <hi rend="italic">ḥasbala</hi> (<hi rend="italic">ḥasbunā allāh</hi>), <hi rend="italic">mašʾala</hi> (<hi rend="italic">mā šāʾa llāh</hi>), <hi rend="italic">sabḥala</hi> (<hi rend="italic">subḥāna llāh</hi>), and <hi rend="italic">ḥayʿala</hi> (<hi rend="italic">ḥayya ʿalā ṣ-ṣalāt</hi>) (as-Samarrā'ī 1987; Gacek 2001).</p>
        <p><anchor target="SIM-0002" id="idx-abbreviation-01"/>Abbreviations, especially contractions and sigla, may be (and often are) accompanied by a horizontal stroke (<hi rend="italic">tilde</hi>) placed above them. This mark may resemble the <hi rend="italic">madda</hi> but has nothing to do with the latter's function in Arabic script. Suspensions, on the other hand, were indicated by a long downward stroke, a mark that is very likely to have been borrowed from Greek and Latin paleographic practice.</p>
        <p>The use of <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations was quite popular among Muslim scholars, although originally some of the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations, such as those relating to the prayer for the Prophet (<hi rend="italic">taṣliya, ṣalwala</hi>), were disapproved of. In the manuscript age, <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations were extensively used, not only in the body of the text but also in marginalia, ownership statements, and in the primitive critical apparatus (Ben Cheneb 1920; Maḥfūẓ 1964).</p>
        <p>Medieval scholars could not always agree on the meaning of some of the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations used in manuscripts. The letter ح, for instance, which is used to separate one <hi rend="italic">ʾisnād</hi> from another, was thought by some to stand for <hi rend="italic">ḥāʾil</hi> or <hi rend="italic">ḥaylūla</hi> ‘separation’ and by others for <hi rend="italic">ḥadīṯ</hi> and even <hi rend="italic">ṣaḥḥa</hi>. Some scholars even thought that the letter <hi rend="italic">ḥāʾ</hi> should be pointed (خ – <hi rend="italic">xāʾ muʿjama</hi>) to stand for <hi rend="italic">ʾisnād ʾāxar</hi> ‘another <hi rend="italic">ʾisnād</hi>’. The contemporary scholar may face a similar dilemma (see e.g. Alič 1976).</p>
        <p><anchor target="SIM-0002" id="idx-abbreviation-01"/>Abbreviations in manuscripts are often unpointed and appear sometimes in the form of word-symbols (logographs). Here, the context, whether textual or geographical, is of great importance. Thus, for instance, what appears to be the letter ط may in fact be a ظ, and what appears to be an <hi rend="italic">ʿayn</hi> or <hi rend="italic">ayn</hi>, in its initial (عـ) or isolated form (ع), may actually be an unpointed <hi rend="italic">nūn</hi> or <hi rend="italic">xāʾ</hi> (for <hi rend="italic">nusxa ʾuxrā</hi> ‘another copy’). Similarly, the same word or <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation can have two different functions and/or meanings. For example, the words <hi rend="italic">ḥāšiya</hi> and <hi rend="italic">fāʾida</hi> can stand for a gloss or a side-head (‘nota bene’), while the ص or صـ can be an <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation of <hi rend="italic">ṣaḥḥa</hi> (when used for an omission/ insertion or evident correction) or <hi rend="italic">ʾaṣl</hi> (the body of the text), or it can stand for <hi rend="italic">ḍabba</hi> ‘door-bolt’, a mark indicating an uncertain reading and having, for all intents and purposes, the function of a question mark or ‘sic’. Also, the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation ن may stand for <hi rend="italic">bayān</hi> ‘explanation’ or <hi rend="italic">nusxa ʾuxrā</hi>; the latter is often found in manuscripts of Persian/ Indian provenance.</p>
        <p>The earliest use of <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations in the Arabic language is probably connected with its orthography and possibly the ‘mysterious letters’ (<hi rend="italic">al-ḥurūf al-muqaṭṭaʿa</hi>) at the beginning of some chapters of the <hi rend="italic">Qurʾān</hi> (Bellamy 1973). In terms of orthography, for instance, the initial form of <hi rend="italic">jīm</hi> (جـ) or <hi rend="italic">mīm</hi> (مـ) was regarded by some scholars as an <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviation of <hi rend="italic">jazma</hi>. Furthermore, the unpointed initial form of <hi rend="italic">šīn</hi> (سـ) was used for <hi rend="italic">tašdīd</hi> (or <hi rend="italic">šadda</hi>), and the initial form of <hi rend="italic">ṣād</hi> (صـ) was thought to represent <hi rend="italic">waṣla</hi> (or <hi rend="italic">ṣila</hi>) (Wright 1967:13–14, 19; Gacek 2001:23).</p>
        <p>Most of the <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations are found in the body of the text. They were introduced in order to speed up the process of transcription and their usage varied according to the subject or type of a given work. Abbreviations can be found in almost all types of works, but especially in compositions on the recitation of the <hi rend="italic">Qurʾān</hi>, compilation and criticism of <hi rend="italic">Ḥadīṯ</hi>, philosophy, lexicography, poetry, genealogy, biography, and astronomy. The lists of these are often included in prefaces and frequently concern either the names of authors or titles of compositions. In addition, we find didactic poems that were composed specifically in order to help memorize given sets of <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations (see, e.g., ʿAlawān 1972). They are especially common in works on <hi rend="italic">Ḥadīṯ</hi> and jurisprudence (both Sunni and Shiʿi) (al-Māmaqānī 1992; aẓ-Zufayrī 2002), and although some <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations were standardized, most were specific to a given work. Among the commonly used <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations for major <hi rend="italic">Ḥadīṭ</hi> compilations are: خ (al-Buxārī), م (Muslim or Mālik), د (ʾAbū Dāʾūd), ت (at-Tirmiḏī), ك (Mālik), ه (ʾAbū Ḏarr or Ibn Māja), ن or س (an-Nasāʾī), and the like (Gacek 1989:56).</p>
        <p>Specific to <hi rend="italic">Ḥadīṯ</hi> literature are other <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations connected with the frequent repetitions of such expressions as <hi rend="italic">ḥaddaṯ anā, ʾaxbaranā</hi>, and <hi rend="italic">ʾanbaʾanā</hi>, which were commonly abbreviated as: دثنا ,ثنا ,نا (<hi rend="italic">ḥaddaṯ anā</hi>); ابنا ,ارنا ,انا (<hi rend="italic">ʾaxbaranā</hi>); ق ثنا ,قثنا (<hi rend="italic">qāla ḥaddaṯ anā</hi>). The transition from one <hi rend="italic">ʾisnād</hi> to another, as mentioned above, was marked with ح (<hi rend="italic">ḥāʾil, taḥwīl, ḥaylula, ḥadīṯ</hi> or <hi rend="italic">ṣaḥḥa</hi>) (Gacek 1989:56), and for the evaluation of <hi rend="italic">ḥadīṯ</hi>s the following <anchor target="SIM-0002" id="idx-abbreviation-01"/>abbreviations were used:ض (<hi rend="italic">ḍaʿīf</hi>), صح (<hi rend="italic">ṣaḥīḥ</hi>), ح (<hi rend="italic">ḥasan</hi>); م (<hi rend="italic">majhūl</hi>), مو (<hi rend="italic">muwāfiq</hi> or <hi rend="italic">mawqūf</hi>), قف (<hi rend="italic">mawqūf</hi>); ق (<hi rend="italic">muwaṭṭaq</hi> or <hi rend="italic">muttafaq ʿalayhi</hi>), ل (<hi rend="italic">mursal</hi>) (e.g. Gacek 1985:xiv, 96).</p>
    </pseudoarticle>
</simplearticle>

3 个答案:

答案 0 :(得分:0)

除了详细信息(例如不同的项目列表和从何处构建锚点的选择,以及在循环多个艺术元素时的更改),下面的内容应该让你再次进入(不需要正则表达式,只需使用适当的函数)。我还没有替换文本中出现的缩写。或者这是你的错误?在“缩写”旁边是否应该处理“缩写”?那怎么决定呢?比如使用searchString的“s”来处理那个词呢?

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" 
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:f="data:f"
        exclude-result-prefixes="f">

    <xsl:output method="xml" indent="yes"/>

    <xsl:variable name="searchString" select="//simplearticle/@sortcode"/>
    <xsl:variable name="anchor">
        <xsl:element name="anchor">
            <xsl:attribute name="target" select="//simplearticle/@id"/>
            <xsl:attribute name="id" select="concat('idx-', $searchString,'-01')"/>
        </xsl:element>
    </xsl:variable>

    <xsl:template match="/">
        <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>  
    </xsl:template>

    <xsl:template match="text()">
        <xsl:call-template name="checkWord">
            <xsl:with-param name="splitText" select="tokenize(., ' ')"/>
        </xsl:call-template>
    </xsl:template>

    <xsl:template name="checkWord">
        <xsl:param name="splitText"/>
        <xsl:for-each select="$splitText">
            <xsl:if test="lower-case(.) = $searchString">
                <xsl:copy-of select="$anchor"/>
            </xsl:if>
            <xsl:value-of select="."/><xsl:text> </xsl:text>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>

答案 1 :(得分:0)

我的另一个答案是不使用替换功能的答案 - 这会导致不完整的结果,因为“缩写”的出现不会被替换。 所以我尝试了另一种方法,包括利用迈克尔建议的“我”旗帜;不幸的是,这更复杂,因为它需要序列化到锚元素的字符串。见下文。
注意:此解决方案也存在轻微缺陷:不会出现“缩写”等情况。 此外,对序列化模板略有改进将导致< anchor .../>而不是< anchor ...>< /anchor>

请参阅下面的已修改,了解已解决这两个问题的版本。

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" 
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:f="data:f"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="f xs">

    <xsl:output method="xml" indent="yes"/>

    <xsl:variable name="searchString" select="//simplearticle/@sortcode"/>
    <xsl:variable name="anchor">
        <xsl:element name="anchor">
            <xsl:attribute name="target" select="//simplearticle/@id"/>
            <xsl:attribute name="id" select="concat('idx-', $searchString,'-01')"/>
        </xsl:element>
    </xsl:variable>

    <xsl:template match="/">
        <xsl:apply-templates mode="main"/>
    </xsl:template>

    <xsl:template match="@*|node()" mode="main">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" mode="main"/>
        </xsl:copy>  
    </xsl:template>

    <xsl:template match="text()" mode="main">
        <!--<xsl:variable name="replacement" select="concat(f:serialize($anchor), $searchString)"/>-->
        <xsl:variable name="anchorAdd" select="f:serialize($anchor)"/>
        <xsl:variable name="replacement">
            <xsl:value-of select="$anchorAdd"/>
            <xsl:value-of select="$searchString"/>
        </xsl:variable>
        <xsl:value-of select="replace(., $searchString, $replacement, 'i')"  disable-output-escaping="yes"/>
    </xsl:template>

    <xsl:template match="*" mode="serialize">
        <xsl:text>&lt;</xsl:text>
        <xsl:value-of select="name()"/>
        <xsl:apply-templates mode="serialize" select="@*"/> 
        <xsl:text>&gt;</xsl:text>
        <xsl:apply-templates mode="serialize"/> 
        <xsl:text>&lt;/</xsl:text><xsl:value-of select="name()"/><xsl:text>&gt;</xsl:text> 
    </xsl:template>

    <xsl:template match="text()" mode="serialize">
        <xsl:value-of select="."/>
    </xsl:template>

    <xsl:template match="@*" mode="serialize">
        <xsl:text> </xsl:text><xsl:value-of select="name()"/>="<xsl:value-of select="."/>"
    </xsl:template>

    <xsl:function name="f:serialize">
        <xsl:param name="xml"/>
        <xsl:apply-templates mode="serialize" select="$xml"/> 
    </xsl:function>
</xsl:stylesheet>

<强> EDITED

说明:
(1)使用反向引用$ 0表示要替换的字符串
(2)仍然必须使用替换变量,虽然这似乎有点奇怪;在直接使用$ anchorAdd时,reason是一条错误消息 (3)序列化:测试(文本)节点的非空虚 - 在这种情况下/&gt;而不是应用模板

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" 
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:f="data:f"
        exclude-result-prefixes="f">

    <xsl:output method="xml" indent="yes"/>

    <xsl:variable name="searchString" select="lower-case(//simplearticle/@sortcode)"/>
    <xsl:variable name="anchor">
        <xsl:element name="anchor">
            <xsl:attribute name="target" select="//simplearticle/@id"/>
            <xsl:attribute name="id" select="concat('idx-', $searchString,'-01')"/>
        </xsl:element>
    </xsl:variable>

    <xsl:template match="/">
        <xsl:apply-templates mode="main"/>
    </xsl:template>

    <xsl:template match="@*|node()" mode="main">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" mode="main"/>
        </xsl:copy>  
    </xsl:template>

    <xsl:template match="text()" mode="main">
        <xsl:variable name="anchorAdd" select="f:serialize($anchor)"/>
        <xsl:variable name="replacement">
            <xsl:value-of select="$anchorAdd"/>
        </xsl:variable>
        <xsl:value-of select="replace(., $searchString, concat($replacement, '$0'), 'i')"  disable-output-escaping="yes"/>
    </xsl:template>

    <xsl:template match="*" mode="serialize">
        <xsl:text>&lt;</xsl:text>
        <xsl:value-of select="name()"/>
        <xsl:apply-templates mode="serialize" select="@*"/> 
        <xsl:choose>
            <xsl:when test="not (string(.))">
                <xsl:text>/&gt;</xsl:text>
            </xsl:when>
            <xsl:otherwise>
                <xsl:text>&gt;</xsl:text>
                <xsl:apply-templates mode="serialize"/> 
                <xsl:text>&lt;/</xsl:text><xsl:value-of select="name()"/><xsl:text>&gt;</xsl:text> 
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <xsl:template match="text()" mode="serialize">
        <xsl:value-of select="."/>
    </xsl:template>

    <xsl:template match="@*" mode="serialize">
        <xsl:text> </xsl:text><xsl:value-of select="name()"/>="<xsl:value-of select="."/>"<xsl:text/>
    </xsl:template>

    <xsl:function name="f:serialize">
        <xsl:param name="xml"/>
        <xsl:apply-templates mode="serialize" select="$xml"/> 
    </xsl:function>
</xsl:stylesheet>

答案 2 :(得分:0)

最后我能够解决我的问题,所以我在这里发布了有效的XSLT。

首先我将'i'添加到我的匹配功能中:

何时

test="some $term in $index[link/@target=$id]/entry satisfies matches($srchString, concat('(^|\W)(',$term, '[s]?)($|\W)'),'i');

for-each select="$index[matches(regex-group(2), concat('(^|\W)(',entry, '[s]?)($|\W)'),'i')]/link[@target=$id]

if test="matches($srchString, concat('(^|\W)(',entry, '[s]?)($|\W)'),'i')

then add regex to match the lowercase word(s) by adding lower-case(string-join($index[link/@target=$id]/entry, '[s]?|')) in to the analyze-string.

analyze-string select="$srchString" regex="{concat('(^|\W)(',lower-case(string-join($index[link/@target=$id]/entry, '[s]?|')), '[s]?|', string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')}"

使用这个正则表达式,我能够得到小写的“缩写”和“缩写”,带有初始大写。

(^|\W)(abbreviation[s]?|acronym[s]?|Abbreviation[s]?|acronym)($|\W)

特别感谢 Michael Kay Maestro13

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl" xmlns:saxon="http://saxon.sf.net/"
    xmlns:ati="http://www.asiatype.com/Functions" exclude-result-prefixes="#all">

    <xsl:output method="xml" encoding="UTF-8" indent="no"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:variable name="Entries">
        <xsl:for-each select="for $doc in collection(iri-to-uri(concat('original-eall-xml/', '?select=*.xml'))) return $doc/descendant::art/*">
            <list>
                <entry>
                    <xsl:value-of select=".//mainentry"/>
                </entry>
                <id>
                    <xsl:value-of select="./@id"/>
                </id>
                <page>
                    <xsl:value-of select="./@page"/>
                </page>
            </list>
        </xsl:for-each>
    </xsl:variable>

    <xsl:template match="art/*">
        <xsl:variable name="myId">
            <xsl:value-of select="tokenize(@id,' ')[1]"/>
        </xsl:variable>
        <xsl:variable name="myStrings" select="//text()"/>
        <xsl:variable name="myEntry" select="descendant::mainentry/text()"/>
        <xsl:copy>
            <xsl:copy-of select="@* except @page"/>
            <xsl:choose>
                <xsl:when test="$Entries//entry[.=$myEntry] and $Entries//entry[.=$myEntry]/following-sibling::id[.=$myId]">
                    <xsl:attribute name="page"><xsl:value-of select="$Entries//entry[.=$myEntry]/following-sibling::page"/></xsl:attribute>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:attribute name="page">0</xsl:attribute>
                    <!--<xsl:message select="concat('mainentry: &quot;', $myEntry, '&quot; - volume: &quot;', @volume, '&quot; - id: &quot;',$myId, '&quot;')"/>-->
                </xsl:otherwise>
            </xsl:choose>

            <xsl:apply-templates/>
        </xsl:copy>

    </xsl:template>    

    <!-- create variable for EALL_Index-->
    <xsl:variable name="index" as="element() +">
        <xsl:for-each select="for $doc in document('EALL_Index_test.xml') return $doc/descendant::item">
            <xsl:sort select="string-length(normalize-space(text()[1]))" order="descending"/>
            <list>
                <entry>
                    <xsl:variable name="itemString">
                        <xsl:analyze-string select="text()[1]" regex="(\[)|(\])|(\()|(\))">
                            <xsl:matching-substring>
                                <xsl:value-of select="replace(regex-group(1),'\[','\\[')"/>
                                <xsl:value-of select="replace(regex-group(2),'\]','\\]')"/>
                                <xsl:value-of select="replace(regex-group(3),'\(','\\(')"/>
                                <xsl:value-of select="replace(regex-group(4),'\)','\\)')"/>
                            </xsl:matching-substring>
                            <xsl:non-matching-substring>
                                <xsl:value-of select="."/>
                            </xsl:non-matching-substring>
                        </xsl:analyze-string>
                    </xsl:variable>
                    <xsl:value-of select="normalize-space($itemString)"/>
                </entry>
                <xsl:for-each select="link">
                    <link target="{@targets}" id="{@n}">
                    </link>
                </xsl:for-each>
            </list>
        </xsl:for-each>
    </xsl:variable>

    <xsl:template match="text()[ancestor::*[self::simplearticle or self::complexarticle]]">
        <xsl:variable name="id" as="xs:string" select="current()/ancestor::*[self::simplearticle or self::complexarticle]/@id"/>
        <xsl:variable name="srchString" as="xs:string" select="."/>
         <xsl:choose>
             <xsl:when test="some $term in $index[link/@target=$id]/entry satisfies matches($srchString, concat('(^|\W)(',$term, '[s]?)($|\W)'),'i')">
                 <xsl:message select="concat('(^|\W)(',lower-case(string-join($index[link/@target=$id]/entry, '[s]?|')), '[s]?|', string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')"/>
                 <xsl:analyze-string select="$srchString" regex="{concat('(^|\W)(',lower-case(string-join($index[link/@target=$id]/entry, '[s]?|')), '[s]?|', string-join($index[link/@target=$id]/entry, '[s]?|'), ')($|\W)')}">
                    <xsl:matching-substring>
                        <xsl:value-of select="regex-group(1)"/>
                        <xsl:for-each select="$index[matches(regex-group(2), concat('(^|\W)(',entry, '[s]?)($|\W)'),'i')]/link[@target=$id]">
                            <xsl:if test="matches($srchString, concat('(^|\W)(',entry, '[s]?)($|\W)'),'i')">
                                <anchor target="{@target}" id="{@id}"/>
                            </xsl:if>
                        </xsl:for-each>
                        <xsl:value-of select="regex-group(2)"/>
                        <xsl:value-of select="regex-group(3)"/>
                    </xsl:matching-substring>
                <xsl:non-matching-substring>
                    <xsl:value-of select="."/>
                </xsl:non-matching-substring>
             </xsl:analyze-string>
        </xsl:when>
        <xsl:otherwise>
            <xsl:sequence select="."/>
        </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
相关问题