Question

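I took on an assignment that I thought would just be a matter of writing an XSL stylesheet that uses and overrides the identity rule: remove the empty self-closing paragraph elements (<p />) from a number of text files.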

After working with the texts for a while, I realized that removing all paragraph elements would make parts of the text unreadable.

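One idea was to remove only the <p /> elements that occur inside sentences, but since not all sentences are terminated by a period, that is not so easy. Another problem with removing them is that I end up with double spaces in some places and no space at all in others.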

I also feel I could probably remove the elements above, but I cannot get a template to match p.

I could really use some advice here. What is the best result I can achieve with this material?

Example of a text file:

<?xml version="1.0" encoding="utf-8"?>
<Data>
  <Text number="1">
    <Title>Lazy dog jumper</Title>
    <Description><arg format="x" /><p /> The quick <p /> brown fox <p /> jumps <p /> over the <p /> lazy dog.<p />The quick brown fox jumps <p />over the lazy dog.<p /> The quick brown fox jumps over the lazy dog. <p /></Description>
  </Text>
  <Text number="2">
    <Title>Lazy foxer</Title>
    <Description>The quick brown fox <arg format="x" /><p />jumps over the lazy dog <p /></Description>
  </Text>
  <Text number="3">
    <Title>Quickest jumper</Title>
    <Description>The quickest brown fox jumps over the lazy dog <p /> The slowest brown fox jumps over the laziest dog.  <p /></Description>
  </Text>
</Data>

The XSL (just the identity rule and an override of it):

<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*[not(normalize-space()) and not(.//@*)]"/>

    <!-- This is not matching <p /> ! -->
    <xsl:template match="p" />

</xsl:stylesheet>

Desired output:

<?xml version="1.0" encoding="utf-8"?>
<Data>
  <Text number="1">
    <Title>Lazy dog jumper</Title>
    <Description><arg format="x" />.<p />The quick brown fox jumps over the lazy dog.<p />The quick brown fox jumps over the lazy dog.<p />The quick brown fox jumps over the lazy dog.</Description>
  </Text>
  <Text number="2">
    <Title>Lazy foxer</Title>
    <Description>The quick brown fox <arg format="x" /> jumps over the lazy dog.</Description>
  </Text>
  <Text number="3">
    <Title>Quickest jumper</Title>
    <Description>The quickest brown fox jumps over the lazy dog.<p />The slowest brown fox jumps over the laziest dog.</Description>
  </Text>
</Data>

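Since I cannot determine where the real paragraphs in the text are, I do not dare remove the paragraph elements at the ends of sentences. The only exception is the last sentence of each Description text fragment. I added the missing periods to sentences where I could guess that there should be one.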

Answer 1

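I think this is a case where you would want to use a regular expression (for example \<p\s+?\/\>[ ]?) in a search-and-replace (https://regex101.com/r/yG6hM5/1). That is because you want to remove the trailing space after the empty p tag. However, it will not catch empty paragraphs written in other forms.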
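As a rough illustration of that search-and-replace, here is a minimal Python sketch. It assumes the file content is handled as a plain string rather than parsed XML, and the helper name strip_empty_p and the sample string are made up for this example. The pattern is the one given above, written without the optional backslashes.

```python
import re

# Pattern from the answer: an empty self-closing <p /> (at least one
# whitespace character before "/>") followed by at most one space.
EMPTY_P = re.compile(r"<p\s+?/>[ ]?")


def strip_empty_p(markup: str) -> str:
    """Hypothetical helper: drop each <p /> and one space directly after it."""
    return EMPTY_P.sub("", markup)


# Made-up sample in the style of the Description elements in the question.
sample = "The quick <p /> brown fox <p />jumps.<p />"
print(strip_empty_p(sample))
```

Because only the space after the tag is consumed, a tag with no space on either side still joins the neighbouring words together, which is the missing-whitespace problem described in the question. The pattern also leaves <p/> (no whitespace before the slash) untouched.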

This may be hard to do in XSLT, because you also want to remove the space after the empty p tag. You can try this XSLT:

<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()[local-name()!='p']">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="p[text() != '']">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

http://xsltransform.net/3NJ38ZJ

But it does not get rid of that extra space.

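The idea is to exclude all p nodes from your identity template and to use a second template that matches only p tags whose text content is not empty. If you also want to exclude p elements that contain nothing but whitespace, you can change the match of the second template to p[normalize-space(text()) != ''].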

Removing empty self-closing paragraph elements from text that contains some problems
