XPath: exclude the text of the last element if a condition is met

Asked: 2015-12-15 14:35:18

Tags: r xpath

The goal is to exclude the text of the last element if that text is wrapped in <p><em>...</em></p>, as in this markup:

...
<p>"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."</p>
<p>Liberals were mostly delighted by what the <em>Washington Post</em> called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".</p>
<p>This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.</p>
<p><em>Felicity Spector is a deputy programme editor for <a href="http://www.channel4.com/news/" onclick="window.open(this.href);return false;" onkeypress="window.open(this.href);return false;">Channel 4 News</a></em>.</p>

The output should be:

"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."
Liberals were mostly delighted by what the Washington Post called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".
This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.

So I need to get everything except the text in the last <p> tag, but only when there is no other text between <p> and <em> and no other text between </em> and </p>, as in the example above.

I am using

//p[normalize-space()]

but it returns everything, including the text in the last <p><em>...</em></p> element:

"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."
Liberals were mostly delighted by what the Washington Post called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".
This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.
Felicity Spector is a deputy programme editor for Channel 4 News

The last paragraph should be excluded.

Thanks for any hints.

UPD

Example 1. The following text should be returned if it is in the last <p> (because not all of the text is inside <em>):

<p>I was once on a travelling sanitation carnival in Uttar Pradesh when someone rushed up to me. “Rose! Rose! There’s a real ‘no loo no I do’!” If that’s the story of <em>Ek Prem Katha</em>, then I’m all in favour.  But because bringing a toilet into the world, when there are still 2.4 billion people without one, is by any reckoning a very happy ending.</p>

Example 2. The following text should not be returned if it is in the last <p> (because all of the text is inside <em>):

<p><em>Sophie Elmhirst is an assistant editor of the NS</em></p>

1 answer:

Answer 0 (score: 1)

It is not entirely clear what the rules are, but here is a suggestion that you can comment on:

//p[normalize-space() and not(position() = last() and em)]

which translates to

//p                           find all `p` elements anywhere in the document
[normalize-space()            but only if they contain at least one non-whitespace character
and not(position() = last()   and unless the `p` element is the last `p` child of its parent
and em)]                      and, at the same time, has a child element named `em`

and returns the following (individual results are separated by -------):

<p>"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."</p>
-----------------------
<p>Liberals were mostly delighted by what the <em>Washington Post</em> called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".</p>
-----------------------
<p>This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.</p>

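Caveat: if the structure of the document is actually more complex, this might go wrong in some places, for example when p elements appear at arbitrary levels of the hierarchy.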

  

Perhaps normalize-space is not needed, but I always use it.

Only use it if you really want to exclude elements that contain nothing but whitespace.

  

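I need to exclude the text in the last p only if the whole text is enclosed in em

Well, in your own example of a last p element this is not the case: the final . is actually outside of em, so it is a text node of p.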
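
To make the point about the trailing period concrete, here is a minimal sketch in Python, using only the standard library's xml.etree.ElementTree. Python is chosen purely for illustration (the question itself is tagged r), the helper name own_text_pieces is made up for this sketch, and the first sample is a shortened version of the Felicity Spector paragraph. ElementTree has no separate text-node objects: text before the first child is stored in p.text, and text after a child is stored in that child's tail.

```python
import xml.etree.ElementTree as ET


def own_text_pieces(p):
    """Return the text that belongs to `p` itself, i.e. what XPath's text() would select.

    Hypothetical helper for illustration: collects p.text plus the tail of
    every direct child, dropping the pieces that are missing (None or empty).
    """
    pieces = [p.text] + [child.tail for child in p]
    return [piece for piece in pieces if piece]


# Shortened version of the question's last paragraph: the period sits after </em>.
with_period = ET.fromstring(
    "<p><em>Felicity Spector is a deputy programme editor</em>.</p>"
)
# Example 2 from the update: everything is inside <em>.
without_period = ET.fromstring(
    "<p><em>Sophie Elmhirst is an assistant editor of the NS</em></p>"
)

print(own_text_pieces(with_period))     # ['.']  -> p has a text node of its own
print(own_text_pieces(without_period))  # []     -> all text lives inside em
```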

EDIT in response to a comment:

  

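I just found that the xpath does not return the text if there is other text in the last p. I have updated the question with an example

Then use the path expression below: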

//p[normalize-space() and not(position() = last() and em and not(text()))]

Sorry, I probably misunderstood you until now. not(text()) tests whether p has any text-node children of its own, that is, text outside of em.
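
For readers who want to step through the predicate's logic, the following is a rough sketch that re-expresses it by hand in Python with the standard library's xml.etree.ElementTree. ElementTree's limited XPath support is not used for the predicate here, so this is an emulation under stated assumptions, not an XPath evaluation. The function names and the shortened sample documents are made up for illustration, and the sketch assumes the paragraphs sit inside some wrapper element.

```python
import xml.etree.ElementTree as ET


def has_own_text(p):
    """True if `p` has a text node child of its own (XPath: text())."""
    return bool(p.text) or any(child.tail for child in p)


def select_paragraphs(root):
    """Emulate //p[normalize-space() and not(position() = last() and em and not(text()))].

    Assumption: `root` is a wrapper element; a `p` that is itself the root
    is not considered. Results are grouped by parent element.
    """
    selected = []
    for parent in root.iter():
        # position() and last() are counted among the p children of one parent
        p_children = parent.findall("p")
        for index, p in enumerate(p_children):
            if not "".join(p.itertext()).strip(" \t\r\n"):
                continue  # normalize-space() would be empty
            is_last = index == len(p_children) - 1
            wrapped_in_em = p.find("em") is not None and not has_own_text(p)
            if is_last and wrapped_in_em:
                continue  # last p whose text is entirely inside em
            selected.append(p)
    return selected


# Last p is fully wrapped in em (like Example 2): it is dropped.
doc1 = ET.fromstring(
    "<div>"
    "<p>This was the Obama many of them hoped for.</p>"
    "<p><em>Sophie Elmhirst is an assistant editor of the NS</em></p>"
    "</div>"
)
# Last p has text outside em (like Example 1): it is kept.
doc2 = ET.fromstring(
    "<div>"
    "<p>First paragraph.</p>"
    "<p>If that's the story of <em>Ek Prem Katha</em>, then I'm all in favour.</p>"
    "</div>"
)

for doc in (doc1, doc2):
    for p in select_paragraphs(doc):
        print("".join(p.itertext()))
```

Note that a last paragraph with a trailing period after </em> would be kept by this logic, just as by the XPath expression, because the period counts as text of p.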