Question

What XPath do I need to extract the text in a SPAN that is preceded by a specific label inside a STRONG, where both are inside a P?

For example, extracting the website and email address from the following page:

<p>
<strong>Website:</strong>
<span>www.example.com</span>
</p>
<p>
<strong>Contact email:</strong>
<span>email@example.com</span>
</p>

Answer 1

This should do it:

//p/span[preceding::*[1][self::strong and . = 'Contact email:']]

Here you select every p/span element whose nearest preceding element (preceding::*[1]) is a strong element whose text is exactly "Contact email:".
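To make the semantics of this predicate concrete, here is a minimal sketch that emulates it in plain Python. It assumes only the standard library: xml.etree.ElementTree supports a limited XPath subset that lacks the preceding axis, so the axis is reimplemented by hand (document order, minus ancestors, nearest first). The helper names and the wrapping <root> element are hypothetical, added only so the snippet parses as one XML document; with a full XPath engine you would just evaluate the expression directly.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in a hypothetical <root> element.
HTML = """<root>
<p>
<strong>Website:</strong>
<span>www.example.com</span>
</p>
<p>
<strong>Contact email:</strong>
<span>email@example.com</span>
</p>
</root>"""


def string_value(elem):
    """XPath string value of an element: all descendant text, concatenated."""
    return "".join(elem.itertext())


def spans_after_label_preceding(root, label):
    """Emulate //p/span[preceding::*[1][self::strong and . = label]]."""
    order = list(root.iter())  # every element, in document order
    # Child -> parent map, so ancestors can be excluded like the preceding axis does.
    parent = {child: par for par in order for child in par}
    results = []
    for p in root.iter("p"):
        for span in p.findall("span"):
            ancestors = set()
            node = parent.get(span)
            while node is not None:
                ancestors.add(node)
                node = parent.get(node)
            # preceding::* = elements earlier in document order, minus ancestors;
            # [1] on this reverse axis is the nearest one, i.e. the last in the list.
            before = [e for e in order[:order.index(span)] if e not in ancestors]
            if before:
                nearest = before[-1]
                if nearest.tag == "strong" and string_value(nearest) == label:
                    results.append(span.text)
    return results


root = ET.fromstring(HTML)
print(spans_after_label_preceding(root, "Contact email:"))  # ['email@example.com']
print(spans_after_label_preceding(root, "Website:"))        # ['www.example.com']
```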

Answer 2

Website:

//p/span[preceding::strong[1]/text()='Website:']

Email:

//p/span[preceding::strong[1]/text()='Contact email:']

Answer 3

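It is also important to note that with the preceding axis, as used in the other two answers, the XPath will incorrectly return the span element in markup like the following, because the preceding axis looks at every element earlier in the document (except ancestors), not just those inside the same p: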

<strong>Website:</strong>
<p>
<span>www.example.com</span>
</p>

You can use the preceding-sibling axis to avoid this error:

//p/span[preceding-sibling::*[1][self::strong and . = 'Website:']]

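The preceding-sibling axis considers only elements that come before the context element (the span, in this case) and are its siblings (that is, share the same parent as the context element). In the markup above, strong is a sibling of p, not of span, so the expression does not match.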
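As a rough illustration of the difference, the sketch below emulates the preceding-sibling version in plain Python and runs it on both the question's markup and the problem case above. It assumes only the standard library (xml.etree.ElementTree has no preceding-sibling axis, so the check is done by looking at the element just before the span under the same parent). The function names and the wrapping <root> elements are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET


def string_value(elem):
    """XPath string value of an element: all descendant text, concatenated."""
    return "".join(elem.itertext())


def spans_after_label_sibling(root, label):
    """Emulate //p/span[preceding-sibling::*[1][self::strong and . = label]]."""
    results = []
    for p in root.iter("p"):
        children = list(p)  # element children of <p> only (text nodes are skipped, like *)
        for i, child in enumerate(children):
            if child.tag != "span" or i == 0:
                continue  # not a span, or a span with no preceding sibling element
            # preceding-sibling::*[1] is the element immediately before the span.
            prev = children[i - 1]
            if prev.tag == "strong" and string_value(prev) == label:
                results.append(child.text)
    return results


# The question's markup and the problem case, each wrapped in a hypothetical <root>.
GOOD = """<root>
<p><strong>Website:</strong><span>www.example.com</span></p>
<p><strong>Contact email:</strong><span>email@example.com</span></p>
</root>"""

BAD = """<root>
<strong>Website:</strong>
<p><span>www.example.com</span></p>
</root>"""

print(spans_after_label_sibling(ET.fromstring(GOOD), "Website:"))  # ['www.example.com']
print(spans_after_label_sibling(ET.fromstring(BAD), "Website:"))   # []
```

In the second case the strong element is a sibling of p rather than of span, so nothing is returned, which is the behaviour the preceding-sibling expression is meant to give.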