Question

There is a website, for example

  http://example.com

with a page like this:

 <div id="topnews">
      <a href="/news/topnews1.html"> Top news1 </a>
      <a href="/news/topnews2.html"> Top news2 </a>
      <a href="http://sport.example.com/news/topnews3.html"> Top news complex </a>
 </div>

Is it possible to get these 3 URLs using pure XPath:

 http://example.com/news/topnews1.html
 http://example.com/news/topnews2.html
 http://sport.example.com/news/topnews3.html

To extract the relative URLs, we can use:

   //div/a/@href

But

  concat('http://example.com',  //div/a/@href)

returns only 1 line (the first one) instead of 3 different values.

I also don't know how to elegantly detect and handle the last URL, which is already absolute.

Answer 1

XPath 1.0

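This is not possible with XPath 1.0 alone: concat() converts a node-set argument to the string value of its first node only, and the language has no construct for producing a new string per node.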
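Since XPath 1.0 alone cannot do this, the joining has to happen in the host language. Below is a minimal sketch in Python, assuming the page is well-formed enough to parse as XML and that the base URL is known ahead of time. The names `BASE_URL`, `PAGE`, and `absolute_links` are made up for illustration; `urljoin` from the standard library leaves already-absolute URLs untouched, which takes care of the third link.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Assumption: the base URL of the page is known to the caller.
BASE_URL = "http://example.com"

# The markup from the question, used here as sample input.
PAGE = """
<div id="topnews">
  <a href="/news/topnews1.html"> Top news1 </a>
  <a href="/news/topnews2.html"> Top news2 </a>
  <a href="http://sport.example.com/news/topnews3.html"> Top news complex </a>
</div>
"""


def absolute_links(markup, base_url):
    """Return every <a href> in markup, resolved against base_url."""
    root = ET.fromstring(markup)
    # ElementTree supports only a subset of XPath and cannot select
    # attributes directly, so select the elements and read href from each.
    hrefs = [a.get("href") for a in root.findall(".//a[@href]")]
    # urljoin resolves relative references and returns absolute URLs as-is.
    return [urljoin(base_url, href) for href in hrefs]


links = absolute_links(PAGE, BASE_URL)
for link in links:
    print(link)
```

For real-world HTML that is not well-formed XML, a dedicated HTML parser would be needed in place of `ET.fromstring`; the `urljoin` step stays the same.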

XPath 2.0

This XPath 2.0 expression,

for $h in //a/@href return
    if (starts-with($h, 'http:/'))
    then $h
    else concat('http://example.com',$h)

returns

http://example.com/news/topnews1.html
http://example.com/news/topnews2.html
http://sport.example.com/news/topnews3.html

as requested, when evaluated against the document you provided.
