Convert all relative URLs to absolute URLs while preserving the content

Time: 2015-07-10 17:58:46

Tags: php

I am using Simple HTML DOM to scrape website data, but I am having trouble converting relative URLs to absolute ones. Suppose the page itself is at http://www.example.com/tutorial.html, but the content I extract contains relative links, and I want all of them to be absolute. For example:

$string = "<p>this is text within string</p> and more random strings which contains link like <a href='docs/555text.fileextension'>Download this file</a> <p>Other html follows where another relative link may exist like <a href='files/doc.doc'>This file</a>";

I would like to get something like:

$string = "<p>this is text within string</p> and more random strings which contains link like <a href='http://www.example.com/docs/555text.fileextension'>Download this file</a> <p>Other html follows where another relative link may exist like <a href='http://www.example.com/files/doc.doc'>This file</a>";

That is, simply convert all relative URLs to absolute URLs while preserving the rest of the content of $string.

When I try the solution given below, it does not work on the actual scraped data.

1 answer:

Answer 0 (score: 1)

You are right to use preg_replace. You can try this code:

// [^>]* matches zero or more characters other than >
// both single-quoted and double-quoted href values are supported
$regex = '~<a([^>]*)href=["\']([^"\']*)["\']([^>]*)>~';
// the replacement reuses the three captured groups ($1, $2, $3)
// and prepends the base URL to the href value; note that it does this
// for every href it matches, including ones that are already absolute
$replace = '<a$1href="http://www.example.com/$2"$3>';

$string = "<p>this is text within string</p> and more random strings which contains link like <a href='docs/555text.fileextension'>Download this file</a> <p>Other html follows where another relative link may exist like <a href='files/doc.doc'>This file</a>";
$replaced = preg_replace($regex, $replace, $string);
echo $replaced;

Result:

<p>this is text within string</p> and more random strings which contains link like <a href="http://www.example.com/docs/555text.fileextension">Download this file</a> <p>Other html follows where another relative link may exist like <a href="http://www.example.com/files/doc.doc">This file</a>
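
Real scraped pages usually mix relative links with links that are already absolute, which a plain preg_replace like the one above would also prefix. A possible variation is sketched below using preg_replace_callback so that each href can be inspected before it is rewritten. The function name absolutizeLinks and its parameters are hypothetical (they are not part of the original answer), and the sketch assumes the base URL is the site root; it does not resolve ../ segments or handle unquoted href values.

```php
<?php
// Hypothetical helper, not from the original answer: prepends $baseUrl
// only to href values that look relative. Assumes $baseUrl is the site root.
function absolutizeLinks(string $html, string $baseUrl): string
{
    $base = rtrim($baseUrl, '/') . '/';

    // Group 1: the tag up to and including "href=", group 2: the opening
    // quote, group 3: the URL. \2 requires the same quote to close the value.
    $regex = '~(<a\s(?:[^>]*?\s)?href\s*=\s*)(["\'])(.*?)\2~i';

    $result = preg_replace_callback(
        $regex,
        function (array $m) use ($base): string {
            $url = $m[3];
            // Leave untouched: empty hrefs, URLs with a scheme (http:, mailto:, ...),
            // protocol-relative URLs (//host/path) and fragment-only links (#top).
            if ($url === '' || preg_match('~^(?:[a-z][a-z0-9+.\-]*:|//|#)~i', $url)) {
                return $m[0];
            }
            // Relative URL: prepend the base and keep the original quote style.
            return $m[1] . $m[2] . $base . ltrim($url, '/') . $m[2];
        },
        $html
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

$string = "<p>this is text within string</p> <a href='docs/555text.fileextension'>Download this file</a> "
        . "<a href=\"http://other.example.org/page.html\">External link</a>";

echo absolutizeLinks($string, 'http://www.example.com');
```

Here the first link gets the http://www.example.com/ prefix while the second, already absolute, link is left as it was. For heavily malformed markup a DOM parser is generally more dependable than a regular expression, so this should be read as a starting point rather than a complete solution.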