I have to replace some dead links in my HTML, like these:
<td><a title="Michel Blanc" href="http://www.mysite.com/index.php?title=Michel_Blanc&action=edit&redlink=1">Michel Blanc</a></td>
<td><a title="Pierre Schöller" href="http://www.mysite.com/index.php?title=Pierre_Sch%C3%B6ller&action=edit&redlink=1">Pierre Schöller</a></td>
<td><a title="Focus Features" href="http://www.mysite.com/w/Focus_Features">Focus Features</a><br />
<a title="Olivier Treiner" href="http://www.mysite.com/index.php?title=Olivier_Treadfadfadfiner&action=edit&redlink=1">Olivier Treiner</a>
<td>1600</td>
I want to remove the <a> markup but keep the text between <a></a> if the href starts with
http://www.mysite.com/index.php?title=
and keep the <a> markup untouched if the href starts with
http://www.mysite.com/w/
Here is my regex:
(<a title="([\s\S])*?" href="http://www\.mysite\.com/index\.php\?title=([\s\S])*?&action=edit&redlink=1">([\s\S])*?</a>)
But it also matches the third line, which I want to keep. I tested it at http://regexpal.com/
Can anyone help me?
Answer 0 (score: 0)
This one works for me:
(<a title="[^>]*?" href="http://www\.mysite\.com/index\.php\?title=([\s\S])*?&action=edit&redlink=1">([\s\S])*?</a>)
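The only difference from the pattern in the question is the title attribute: `[\s\S]*?` can run past the end of its own tag, so a match that starts at the "Focus Features" link keeps expanding until it reaches the href of the next red link, whereas `[^>]*?` cannot cross a `>`. The sketch below is one way to check this in PHP. It is an illustration, not part of the answer: the capture group has been moved so that group 1 holds the whole link text (the original `([\s\S])*?` groups only capture the last character), and the variable names are made up for the example.

```php
<?php
// Three rows copied from the question; the middle link must be kept.
$html = <<<'HTML'
<td><a title="Michel Blanc" href="http://www.mysite.com/index.php?title=Michel_Blanc&action=edit&redlink=1">Michel Blanc</a></td>
<td><a title="Focus Features" href="http://www.mysite.com/w/Focus_Features">Focus Features</a><br />
<a title="Olivier Treiner" href="http://www.mysite.com/index.php?title=Olivier_Treadfadfadfiner&action=edit&redlink=1">Olivier Treiner</a>
HTML;

// Pattern from the question (title may span several tags).
$original = '~<a title="[\s\S]*?" href="http://www\.mysite\.com/index\.php\?title=[\s\S]*?&action=edit&redlink=1">([\s\S]*?)</a>~';

// Pattern from this answer (title is confined to one tag).
$fixed = '~<a title="[^>]*?" href="http://www\.mysite\.com/index\.php\?title=[\s\S]*?&action=edit&redlink=1">([\s\S]*?)</a>~';

// Replace each matched link with its text only.
$withOriginal = preg_replace($original, '$1', $html);
$withFixed    = preg_replace($fixed, '$1', $html);

// The original pattern swallows the "Focus Features" link;
// the fixed one leaves it in place and unwraps only the red links.
echo $withOriginal, "\n\n", $withFixed, "\n";
```

Using `[^"]*` for the attribute values would be stricter still, at the cost of assuming that every attribute is double-quoted.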
Answer 1 (score: 0)
$subject = <<<'LOD'
<td><a title="Michel Blanc" href="http://www.mysite.com/index.php?title=Michel_Blanc&action=edit&redlink=1">Michel Blanc</a></td>
<td><a title="Pierre Schöller" href="http://www.mysite.com/index.php?title=Pierre_Sch%C3%B6ller&action=edit&redlink=1">Pierre Schöller</a></td>
<td><a title="Focus Features" href="http://www.mysite.com/w/Focus_Features">Focus Features</a><br />
<a title="Olivier Treiner" href="http://www.mysite.com/index.php?title=Olivier_Treadfadfadfiner&action=edit&redlink=1">Olivier Treiner</a>
<td>1600</td>
<a href="http://remove.me.com">remove.me</a>
LOD;
The pattern:
$pattern = <<<'LOD'
~
    # definitions
    (?(DEFINE)
        # everything from the beginning of the "a" tag up to the value
        # of the "href" attribute
        (?<atohref>
            <a\b (?> [^h>]++ | \Bh | h(?!ref) )++ href\s*+=\s*+['"]?+
        )
        # everything up to the closing "a" tag
        (?<untilclosea>
            (?> [^<]++ | <(?!/a>) )++
        )
    )
    # pattern
    \g<atohref>
    \Qhttp://www.mysite.com/\E
    (?>
        \Qindex.php?title=\E
        [^>]*+>
        ( \g<untilclosea> ) # third group (because of the two named groups)
        </a>
      |
        w/ \g<untilclosea>
        </a> \K # reset the match start so this link is preserved
    )
  |
    <a\b \g<untilclosea> </a> # all other "a" tags
~x
LOD;
$replacement = '$3';
$result = preg_replace($pattern, $replacement, $subject);
echo htmlspecialchars($subject).'<br><br>';
echo htmlspecialchars($result);
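The part of this pattern that preserves the /w/ links is `\K`: it resets the start of the reported match, so the match becomes an empty string located just after `</a>`, and replacing that empty string with an unset group changes nothing. A much smaller sketch of the same idea follows. It is a simplification written for illustration, not the pattern above: it assumes double-quoted href attributes and link text without nested tags, and it does not handle the "all other links" case.

```php
<?php
// One link to keep (/w/) and one red link to unwrap, both in the question's style.
$subject = '<a href="http://www.mysite.com/w/Focus_Features">Focus Features</a> '
         . '<a href="http://www.mysite.com/index.php?title=Michel_Blanc&action=edit&redlink=1">Michel Blanc</a>';

$pattern = '~<a\b[^>]*href="http://www\.mysite\.com/'
         . '(?:'
         .   'w/[^<]*</a>\K'                           // keep: \K makes the match empty
         . '|'
         .   'index\.php\?title=[^>]*>([^<]*)</a>'     // unwrap: group 1 is the link text
         . ')~';

// For the /w/ branch group 1 is unset, so '$1' inserts an empty string
// at an empty match and the link is left as it was.
$result = preg_replace($pattern, '$1', $subject);

echo $result, "\n";
```

The full pattern in the answer is more tolerant (possessive quantifiers, optional quotes, other attributes before href); this version only shows why the `\K` branch leaves its link intact.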