Question

I am using the following code to replace all links on an HTML page.

$output = file_get_contents($turl);
$newOutput = str_replace('href="http', 'target="_parent"  href="http://localhost/e/site.php?turl=http', $output);
$newOutput = str_replace('href="www.', 'target="_parent"  href="http://localhost/e/site.php?turl=www.', $newOutput);
$newOutput = str_replace('href="/', 'target="_parent"  href="http://localhost/e/site.php?turl='.$turl.'/', $newOutput);

echo $newOutput;

I want to modify this code so that it only replaces links in the body, not links in the head.

Answer 1

You can "decapitate" the markup: find the body tag and split the head and the body into two variables.

//$output = file_get_contents($turl);

$output = "<head> blablabla

 Bla bla
</head>
<body>
 Foobar
 </body>";

// Decapitation: find the <body> tag and split the document into head and body
$head = substr($output, 0, strpos($output, "<body>"));
$body = substr($output, strpos($output, "<body>"));

// Rewrite links in the body only; the head is left untouched
$newOutput = str_replace('href="http', 'target="_parent"  href="http://localhost/e/site.php?turl=http', $body);
$newOutput = str_replace('href="www.', 'target="_parent"  href="http://localhost/e/site.php?turl=www.', $newOutput);
$newOutput = str_replace('href="/', 'target="_parent"  href="http://localhost/e/site.php?turl='.$turl.'/', $newOutput);

echo $head . $newOutput;

https://3v4l.org/WYcYP
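
The split above only matches a literal `<body>`, so a tag written as `<BODY>` or `<body class="x">` would not be found. A minimal sketch of a more tolerant split follows, assuming it is acceptable to search case-insensitively for the first `<body`. The helper name `splitAtBody` and the sample markup and URLs are hypothetical, and the search could still be fooled by a `<body` string inside a comment or script in the head.

```php
<?php
// Hypothetical helper (not part of the original answer):
// splits an HTML string into [head part, body part] at the opening body tag.
function splitAtBody(string $html): array
{
    // stripos() is case-insensitive; searching for '<body' without the
    // closing '>' also matches tags with attributes, e.g. <body class="x">.
    $pos = stripos($html, '<body');
    if ($pos === false) {
        // No body tag found: treat the whole document as body.
        return ['', $html];
    }
    return [substr($html, 0, $pos), substr($html, $pos)];
}

// Sample input, assumed for illustration only.
$output = '<head><link href="/style.css"></head>'
        . '<BODY class="x"><a href="/page">x</a></BODY>';

[$head, $body] = splitAtBody($output);

// Only the body part is rewritten; the stylesheet link in the head is kept.
$body = str_replace(
    'href="/',
    'target="_parent" href="http://localhost/e/site.php?turl=http://example.com/',
    $body
);

echo $head . $body;
```

Checking the result of `stripos()` against `false` explicitly makes the no-body case deliberate instead of relying on how `substr()` treats a `false` offset.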

Answer 2

You can use DOMDocument to parse and manipulate the source. For a task like this, a dedicated parser is almost always a better idea than string manipulation.

// Parse the HTML into a document
$dom = new \DOMDocument();
$dom->loadHTML($html);

// Loop over all links within the `<body>` element
foreach ($dom->getElementsByTagName('body')->item(0)->getElementsByTagName('a') as $link) {
    // Save the existing link
    $oldLink = $link->getAttribute('href');

    // Set the new target attribute
    $link->setAttribute('target', "_parent");

    // Prefix the link with the new URL
    $link->setAttribute('href', "http://localhost/e/site.php?turl=" . urlencode($oldLink));
}

// Output the result
echo $dom->saveHTML();

See https://eval.in/843484
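
For reference, here is a self-contained sketch of the same DOMDocument approach that can be run without fetching a page. The sample markup and the `example.com` URL are assumptions made for illustration. Compared with the snippet above, it also silences libxml warnings for imperfect real-world HTML, guards against a missing `<body>`, and skips anchors without an `href`.

```php
<?php
// Sample input, assumed for illustration: one link in the head, one in the body.
$html = '<html><head><link rel="stylesheet" href="/style.css"></head>'
      . '<body><p><a href="http://example.com/a?b=1">A</a></p></body></html>';

// Real-world HTML is rarely perfectly valid; collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// item(0) returns null when the document has no <body> element.
$body = $dom->getElementsByTagName('body')->item(0);
if ($body !== null) {
    foreach ($body->getElementsByTagName('a') as $link) {
        // Anchors without an href (e.g. named anchors) are left alone.
        if (!$link->hasAttribute('href')) {
            continue;
        }
        $oldLink = $link->getAttribute('href');
        $link->setAttribute('target', '_parent');
        $link->setAttribute(
            'href',
            'http://localhost/e/site.php?turl=' . urlencode($oldLink)
        );
    }
}

$result = $dom->saveHTML();
echo $result;
```

Because only `<a>` elements under `<body>` are visited, the `<link>` element in the head keeps its original `href`. Unlike the `str_replace` version, relative links such as `/page` are passed through as-is, so resolving them against `$turl` would still need to be added.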

Replace all links in the body of an HTML page using PHP
