strip_tags: disallow certain tags

Time: 2012-09-11 03:39:54

Tags: php html strip-tags

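According to the strip_tags documentation, the second parameter takes the allowable tags. In my case, however, I want the reverse. Say I accept every tag that strip_tags normally (by default) accepts, and strip only the <script> tag. Is there any way to do this?

I'm not asking anyone to write the code for me, but input on possible ways to achieve this (if it is possible) would be greatly appreciated.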

5 answers:

Answer 0: (score: 5)

EDIT

To use the HTML Purifier HTML.ForbiddenElements configuration directive, it seems you would do something like this:

require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.ForbiddenElements', array('script','style','applet'));
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);

http://htmlpurifier.org/docs

HTML.ForbiddenElements should be set to an array. What I don't know is what form the array members should take:

array('script','style','applet')

Or:

array('<script>','<style>','<applet>')

Or... something else?

I think it's the first form, without delimiters; HTML.AllowedElements uses a configuration string format somewhat similar to TinyMCE's valid elements syntax:

tinyMCE.init({
    ...
    valid_elements : "a[href|target=_blank],strong/b,div[align],br",
    ...
});

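So my guess is that it's just the element name, and that no attributes should be provided (since you're banning the element, although there is also HTML.ForbiddenAttributes). But that's a guess.

I'll also add this note from the HTML.ForbiddenAttributes docs:

Warning: This directive complements %HTML.ForbiddenElements, accordingly, check out that directive for a discussion of why you should think twice before using this directive.

Blacklisting is just not as "robust" as whitelisting, but you may have your reasons. Just beware and be careful.

Without testing, I'm not sure what to tell you. I'll keep looking for an answer, but I will probably go to bed first. It's very late. :)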


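Although I think you really should use HTML Purifier and its HTML.ForbiddenElements configuration directive, a reasonable alternative, if you really, really want to use strip_tags(), is to derive a whitelist from the blacklist. In other words, remove what you don't want and then use what's left.

For example: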

function blacklistElements($blacklisted = '', &$errors = array()) {
    if ((string)$blacklisted == '') {
        $errors[] = 'Empty string.';
        return array();
    }

    $html5 = array(
        "<menu>","<command>","<summary>","<details>","<meter>","<progress>",
        "<output>","<keygen>","<textarea>","<option>","<optgroup>","<datalist>",
        "<select>","<button>","<input>","<label>","<legend>","<fieldset>","<form>",
        "<th>","<td>","<tr>","<tfoot>","<thead>","<tbody>","<col>","<colgroup>",
        "<caption>","<table>","<math>","<svg>","<area>","<map>","<canvas>","<track>",
        "<source>","<audio>","<video>","<param>","<object>","<embed>","<iframe>",
        "<img>","<del>","<ins>","<wbr>","<br>","<span>","<bdo>","<bdi>","<rp>","<rt>",
        "<ruby>","<mark>","<u>","<b>","<i>","<sup>","<sub>","<kbd>","<samp>","<var>",
        "<code>","<time>","<data>","<abbr>","<dfn>","<q>","<cite>","<s>","<small>",
        "<strong>","<em>","<a>","<div>","<figcaption>","<figure>","<dd>","<dt>",
        "<dl>","<li>","<ul>","<ol>","<blockquote>","<pre>","<hr>","<p>","<address>",
        "<footer>","<header>","<hgroup>","<aside>","<article>","<nav>","<section>",
        "<body>","<noscript>","<script>","<style>","<meta>","<link>","<base>",
        "<title>","<head>","<html>"
    );

    $list = trim(strtolower($blacklisted));
    $list = preg_replace('/[^a-z ]/i', '', $list);
    $list = '<' . str_replace(' ', '> <', $list) . '>';
    $list = array_map('trim', explode(' ', $list));

    return array_diff($html5, $list);
}

Then run it:

$blacklisted = '<html> <bogus> <EM> em li ol';
$errors = array(); // must exist and be passed in, since it is checked below
$whitelist = blacklistElements($blacklisted, $errors);

if (count($errors)) {
    echo "There were errors.\n";
    print_r($errors);
    echo "\n";
} else {
    // Do strip_tags() ...
}

http://codepad.org/LV8ckRjd

So if you pass in what you don't want to allow, it gives you back the list of HTML5 elements in array form, which you can then join into a string and feed to strip_tags():

$stripped = strip_tags($html, implode('', $whitelist));
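To make the whitelist-from-blacklist idea concrete, here is a minimal sketch. It assumes a deliberately tiny, hypothetical list of known tags rather than the full HTML5 list used in blacklistElements(). It also illustrates a point worth keeping in mind: strip_tags() removes the disallowed tags themselves but keeps the text between them.

```php
<?php
// Hypothetical short tag list, for illustration only.
$known     = array('<p>', '<b>', '<em>', '<script>');
$blacklist = array('<script>');

// Everything known minus everything forbidden becomes the whitelist.
$whitelist = array_diff($known, $blacklist);

$html     = '<p>Hello <b>world</b></p><script>alert(1);</script>';
$stripped = strip_tags($html, implode('', $whitelist));

// The <script> tags are gone, but their text content survives.
echo $stripped, "\n"; // <p>Hello <b>world</b></p>alert(1);
```

If the script body has to go as well, the tag content must be removed before or instead of calling strip_tags().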

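Caveat Emptor

Now, I've kind of thrown this together, and I know there are some issues I haven't thought through yet. For instance, from the strip_tags() man page, for the $allowable_tags argument:

Note:

This parameter should not contain whitespace. strip_tags() sees a tag as a case-insensitive string between < and the first whitespace or >. It means that strip_tags("<br/>", "<br>") returns an empty string.

It's late and, for some reason, I can't quite figure out what this means for this approach, so I'll have to think about that tomorrow. I also compiled the list of HTML elements in the function's $html5 array from this MDN documentation page. Sharp-eyed readers might notice that all of the tags are in this form:

<tagName>

I'm not sure how this will affect the outcome, or whether I need to take into account variations such as the short-tag form <tagName/> and some of the, ahem, odder variations as well. And, of course, there are more tags out there.

So it's probably not production ready. But you get the idea.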

Answer 1: (score: 2)

First, see what others have said on this topic:

Strip <script> tags and everything in between with PHP?

remove script tag from HTML content

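It seems you have two choices. One is a regex solution, and both of the links above provide one. The other is to use HTML Purifier.

If you are stripping the script tag for some reason other than sanitizing user content, the regex could be a good solution. However, as everyone has warned, it is a good idea to use HTML Purifier if you are sanitizing input.

Answer 2: (score: 1)

PHP (5 or later) solution:

If you want to remove <script> tags (or any other tag) along with the content inside them, you should use:

Option 1 (simplest):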

preg_replace('#<script(.*?)>(.*?)</script>#is', '', $text);
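As a quick illustration of Option 1, the sketch below wraps the same pattern in a helper (the name stripScriptBlocks is hypothetical) and runs it on a mixed-case, multi-line script block. The `i` flag makes the match case-insensitive and the `s` flag lets `.` match newlines.

```php
<?php
// Hypothetical helper name; the pattern is the one from Option 1, unchanged.
function stripScriptBlocks(string $text): string
{
    // i: case-insensitive; s: "." also matches newlines.
    return preg_replace('#<script(.*?)>(.*?)</script>#is', '', $text);
}

$text  = "<p>Hi</p><SCRIPT type=\"text/javascript\">\nalert(1);\n</SCRIPT><p>Bye</p>";
$clean = stripScriptBlocks($text);

echo $clean, "\n"; // <p>Hi</p><p>Bye</p>
```

Note that the pattern only matches a complete pair, so an opening <script> tag with no closing tag is left in place.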

Option 2 (more versatile):

<?php

$html = "<p>Your HTML code</p><script>With malicious code</script>";

$dom = new DOMDocument();

$dom->loadHTML($html);

$script = $dom->getElementsByTagName('script');

// The node list is live, so collect the nodes first and remove them afterwards.
$remove = [];
foreach ($script as $item)
{
  $remove[] = $item;
}

foreach ($remove as $item)
{
  $item->parentNode->removeChild($item);
}

$html = $dom->saveHTML();

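$html will then contain the following, inside the DOCTYPE and <html>/<body> wrapper that loadHTML() adds by default: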

"<p>Your HTML code</p>"

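The DOM approach extends naturally to a list of tag names. The following is a sketch under a few assumptions: the helper name removeElements is hypothetical, the input is wrapped in <body> so that every element ends up under one known parent, and the input is ASCII or declares its own encoding (loadHTML() otherwise assumes ISO-8859-1).

```php
<?php
// Hypothetical helper, not part of the answer above.
function removeElements(string $html, array $tagNames): string
{
    $dom = new DOMDocument();

    // Collect parser warnings instead of emitting them for imperfect markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($tagNames as $tagName) {
        // iterator_to_array() takes a snapshot of the live node list,
        // so removing nodes does not disturb the iteration.
        foreach (iterator_to_array($dom->getElementsByTagName($tagName)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only the children of <body>, skipping the DOCTYPE/html wrapper.
    $out  = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }

    return $out;
}

$out = removeElements('<p>Your HTML code</p><script>With malicious code</script>', ['script']);
echo $out, "\n";
```

Because the parser may normalize the markup, the output is not guaranteed to be byte-for-byte identical to the input minus the removed elements.
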
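Answer 3: (score: 0)

This is what I use to strip out a list of forbidden tags. It can remove tags that wrap content as well as tags together with their content, and it trims the leftover whitespace.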

$description = trim(preg_replace([
    # Strip tags around content (opening and closing forms)
    '/<!doctype\b[^>]*>/i',
    '/<\/?html\b[^>]*>/i',
    '/<\/?head\b[^>]*>/i',
    '/<\/?body\b[^>]*>/i',
    # Strip tags and content inside (non-greedy; the s flag lets . match newlines)
    '/<script\b[^>]*>.*?<\/script>/is',
], '', $description));

Sample input:

$description = '<html>
<head>
</head>
<body>
    <p>This distinctive Mini Chopper with Desire styling has a powerful wattage and high capacity which makes it a very versatile kitchen accessory. It also comes equipped with a durable glass bowl and lid for easy storage.</p>
    <script type="application/javascript">alert(\'Hello world\');</script>
</body>
</html>';

Output:

<p>This distinctive Mini Chopper with Desire styling has a powerful wattage and high capacity which makes it a very versatile kitchen accessory. It also comes equipped with a durable glass bowl and lid for easy storage.</p>

Answer 4: (score: 0)

I use the following:

function strip_tags_with_forbidden_tags($input, $forbidden_tags)
{
    foreach (explode(',', $forbidden_tags) as $tag) {
        $tag = preg_replace(array('/^</', '/>$/'), array('', ''), $tag);
        $input = preg_replace(sprintf('/<%s[^>]*>([^<]+)<\/%s>/', $tag, $tag), '$1', $input);
    }

    return $input;
}

Then you can do this:

echo strip_tags_with_forbidden_tags('<cancel>abc</cancel>xpto<p>def></p><g>xyz</g><t>xpto</t>', 'cancel,g');

Output: 'abcxpto<p>def></p>xyz<t>xpto</t>'

echo strip_tags_with_forbidden_tags('<cancel>abc</cancel> xpto <p>def></p> <g>xyz</g> <t>xpto</t>', 'cancel,g');

Output: 'abc xpto <p>def></p> xyz <t>xpto</t>'
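The function above only unwraps a tag when its content contains no other tag, and a short name such as g would also match longer names that begin with the same letters. A possible variant is sketched below, assuming that removing the opening and closing tags independently is acceptable. The name unwrapTags is hypothetical, and it takes an array of names instead of a comma-separated string.

```php
<?php
// Hypothetical variant of the function above; keeps content, drops only the tags.
function unwrapTags(string $input, array $tags): string
{
    foreach ($tags as $tag) {
        // Accept "g", "<g>" or "</g>" and escape the name for use in a regex.
        $name = preg_quote(trim($tag, " <>/"), '/');
        if ($name === '') {
            continue;
        }
        // \b prevents "g" from also matching a longer name such as "<got>".
        $input = preg_replace('/<\/?' . $name . '\b[^>]*>/i', '', $input);
    }

    return $input;
}

$result = unwrapTags('<cancel>abc</cancel>xpto<p>def></p><g>xyz</g><t>xpto</t>', ['cancel', 'g']);
echo $result, "\n"; // abcxpto<p>def></p>xyz<t>xpto</t>
```

Like the original, this is a convenience for trusted markup rather than a sanitizer.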