Question

I need a C# method that encodes ampersands, unless they are already encoded or are part of another encoded expression.

For example:

"tom & jill" should become "tom &amp; jill"


"tom &amp; jill" should remain "tom &amp; jill"


"tom &euro; jill" should remain "tom &euro; jill"


"tom <&> jill" should become "tom <&amp;> jill"


"tom &quot;&&quot; jill" should become "tom &quot;&amp;&quot; jill"

Answer 1

What you really want to do is decode the string first and then encode it again. Don't try to patch up an already-encoded string.

Any encoding worth its salt can be decoded easily, so reusing that logic makes your life easier and your software less error-prone.

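Now, if you're not sure whether a string is encoded or not, the problem is certainly not the string itself but the ecosystem that produced it. Where did you get it? Who passed it along before it reached you? Do you trust it?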

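If you really have to resort to writing a magic fix-weird-data function, then consider building a table of "encodings" and their corresponding characters: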

&amp; -> &
&euro; -> €
&lt; -> <
// etc.

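Then decode every encoding you encounter according to the table, and re-encode the whole string. Sure, you could probably find a more efficient approach that skips the decoding step. But you won't be sane next year. This is your career, right? You need to keep your head straight! Try to be too clever and you'll lose your mind, and when you go mad, you'll lose your job. Sad things happen to people who let their hacks wreck their minds...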
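A minimal sketch of the table-driven decode-then-re-encode idea, assuming a small hand-maintained table. The class `TableEntityFixer` and its method `Fix` are hypothetical names, and the table holds only a few entries for illustration. `&lt;` and `&gt;` are decoded but deliberately not re-encoded, on the assumption (from the question's examples) that `<` and `>` should be left alone.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the "decode by table, then re-encode" idea.
public static class TableEntityFixer
{
    // Decoding table: entity name -> character. Illustrative only; extend it for your data.
    private static readonly Dictionary<string, char> DecodeTable = new Dictionary<string, char>
    {
        ["amp"] = '&',
        ["quot"] = '"',
        ["euro"] = '\u20AC',
        ["lt"] = '<',
        ["gt"] = '>',
    };

    // Characters to re-encode. '<' and '>' are left out on purpose so angle
    // brackets pass through unchanged, as the question asks.
    private static readonly Dictionary<char, string> EncodeTable = new Dictionary<char, string>
    {
        ['&'] = "&amp;",
        ['"'] = "&quot;",
        ['\u20AC'] = "&euro;",
    };

    // A named entity: '&', a letter, more letters/digits, then ';'.
    private static readonly Regex NamedEntity = new Regex("&([A-Za-z][A-Za-z0-9]*);");

    public static string Fix(string input)
    {
        // Step 1: decode every entity found in the table; unknown names are left as-is.
        string decoded = NamedEntity.Replace(input, m =>
            DecodeTable.TryGetValue(m.Groups[1].Value, out char c) ? c.ToString() : m.Value);

        // Step 2: re-encode the whole decoded string in a single pass.
        var sb = new StringBuilder(decoded.Length);
        foreach (char ch in decoded)
        {
            if (EncodeTable.TryGetValue(ch, out var entity)) sb.Append(entity);
            else sb.Append(ch);
        }
        return sb.ToString();
    }
}
```

Because re-encoding only knows about the table, an entity missing from it (such as `&nbsp;` here) is treated as plain text and comes out double-encoded as `&amp;nbsp;`, and an incoming `&lt;` comes out as a raw `<`. The table has to cover every entity your data actually contains.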

Edit: Of course, using the .NET libraries spares you the madness:

HttpUtility.HtmlDecode(string)
HttpUtility.HtmlEncode(string)

I just tried it, and it seems to have no trouble decoding strings that contain bare ampersands. So, go ahead:

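using System.Web; // HttpUtility lives in the System.Web namespace

// Decode any entities already present, then encode the whole string once.
string magic(string encodedOrNot)
{
    var decoded = HttpUtility.HtmlDecode(encodedOrNot);
    return HttpUtility.HtmlEncode(decoded);
}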
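To see the problem that Edit #2 addresses, here is a sketch that wraps the same two `HttpUtility` calls in a static class so it can be run against the question's examples. `DecodeThenEncode` is a hypothetical name; the behavior is just `HtmlDecode` followed by `HtmlEncode`.

```csharp
using System.Web;

// Hypothetical wrapper: same logic as magic() above.
public static class DecodeThenEncode
{
    // Decode existing entities, then let HtmlEncode escape everything it knows about,
    // including '<' and '>'.
    public static string Magic(string encodedOrNot) =>
        HttpUtility.HtmlEncode(HttpUtility.HtmlDecode(encodedOrNot));
}
```

The plain-ampersand cases work, but `"tom <&> jill"` becomes `"tom &lt;&amp;&gt; jill"` rather than `"tom <&amp;> jill"`, because `HtmlEncode` escapes the angle brackets too.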

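Edit #2: It turns out the decoder HttpUtility.HtmlDecode will work for your purpose, but the encoder won't, because you don't want the angle brackets (>, <) to be encoded. But writing an encoder is really easy: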
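A minimal sketch of such an encoder, assuming it only needs to escape `&` and `"` (enough for the question's examples) and should leave `<` and `>` untouched. `AmpersandFixer`, `EncodeKeepingBrackets`, and `Magic` are hypothetical names; decoding still goes through `HttpUtility.HtmlDecode`.

```csharp
using System.Text;
using System.Web;

// Hypothetical helper: HttpUtility for decoding, a hand-written encoder for encoding.
public static class AmpersandFixer
{
    // Minimal encoder: escapes '&' and '"' but, unlike HttpUtility.HtmlEncode,
    // leaves '<' and '>' as they are.
    public static string EncodeKeepingBrackets(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        foreach (char ch in text)
        {
            switch (ch)
            {
                case '&': sb.Append("&amp;"); break;
                case '"': sb.Append("&quot;"); break;
                default: sb.Append(ch); break;
            }
        }
        return sb.ToString();
    }

    // Decode whatever entities are already present, then re-encode once.
    public static string Magic(string encodedOrNot) =>
        EncodeKeepingBrackets(HttpUtility.HtmlDecode(encodedOrNot));
}
```

One side effect of decoding first: named entities such as `&euro;` come back as the literal character (`€`) rather than their original spelling. That is equivalent HTML, but not a character-for-character match with the question's expected output.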

Answer 2

This should do the trick:

using System.Text.RegularExpressions;

text = Regex.Replace(text, @"
    # Match & that is not part of an HTML entity.
    &                  # Match literal &.
    (?!                # But only if it is NOT...
      \w+;             # an alphanumeric entity,
    | \#[0-9]+;        # or a decimal entity,
    | \#x[0-9A-F]+;    # or a hexadecimal entity.
    )                  # End negative lookahead.", 
    "&amp;",
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
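The same pattern can be wrapped in a reusable method. This sketch builds the regex once as a static field and writes the lookahead on a single line without `IgnorePatternWhitespace`, so `#` is an ordinary character and needs no escaping. `LookaheadEncoder` and `EncodeBareAmpersands` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the negative-lookahead pattern above.
public static class LookaheadEncoder
{
    // Match '&' unless it starts a named (\w+;), decimal (#123;) or hex (#x7B;) entity.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!\w+;|#[0-9]+;|#x[0-9A-F]+;)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string EncodeBareAmpersands(string text) =>
        BareAmpersand.Replace(text, "&amp;");
}
```

Note that `\w+;` accepts any word followed by a semicolon, so something like `&foo;` is treated as an entity and left untouched even though no such entity exists.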

Answer 3

This can be done with a regex using a negative lookahead:

&(?![^& ]+;)
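A sketch applying this pattern in C#; `Answer3Encoder` is a hypothetical name. The lookahead only excludes `&` and spaces, so it is looser than Answer 2's: any run of other characters ending in `;` protects the ampersand.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around Answer 3's pattern.
public static class Answer3Encoder
{
    // An '&' is left alone if it is followed by one or more characters that are
    // neither '&' nor a space, and then a ';'. Every other '&' is encoded.
    private static readonly Regex Unencoded = new Regex(@"&(?![^& ]+;)");

    public static string Encode(string text) => Unencoded.Replace(text, "&amp;");
}
```

For instance, `"R&D:x;"` is left unchanged, whereas Answer 2's `\w+;` alternative would encode that `&` because `:` is not a word character.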


How do I encode ampersands that are not already encoded?
