Decoding Cyrillic quoted-printable content

Date: 2018-02-05 02:04:01

Tags: c# imap decode

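I am using this example to fetch mail from a server. The problem is that the response contains Cyrillic characters that I cannot decode. Here are the headers: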

Content-type: text/html; charset="koi8-r"
Content-Transfer-Encoding: quoted-printable

And this is the function that receives the response:

static void receiveResponse(string command)
{
    try
    {
        if (command != "")
        {
            if (tcpc.Connected)
            {
                dummy = Encoding.ASCII.GetBytes(command);
                ssl.Write(dummy, 0, dummy.Length);
            }
            else
            {
                throw new ApplicationException("TCP CONNECTION DISCONNECTED");
            }
        }
        ssl.Flush();

        byte[] bigBuffer = new byte[1024*16];
        int bites = ssl.Read(bigBuffer, 0, bigBuffer.Length);

        byte[] buffer = new byte[bites];
        Array.Copy(bigBuffer, 0, buffer, 0, bites);

        sb.Append(Encoding.ASCII.GetString(buffer));

        string result = sb.ToString();

        // here is an unsuccessful attempt at decoding
        result = Regex.Replace(result, @"=([0-9a-fA-F]{2})",
            m => m.Groups[1].Success
            ? Convert.ToChar(Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
            : "");

        byte[] bytes = Encoding.Default.GetBytes(result);
        result = Encoding.GetEncoding("koi8r").GetString(bytes);
    }
    catch (Exception ex)
    {
        throw new ApplicationException(ex.ToString());
    }
}

How do I decode the stream correctly? In the resulting string I get <p>=F0=D2=C9=D7=C5=D4 =D1 =F7=C1=CE=D1</p> instead of <p>Привет я Ваня</p>.

1 Answer:

Answer 0: (score: 2)

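As @Max pointed out, you need to decode the content using the encoding algorithm declared in the Content-Transfer-Encoding header.

In your case, that is the quoted-printable encoding.

You need to decode the message text into a byte array first, and then convert that byte array into a string using the appropriate System.Text.Encoding. The name of the encoding to use is usually given as the charset parameter of the Content-Type header (koi8-r in your case).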
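To illustrate the second step in isolation, here is a minimal sketch that turns already-decoded bytes into a string. The class and method names (CharsetDemo, ToText) are hypothetical and only serve the example. It assumes .NET Core 3.0+ or .NET 5+, where KOI8-R is only available after the code-pages provider has been registered; on .NET Framework that registration line can be dropped.

```csharp
using System;
using System.Text;

static class CharsetDemo
{
    // Hypothetical helper: converts raw bytes to text using the charset
    // named in the Content-Type header.
    public static string ToText(byte[] bytes, string charset)
    {
        return Encoding.GetEncoding(charset).GetString(bytes);
    }

    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code
        // pages such as KOI8-R must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // The bytes that "=F0=D2=C9=D7=C5=D4" stands for once the
        // quoted-printable layer has been removed.
        byte[] decoded = { 0xF0, 0xD2, 0xC9, 0xD7, 0xC5, 0xD4 };

        // According to the question, this is the word "Привет".
        Console.WriteLine(ToText(decoded, "koi8-r"));
    }
}
```

The key point is that the hex escapes must become bytes, not chars: converting each escape with Convert.ToChar and then round-tripping through Encoding.Default, as in the question, is lossy.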

Since you already have the text as bytes in the buffer variable, you can simply do the following:

byte[] buffer = new byte[bites];
int decodedLength = 0;

for (int i = 0; i < bites; i++) {
    if (bigBuffer[i] == (byte) '=') {
        if (i + 2 < bites) { // both bytes after '=' must lie inside the data that was read
            // possible hex sequence
            byte b1 = bigBuffer[i + 1];
            byte b2 = bigBuffer[i + 2];

            if (IsXDigit (b1) && IsXDigit (b2)) {
                // decode
                buffer[decodedLength++] = (byte) ((ToXDigit (b1) << 4) | ToXDigit (b2));
                i += 2;
            } else if (b1 == (byte) '\r' && b2 == (byte) '\n') {
                // folded line, drop the '=\r\n' sequence
                i += 2;
            } else {
                // error condition, just pass it through
                buffer[decodedLength++] = bigBuffer[i];
            }
        } else {
            // truncated? just pass it through
            buffer[decodedLength++] = bigBuffer[i];
        }
    } else {
        buffer[decodedLength++] = bigBuffer[i];
    }
}

string result = Encoding.GetEncoding ("koi8-r").GetString (buffer, 0, decodedLength);

The helper functions:

static byte ToXDigit (byte c)
{
    if (c >= 0x41) {
        if (c >= 0x61)
            return (byte) (c - (0x61 - 0x0a));

        return (byte) (c - (0x41 - 0x0A));
    }

    return (byte) (c - 0x30);
}

static bool IsXDigit (byte c)
{
    return (c >= (byte) 'A' && c <= (byte) 'F') || (c >= (byte) 'a' && c <= (byte) 'f') || (c >= (byte) '0' && c <= (byte) '9');
}
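For readers who want to try the approach outside the receive loop, the following is a self-contained sketch of the same idea wrapped in a single method. The names QuotedPrintable and Decode are hypothetical and not part of the answer. It swaps the custom IsXDigit/ToXDigit helpers for the built-in Uri.IsHexDigit and Uri.FromHex, and it assumes .NET Core 3.0+ or .NET 5+ for the code-pages registration.

```csharp
using System;
using System.Text;

static class QuotedPrintable
{
    // Hypothetical helper: removes the quoted-printable layer from the first
    // `length` bytes of `input`, then decodes the result with `charset`.
    public static string Decode(byte[] input, int length, string charset)
    {
        byte[] output = new byte[length]; // decoded data is never longer than the input
        int n = 0;

        for (int i = 0; i < length; i++) {
            if (input[i] != (byte) '=') {
                output[n++] = input[i];
                continue;
            }

            bool hasTwoMore = i + 2 < length;

            if (hasTwoMore && Uri.IsHexDigit((char) input[i + 1]) && Uri.IsHexDigit((char) input[i + 2])) {
                // "=XX" escape: combine the two hex digits into one byte
                int hi = Uri.FromHex((char) input[i + 1]);
                int lo = Uri.FromHex((char) input[i + 2]);
                output[n++] = (byte) ((hi << 4) | lo);
                i += 2;
            } else if (hasTwoMore && input[i + 1] == '\r' && input[i + 2] == '\n') {
                // soft line break: drop the "=\r\n" sequence
                i += 2;
            } else {
                // malformed or truncated escape: pass the '=' through unchanged
                output[n++] = input[i];
            }
        }

        return Encoding.GetEncoding(charset).GetString(output, 0, n);
    }

    static void Main()
    {
        // Assumption: .NET Core / .NET 5+, where KOI8-R must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] raw = Encoding.ASCII.GetBytes("<p>=F0=D2=C9=D7=C5=D4 =D1 =F7=C1=CE=D1</p>");

        // According to the question, this should read <p>Привет я Ваня</p>
        Console.WriteLine(Decode(raw, raw.Length, "koi8-r"));
    }
}
```

Note that a single ssl.Read call is not guaranteed to return the whole response, so an "=XX" escape can be split across two reads. Collecting all bytes of the body before decoding avoids that edge case.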

Of course, you could just use MimeKit and MailKit instead of writing your own hodge-podge IMAP library. ;-)