Fail-safe conversion between different character encodings

Time: 2009-05-15 15:55:55

Tags: c++ c utf-8

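I need to convert a string from one encoding (UTF-8) to another. The problem is that the target encoding does not contain every character of the source encoding, and the libc iconv(3) function fails in that case. What I want is for the conversion to go through anyway, with each problematic character replaced in the output string by some symbol, such as '?'.

The programming language is C or C++.

Is there a way to solve this?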

2 answers:

Answer 0 (score: 2)

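Try appending "//TRANSLIT" or "//IGNORE" to the end of the target charset string passed to iconv_open(3). Note that these suffixes are a GNU extension documented for the GNU C library, so they are not portable to every iconv implementation.

From iconv_open(3):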

   //TRANSLIT
          When the string "//TRANSLIT" is appended to tocode,
          transliteration is activated. This means that when a character
          cannot be represented in the target character set, it can be
          approximated through one or several similarly looking characters.

   //IGNORE
          When the string "//IGNORE" is appended to tocode, characters
          that cannot be represented in the target character set will be
          silently discarded.
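
For illustration, here is a minimal sketch of the suffix in use. The helper name utf8_to_ascii_translit and the hard-coded ASCII target are my own assumptions for the example; only iconv_open, iconv and iconv_close come from the library.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a library function): convert the UTF-8 string
 * `in` to ASCII, asking iconv to approximate unrepresentable characters
 * via the GNU "//TRANSLIT" suffix. `out` receives a NUL-terminated result
 * of at most `outcap` bytes. Returns 0 on success, -1 on failure. */
static int utf8_to_ascii_translit(const char *in, char *out, size_t outcap)
{
    if (outcap == 0)
        return -1;

    /* Argument order is (tocode, fromcode); the suffix goes on tocode. */
    iconv_t cd = iconv_open("ASCII//TRANSLIT", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;        /* iconv's prototype takes char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outcap - 1;   /* keep one byte for the terminator */

    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    *outp = '\0';                  /* terminate whatever was converted */
    iconv_close(cd);

    return r == (size_t)-1 ? -1 : 0;
}
```

Swapping "//TRANSLIT" for "//IGNORE" in the iconv_open call drops the unrepresentable characters instead. As far as I know, which approximation glibc picks can depend on the current locale, so the exact output for a non-ASCII character is not something to rely on without testing.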

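Alternatively, when iconv(3) fails (it returns (size_t)-1 and sets errno to EILSEQ), manually skip one character in the input and insert a replacement in the output.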
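
A rough sketch of that manual approach follows. The helper name convert_with_replacement and its parameters are hypothetical, and it assumes the input is UTF-8 and the target is an ASCII-compatible single-byte encoding, so that writing one replacement byte straight into the output is valid.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the UTF-8 string `in` to `tocode`, writing a
 * NUL-terminated result of at most `outcap` bytes to `out`. Each character
 * that iconv rejects is skipped and replaced by the byte `repl`.
 * Returns 0 on success, -1 on failure (bad charset name, buffer too small). */
static int convert_with_replacement(const char *tocode, const char *in,
                                    char *out, size_t outcap, char repl)
{
    if (outcap == 0)
        return -1;

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;        /* iconv's prototype takes char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outcap - 1;   /* keep one byte for the terminator */
    int rc = 0;

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;                 /* everything that was left got converted */

        if (errno == EILSEQ || errno == EINVAL) {
            /* inp now points at the offending character. */
            if (outleft == 0) {
                rc = -1;           /* no room for the replacement */
                break;
            }
            /* Skip one UTF-8 character: the lead byte plus any
             * continuation bytes of the form 10xxxxxx. */
            do {
                inp++;
                inleft--;
            } while (inleft > 0 && ((unsigned char)*inp & 0xC0) == 0x80);
            *outp++ = repl;
            outleft--;
        } else {
            rc = -1;               /* E2BIG (output full) or another error */
            break;
        }
    }

    *outp = '\0';
    iconv_close(cd);
    return rc;
}
```

For example, calling convert_with_replacement("ASCII", text, buf, sizeof buf, '?') is meant to keep every ASCII character of text and put a '?' in place of each character that has no ASCII equivalent. For a stateful or multibyte target encoding the replacement would have to be produced through iconv as well, which this sketch does not attempt.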

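Answer 1 (score: 0)

Build a regular expression from the range of source characters that can be translated, and use it to swap a placeholder in for any character that does not match.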