How do I remove escape characters (unescape escaped Unicode characters) from a unicode string in Python 2.x?

Asked: 2017-06-25 03:03:16

Tags: python python-2.7 unicode

>>> test
u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
>>> test2
'"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'
>>> print test
"Hello," he said‏‎.
        "I am nine years oldâ"
>>> print test2
"Hello," he\u200b said\u200f\u200e.
        "I\u200b am\u200b nine years old"

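So how do I convert test2 into test (that is, so the Unicode characters themselves are printed rather than their escape sequences)? .decode('utf-8') doesn't do it.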

1 answer:

Answer 0 (score: 3)

You can use the unicode-escape encoding to decode '\\u200b' into u'\u200b':

>>> test1 = u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
>>> test2 = '"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'
>>> test2.decode('unicode-escape')
u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old"'
>>> print test2.decode('unicode-escape')
"Hello," he​ said‏‎.
    "I​ am​ nine years old"
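For code that has to run on both Python 2 and Python 3, the same decoding step could be wrapped roughly as follows. This is only a sketch: the helper name unescape is hypothetical, and it assumes that every backslash sequence in the input is meant to be interpreted as an escape.

```python
def unescape(value):
    """Turn literal backslash escapes such as \\u200b into the characters they name."""
    if not isinstance(value, bytes):
        # Text input (unicode on Python 2, str on Python 3): convert to ASCII
        # bytes first, re-escaping any non-ASCII characters so they survive
        # the round trip.
        value = value.encode('ascii', 'backslashreplace')
    # The unicode_escape codec interprets \uXXXX, \xNN, \n and similar sequences.
    return value.decode('unicode_escape')


test2 = '"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'
decoded = unescape(test2)
```

On Python 2 a plain str is already a byte string, so the helper reduces to test2.decode('unicode-escape') as shown above. On Python 3 the str is encoded to bytes first, because str no longer has a decode method there.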

Note: even with this, test2 cannot be decoded to match test1 exactly, because test1 contains a u'\xe2' just before the closing quote (").

>>> test1 == test2.decode('unicode-escape')
False
>>> test1.replace(u'\xe2', '') == test2.decode('unicode-escape')
True
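If it is not obvious where two such strings diverge, a small comparison helper can locate the first mismatch. This is a minimal sketch; first_difference is a hypothetical name, not something from the question or from the standard library.

```python
def first_difference(a, b):
    """Return the index of the first position where a and b differ, or None."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index
    # One string is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


test1 = u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
decoded = u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old"'
position = first_difference(test1, decoded)
# test1[position] is the extra u'\xe2'; decoded[position] is the closing quote.
```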