Replacing special characters in a string does not work

Asked: 2018-04-05 13:10:49

Tags: python python-3.x encoding

I have a long string containing the text Your Sunday evening order with Uber Eats\nTo: test@email.com\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,

I want to replace '\xe2\x82\xac' with 'EUR' in Python 3.6.

If I print the string, I see that it starts with b, i.e. it looks like a bytes literal.

 b'<div dir="ltr"><br ...' etc.

I cannot encode it (html = html.encode('UTF-8')), because I get a bytes-like object is required, not 'str', and I cannot decode it either ('str' object has no attribute 'decode').

I have tried:

html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")

None of these work.

html.decode("utf-8") gives me the error 'str' object has no attribute 'decode'.

For context, the string is produced by reading the content of emails with the mailbox library:

for message in mbox:
   for part in message.walk():
       html = str(part.get_payload(decode=True))

3 answers:

Answer 0 (score: 2)

You should use:

html = html.replace(r"\xe2\x82\xac", "EUR")

This replaces the literal character sequence \xe2\x82\xac with EUR, assuming the backslashes really are present in your html.
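
A minimal sketch of why the backslashes would be literal here, assuming the payload was UTF-8 bytes as in the question: calling str() on a bytes object returns its repr, in which every non-ASCII byte is spelled out as four literal characters such as \xe2. The variable names and the shortened sample payload are illustrative, not taken from the original code.

```python
# Hypothetical, shortened payload standing in for part.get_payload(decode=True).
payload = b"[image: Uber logo]\n\xe2\x82\xac17.50\n"

# str() on bytes returns the repr: "b'...'" with literal backslash escapes.
html = str(payload)
print(html)

# A non-raw pattern means the three characters U+00E2 U+0082 U+00AC,
# which do not occur in the repr, so nothing is replaced.
unchanged = html.replace("\xe2\x82\xac", "EUR")

# A raw pattern matches the literal backslash sequence that is actually there.
replaced = html.replace(r"\xe2\x82\xac", "EUR")
print(replaced)
```

The result still carries the b'...' wrapper and literal \n sequences, so this treats the symptom rather than the cause.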

Otherwise, you should use:

html = html.replace('\u20ac', 'EUR')

But that does not seem to be the case here, since your attempt with the Unicode symbol did not work.

Do not assume that Python stores strings as UTF-8 (in fact, it does not use UTF-8 internally).

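Note: a Python 3 str is a sequence of Unicode code points (CPython 3.3 and later stores them internally as Latin-1, UCS-2, or UCS-4, depending on the string), so Python would never write \xe2\x82\xac for a correctly decoded euro sign. Either the backslashes are literal, or some output step has corrupted the text.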
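
One way to avoid the problem at the source, sketched under the assumption that the parts are text: decode the bytes returned by get_payload(decode=True) instead of passing them to str(). The helper name payload_to_text, the UTF-8 fallback, and the made-up message are assumptions for illustration; get_content_charset() is a standard method on email.message.Message.

```python
import email


def payload_to_text(part, fallback="utf-8"):
    """Hypothetical helper: decode a message part's payload to str."""
    raw = part.get_payload(decode=True)  # bytes, or None for multipart containers
    if raw is None:
        return ""
    charset = part.get_content_charset() or fallback
    return raw.decode(charset, errors="replace")


# Small made-up message so the sketch runs without a real mbox file.
# "=E2=82=AC" is the quoted-printable form of the UTF-8 euro sign.
raw_message = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "=E2=82=AC17.50\n"
)
message = email.message_from_string(raw_message)

for part in message.walk():
    html = payload_to_text(part)
    # html is now a real str, so the euro sign is a single code point.
    html = html.replace("\u20ac", "EUR")
    print(html)
```

In the question's loop, the same helper could stand in for str(part.get_payload(decode=True)), after which the plain '\u20ac' replacement should apply.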

Answer 1 (score: 1)

import unicodedata
jil = """"Your Sunday evening order with Uber Eats\nTo: test@email.com\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"""
data = unicodedata.normalize("NFKD", jil)
print(data)
>>>" Your Sunday evening order with Uber Eats
To: test@email.com


[image: map]

[image: Uber logo]
â¬17.50
Thanks for choosing Uber,
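
The output above still shows mojibake rather than a euro sign, so normalization alone does not repair it. A minimal sketch of one possible repair, assuming the string really holds the characters U+00E2 U+0082 U+00AC (UTF-8 bytes that were mis-decoded as Latin-1): re-encode as Latin-1, then decode as UTF-8. The variable names are illustrative.

```python
# Assumed input: the UTF-8 bytes of the euro sign, mis-decoded as Latin-1.
garbled = "\xe2\x82\xac17.50"

# Latin-1 maps code points 0-255 back to the same byte values,
# so this recovers the original UTF-8 bytes before decoding them properly.
repaired = garbled.encode("latin-1").decode("utf-8")

result = repaired.replace("\u20ac", "EUR")
print(result)  # EUR17.50
```

This only applies when the characters are real code points in a str; it would not help if the text contains literal backslashes.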

Answer 2 (score: 0)

It does not work.

html="Your Sunday evening order with Uber Eats\nTo: test@email.com\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"
html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")

html = html.encode("utf-8", 'strict')

print("Encoded String: " + str(html))
print("Decoded String: " + html.decode("utf-8",'strict'))