Question

I have a long string that contains the text Your Sunday evening order with Uber Eats\nTo: test@email.com\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,

I want to replace '\xe2\x82\xac' with 'EUR' in Python 3.6.

If I print the string, I see that it starts with b, i.e. it looks like a bytes literal.

 b'<div dir="ltr"><br ...' etc.

I cannot encode it (html = html.encode('UTF-8')) because I get a bytes-like object is required, not 'str', and I cannot decode it either ('str' object has no attribute 'decode').

I have tried:

html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")

None of these work.

html.decode("utf-8") gives me the error 'str' object has no attribute 'decode'.

For context, the string is produced by reading the contents of emails with the mailbox library:

for message in mbox:
   for part in message.walk():
       html = str(part.get_payload(decode=True))

Answer 1

You should use:

html = html.replace(r"\xe2\x82\xac", "EUR")

This replaces the character sequence \xe2\x82\xac with EUR, assuming the backslashes are literally present in your html.
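A minimal sketch of why the raw string matches, assuming the string was built with str() on a bytes object as in the question. The payload value here is a made-up stand-in for part.get_payload(decode=True), not the asker's actual data.

```python
# Hypothetical payload standing in for part.get_payload(decode=True).
payload = b"[image: Uber logo]\n\xe2\x82\xac17.50\n"

# str() on a bytes object returns its repr, so the result starts with b'
# and every escape is spelled out with a literal backslash.
html = str(payload)
assert html.startswith("b'")

# A raw string matches those literal backslash sequences.
replaced = html.replace(r"\xe2\x82\xac", "EUR")

# A normal string literal holds three code points instead, so nothing matches.
unchanged = html.replace("\xe2\x82\xac", "EUR")

print(replaced)
```

The replaced string still carries the b'...' wrapper and literal \n sequences, which is why decoding the bytes is usually the cleaner route.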

Otherwise, you should use:

html = html.replace('\u20ac', 'EUR')

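But that does not seem to be your case, because the replacement did not work when you used the Unicode symbol.

Do not assume that Python stores strings as UTF-8 (it does not use UTF-8 internally).

Note: a Python 3 str is a sequence of Unicode code points (CPython stores it internally as Latin-1, UCS-2, or UCS-4, depending on the content), so a correctly decoded string would never be written out as \xe2\x82\xac. So either the backslashes are literal characters, or some step in the output process is corrupting the text.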
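One way to avoid the literal backslashes altogether is to decode the payload bytes instead of calling str() on them. The sketch below builds a small made-up message with email.message_from_bytes so that it runs on its own. The raw message, the texts list, and the fallback to "utf-8" when no charset is declared are assumptions for illustration, not part of the original question.

```python
import email

# Hypothetical raw message; in the question the parts come from a mailbox.
raw = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"[image: Uber logo]\n\xe2\x82\xac17.50\n"
)
message = email.message_from_bytes(raw)

texts = []
for part in message.walk():
    payload = part.get_payload(decode=True)  # bytes, or None for multipart parts
    if payload is None:
        continue
    # Use the declared charset if there is one; utf-8 is an assumed fallback.
    charset = part.get_content_charset() or "utf-8"
    html = payload.decode(charset, errors="replace")
    # The decoded text contains a real euro sign, so a plain replace matches.
    texts.append(html.replace("\u20ac", "EUR"))

print(texts[0])
```

With the text decoded this way, html.replace("€", "EUR") from the question should behave the same as the "\u20ac" form.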

Answer 2

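import unicodedata

# The literal below holds the three code points U+00E2, U+0082, U+00AC, not a euro sign.
jil = """Your Sunday evening order with Uber Eats\nTo: test@email.com\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"""
# NFKD only decomposes characters; it does not turn the three code points into a euro sign.
data = unicodedata.normalize("NFKD", jil)
print(data)

Output:

Your Sunday evening order with Uber Eats
To: test@email.com


[image: map]

[image: Uber logo]
â¬17.50
Thanks for choosing Uber,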

Answer 3

It does not work.

html="Your Sunday evening order with Uber Eats\nTo: test@email.com\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"
html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")

html = html.encode("utf-8", "strict")

print("Encoded String: " + str(html))
print("Decoded String: " + html.decode("utf-8",'strict'))
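If the text really does contain the three code points U+00E2, U+0082, U+00AC, as the literal above does, rather than literal backslashes, one possible repair is a Latin-1 round trip. This is only a sketch: it assumes the text was UTF-8 that had been mis-decoded as Latin-1. The names mojibake and fixed are illustrative.

```python
# Three code points that are really the UTF-8 bytes of a euro sign,
# mis-decoded one byte per character.
mojibake = "\xe2\x82\xac17.50"

# Encoding as Latin-1 maps each code point back to its original byte,
# and decoding those bytes as UTF-8 recovers the intended character.
fixed = mojibake.encode("latin-1").decode("utf-8")

result = fixed.replace("\u20ac", "EUR")
print(result)
```

The encode step raises UnicodeEncodeError if the string contains characters above U+00FF, so this only applies to text that was damaged in this particular way.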

Replacing special characters in a string does not work
