Question

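I'm working with a subset of the emoji found in text retrieved from an API. What I'd like to do is replace each emoji with its description or name.

I'm using Python 3.4, and my current approach is to look up the Unicode name with unicodedata, like this: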

nname = unicodedata.name(my_unicode)

and I'm doing the replacement with re.sub:

re.sub('[\U0001F602-\U0001F64F]', 'new string', str(orig_string))

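I've tried re.search and then accessing the matches and replacing the strings (without a regex), but I haven't been able to get that to work.

Is there a way to get a callback for each replacement that re.sub performs? Any other approach is also appreciated.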

Answer 1

You can pass a callback function to re.sub. From the documentation:

re.sub(pattern, repl, string, count=0, flags=0)

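Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged. repl can be a string or a function; [...] If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.

So just call unicodedata.name from the callback: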

>>> import re
>>> import unicodedata
>>> my_text = "\U0001F602  and all of this \U0001F605"
>>> re.sub('[\U0001F602-\U0001F64F]', lambda m: unicodedata.name(m.group()), my_text)
'FACE WITH TEARS OF JOY  and all of this SMILING FACE WITH OPEN MOUTH AND COLD SWEAT'
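A minimal sketch of the same idea with a named callback, assuming you also want to survive code points that have no name in the running Python's Unicode database (unicodedata.name raises ValueError for those unless a default is supplied). The names EMOJI_RE, describe_emoji and replace_emoji, and the "UNKNOWN" placeholder, are made up for illustration:

```python
import re
import unicodedata

# Same range as in the question; compiled once so it can be reused.
EMOJI_RE = re.compile('[\U0001F602-\U0001F64F]')

def describe_emoji(match):
    """Callback for re.sub: return the Unicode name of the matched character."""
    # The second argument is returned instead of raising ValueError
    # when the character has no name in this Python's Unicode database.
    return unicodedata.name(match.group(), "UNKNOWN")

def replace_emoji(text):
    """Replace every character matched by EMOJI_RE with its Unicode name."""
    return EMOJI_RE.sub(describe_emoji, text)

print(replace_emoji("\U0001F602  and all of this \U0001F605"))
```

A named function is easier to extend than a lambda, for example if you later want to lowercase the name or wrap it in colons.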

Answer 2

You can pass a function as the repl argument of re.sub.

It is passed the match object and returns whatever you want to substitute:

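import re
import unicodedata

# Named "text" rather than "input" to avoid shadowing the built-in input()
text = 'I am \U0001F604 and not \U0001F613'
re.sub('[\U0001F602-\U0001F64F]', lambda y: unicodedata.name(y.group(0)), text)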
# Outputs:
# 'I am SMILING FACE WITH OPEN MOUTH AND SMILING EYES and not FACE WITH COLD SWEAT'

Answer 3

Not as clean, but it works:

import unicodedata

my_text = "\U0001F602  and all of this \U0001F605"

# range() excludes its stop value, so add 1 to include U+1F64F
for char in range(ord("\U0001F602"), ord("\U0001F64F") + 1):
    my_text = my_text.replace(chr(char), unicodedata.name(chr(char), "NOTHING"))

print(my_text)

Result: FACE WITH TEARS OF JOY  and all of this SMILING FACE WITH OPEN MOUTH AND COLD SWEAT
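The loop above rescans the whole string once per code point. As a possible alternative (a different technique, not what this answer uses), here is a sketch that builds a translation table once and applies it in a single pass with str.translate. The variable names table and result are arbitrary:

```python
import unicodedata

# Map each code point in the range (inclusive) to its Unicode name.
# Code points without a name are left out of the table,
# so str.translate passes them through unchanged.
table = {}
for cp in range(0x1F602, 0x1F64F + 1):
    name = unicodedata.name(chr(cp), None)
    if name is not None:
        table[cp] = name

my_text = "\U0001F602  and all of this \U0001F605"
result = my_text.translate(table)
print(result)
```

str.translate accepts a dict that maps code points (integers) to replacement strings, so the table can be built once and reused for many inputs.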

Answer 4

In Python 3.5+, there is the namereplace error handler. You can use it to convert several emoji at once:

>>> import re
>>> my_text ="\U0001F601, \U0001F602, ♥ and all of this \U0001F605"
>>> re.sub('[\U0001F601-\U0001F64F]+',
...        lambda m: m.group().encode('ascii', 'namereplace').decode(), my_text)
'\\N{GRINNING FACE WITH SMILING EYES}, \\N{FACE WITH TEARS OF JOY}, ♥ and all of this \\N{SMILING FACE WITH OPEN MOUTH AND COLD SWEAT}'

There are more Unicode characters that are emoji than the regex pattern indicates, e.g. ♥ (U+2665 BLACK HEART SUIT).
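To illustrate that point, here is a sketch of a pattern that spans a few more Unicode blocks. The ranges are examples chosen for this sketch, not a complete list of emoji, and the names EMOJI_RE and name_emoji are hypothetical:

```python
import re
import unicodedata

# Illustrative, NOT exhaustive: a few blocks that contain emoji.
EMOJI_RE = re.compile(
    '['
    '\U0001F300-\U0001F5FF'  # Miscellaneous Symbols and Pictographs
    '\U0001F600-\U0001F64F'  # Emoticons
    '\u2600-\u26FF'          # Miscellaneous Symbols (includes U+2665)
    ']'
)

def name_emoji(text):
    """Replace each matched character with its Unicode name."""
    # If a character has no name, fall back to the character itself
    # so the text is left unchanged at that position.
    return EMOJI_RE.sub(
        lambda m: unicodedata.name(m.group(), m.group()), text
    )

print(name_emoji("\U0001F601, \u2665"))
```

Note that these blocks also contain symbols that are not usually considered emoji, so a real application may want a curated list instead of broad ranges.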

Replace emoji with its description or name
