Python regex: find all matches of a pattern

Time: 2018-10-11 15:51:09

Tags: python regex

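I am trying to work with Slack's API, which sends usernames as strings such as: <@UCH65RHRC>

So the text in the API's JSON body may contain several of these patterns on one line, for example:

"Hi <@UCH65RHRC> and <@UCH65RHRF>, thanks for all the great work!"

How can I use a Python regex to find all matching strings with this predefined pattern, i.e. <@#########>, where each # (9 in total) can be 0-9 or A-Z?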

1 answer:

Answer 0: (score: 1)

This is a fairly simple task. The following regex should do what you want:

<@([0-9A-Z]{9})>

It matches a literal <@, captures exactly nine characters from 0-9 and A-Z, and then requires a closing >. For example:

import re

body = "Hi <@UCH65RHRC> and <@UCH65RHRF>, thanks for all the great work!"
# With one capture group, findall returns only the captured nine-character IDs
id_search = re.findall(r"<@([0-9A-Z]{9})>", body)

for user_id in id_search:
    print(user_id)

This gives the following output:

UCH65RHRC
UCH65RHRF
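
If the same pattern has to be applied to many message bodies, it may be convenient to compile it once and wrap it in small helpers. The sketch below is only an illustration: the names `MENTION_RE`, `extract_user_ids` and `mention_spans` are hypothetical, and it assumes, as the question does, that every ID is exactly nine characters from 0-9 and A-Z.

```python
import re

# Assumption from the question: an ID is exactly nine characters from 0-9 and A-Z.
# Compiling once lets the pattern be reused across many message bodies.
MENTION_RE = re.compile(r"<@([0-9A-Z]{9})>")


def extract_user_ids(text):
    """Return every captured ID found inside <@...> in text, in order."""
    return MENTION_RE.findall(text)


def mention_spans(text):
    """Return (user_id, start, end) for each mention, using finditer."""
    return [(m.group(1), m.start(), m.end()) for m in MENTION_RE.finditer(text)]


body = "Hi <@UCH65RHRC> and <@UCH65RHRF>, thanks for all the great work!"
print(extract_user_ids(body))                # ['UCH65RHRC', 'UCH65RHRF']
print(extract_user_ids("no mentions here"))  # []
```

`re.findall` is enough when only the IDs are needed; `re.finditer` yields match objects, which is useful when the position of each mention in the text matters, for example to replace it later.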