How to convert a Ruby regular expression to a Python regular expression

Time: 2015-12-30 05:04:03

Tags: python regex

The expression below is a Ruby regular expression. I want to convert it to Python. How should I do that?

add_zzim\(\'(.*?)\',\'(.*?)\',\'(?<param>.*?)\',.*

Source:

<li class="num" onClick="add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');" title="contents.">14913089</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913012','helloooo','3586312774','test');" title="contents.">14913012</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913041','helloooo','3586312774','test');" title="contents.">14913045</li>

2 answers:

Answer 0: (score: 1)

import re
# Use a plain raw string: the ur'' prefix is a syntax error in Python 3.
p = re.compile(r'add_zzim\(\'(.*?)\',\'(.*?)\',\'(.*?)\',.*')
test_str = u"<li class=\"num\" onClick=\"add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');\" title=\"contents.\">14913089</li>\n<li class=\"num\" onClick=\"add_zzim('BD_AD_08','14913012','helloooo','3586312774','test');\" title=\"contents.\">14913012</li>\n<li class=\"num\" onClick=\"add_zzim('BD_AD_08','14913041','helloooo','3586312774','test');\" title=\"contents.\">14913045</li>\n"

# findall returns one tuple of captured groups per match.
for i in p.findall(test_str):
    print(i[2])  # third group, the 'param' value

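This gives you a list of tuples, one per match; the third element of each tuple (index 2) is the 'param' value. Python's re module does not accept Ruby's (?<param>...) spelling for named groups, which is why the group is left unnamed here.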
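If the group name from the Ruby pattern should be kept, Python spells a named group as (?P<name>...). The following is a minimal sketch along those lines, not the answerer's own code; the variable names html, pattern and params are illustrative, and the sample markup is the one from the question.

```python
import re

# Ruby's (?<param>...) becomes (?P<param>...) in Python's re module.
# A double-quoted raw string avoids having to escape the single quotes.
pattern = re.compile(r"add_zzim\('(.*?)','(.*?)','(?P<param>.*?)',.*")

html = """
<li class="num" onClick="add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');" title="contents.">14913089</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913012','helloooo','3586312774','test');" title="contents.">14913012</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913041','helloooo','3586312774','test');" title="contents.">14913045</li>
"""

# finditer yields match objects, so the group can be read by name.
params = [m.group("param") for m in pattern.finditer(html)]
print(params)
```

Reading the group by name rather than by index keeps the code working if more groups are added to the pattern later.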

Answer 1: (score: 0)

Here is a non-regex approach.

To extract the onclick attribute value, we use the BeautifulSoup HTML parser; to extract the add_zzim() argument values, we use ast.literal_eval().

Complete working example:

from ast import literal_eval

from bs4 import BeautifulSoup

data = """
<ul>
    <li class="num" onClick="add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');" title="contents.">14913089</li>
    <li class="num" onClick="add_zzim('BD_AD_08','14913012','helloooo','3586312774','test');" title="contents.">14913012</li>
    <li class="num" onClick="add_zzim('BD_AD_08','14913041','helloooo','3586312774','test');" title="contents.">14913045</li>
</ul>
"""

soup = BeautifulSoup(data, "html.parser")

for li in soup.select("li.num"):
    args = literal_eval(li["onclick"].replace("add_zzim", "").rstrip(";"))
    print(args)

Prints:

('BD_AD_08', '14913089', 'helloooo', '3586312774', 'test')
('BD_AD_08', '14913012', 'helloooo', '3586312774', 'test')
('BD_AD_08', '14913041', 'helloooo', '3586312774', 'test')
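To get only the value that the original regex called param, the third item of each tuple can be picked out by index. The sketch below isolates that step using only the standard library; the helper name third_argument is hypothetical, and it assumes the onclick string has exactly the add_zzim(...); shape shown in the question.

```python
from ast import literal_eval


def third_argument(onclick):
    """Return the third argument of an add_zzim(...) call string."""
    # Drop the function name and the trailing semicolon so that only
    # a Python-compatible tuple literal remains.
    tuple_source = onclick.strip().replace("add_zzim", "", 1).rstrip(";")
    args = literal_eval(tuple_source)
    return args[2]


onclick = "add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');"
print(third_argument(onclick))
```

literal_eval only parses literals, so it will raise an error if the arguments are anything other than plain string or number literals.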