What is the equivalent regular expression in Python?

Time: 2018-12-08 04:04:14

Tags: python regex

Code in PHP

<?php
    $str = "CSIR-National Botanical Research Institute, Plant Transgenic Laboratory, U.P., India. Electronic address: i.sanyal@nbri.res.in.";
    preg_match("/([A-Z][^\s,.]+[.]?\s[(]?)*(Hospital|University|Institute|Law School|School of|Academy|College)[^,\d]*(?=,|\d)/", $str, $org_arr);
    echo $org_arr[0];   
?>

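Output

CSIR-National Botanical Research Institute

This regular expression extracts the name of a hospital, university, institute, school, academy, or college from the given string in PHP. I tried running the same regular expression in Python, but it does not work.

Code in Python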

import re
line = "CSIR-National Botanical Research Institute, Plant Transgenic Laboratory, U.P., India. Electronic address: i.sanyal@nbri.res.in."
match = re.search(r'/([A-Z][^\s,.]+[.]?\s[(]?)*(Hospital|University|Institute|Law School|School of|Academy|College)[^,\d]*(?=,|\d)/', line)
print(match.group(0))

It gives this error message

Traceback (most recent call last):
  File "C:\Users\Ghost Rider\Documents\Python\temp.py", line 4, in <module>
    print(match.group(0))
AttributeError: 'NoneType' object has no attribute 'group'

1 Answer:

Answer 0: (score: 0)

Edit:

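Some more detail. You get the error about the None type because the pattern matches nothing: re.search returns None in that case, and None has no group method. It is easier to show how to check for that than to explain it...

So let's change your example slightly and see whether it does what you expect. Note that the pattern has no leading and trailing slashes (see the original answer below).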

import re
txt = "CSIR-National Botanical Research Institute, Plant Transgenic Laboratory, U.P., India. Electronic address: i.sanyal@nbri.res.in."
# note: str is the built-in string type; Python would happily let you rebind it to a string literal, so txt is used here.
print('txt={}'.format(txt))
pattern = r'([A-Z][^\s,.]+[.]?\s[(]?)*(Hospital|University|Institute|Law School|School of|Academy|College)[^,\d]*(?=,|\d)'
m = re.search(pattern, txt)
if m:
    print('found some things, groups={}'.format(m.groups()))
else:
    print('no match')

Result:

txt=CSIR-National Botanical Research Institute, Plant Transgenic Laboratory, U.P., India. Electronic address: i.sanyal@nbri.res.in.
found some things, groups=('Research ', 'Institute')

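I think the parts of PHP's $org_arr correspond to Python's match groups: $org_arr[0] is the whole match, which is m.group(0), and the remaining entries are what m.groups() returns.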
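To get the same string that the PHP script echoes, you would read group 0 rather than groups(). Here is a minimal sketch of that mapping; full_match and org_arr are names picked for illustration, and the list layout only imitates what preg_match fills in for this pattern.

```python
import re

txt = ("CSIR-National Botanical Research Institute, Plant Transgenic "
       "Laboratory, U.P., India. Electronic address: i.sanyal@nbri.res.in.")
# Same pattern as above, split across lines for readability
pattern = (r'([A-Z][^\s,.]+[.]?\s[(]?)*'
           r'(Hospital|University|Institute|Law School|School of|Academy|College)'
           r'[^,\d]*(?=,|\d)')

m = re.search(pattern, txt)
if m:
    # group(0) is the whole match, like $org_arr[0] in PHP
    full_match = m.group(0)
    # org_arr is an illustrative name: whole match first, then the subgroups
    org_arr = [m.group(0), *m.groups()]
    print(full_match)
    print(org_arr)
else:
    print('no match')
```

Because the first group is repeated, it only keeps the text of its last repetition, which is why it holds 'Research ' and not the full prefix.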

Original:

Maybe try it in Python without the slashes? Let's start by building a simple pattern...

PHP example

The PHP docs show this example:

// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}

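Since they are only searching for php, the slashes look like pattern delimiters.

The same example in Python

In Python it would look like this (note that the pattern is r'php', not r'/php/').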

import re
if re.match( r'php', 'PHP is the web scripting language of choice.', re.IGNORECASE):
    print('A match was found.')
else:
    print('A match was not found.')
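One caveat about the example above: re.match only tries the pattern at the start of the string, while preg_match looks for it anywhere in the subject, so re.search is the closer equivalent. It makes no difference here because the subject begins with PHP. A small sketch with a made-up subject string where the two differ:

```python
import re

# Made-up subject where "PHP" is not at the start of the string
text = 'The web scripting language of choice is PHP.'

# re.match only tries the pattern at position 0, so it finds nothing here
anchored = re.match(r'php', text, re.IGNORECASE)
# re.search scans the whole string, like preg_match does
scanned = re.search(r'php', text, re.IGNORECASE)

print(anchored)
print(scanned.group(0))
```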

Slightly more useful is keeping the match object, so that you can use the groups...

import re
m = re.match( r'(php)', 'PHP is the web scripting language of choice.', re.IGNORECASE)
if m:
    print('A match was found, group(1)={}'.format(m.group(1)))
else:
    print('A match was not found.')
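If you have many PHP patterns to port, the delimiter handling can be sketched as a small helper. This is only an illustration: compile_php_style and PHP_FLAGS are hypothetical names, not part of the re module, it handles only matching single-character delimiters and the i, m, s and x modifiers, and it assumes the pattern body itself uses syntax that Python's re accepts.

```python
import re

# Hypothetical mapping from PHP modifier letters to Python flags.
# Only these four common modifiers are handled in this sketch.
PHP_FLAGS = {'i': re.IGNORECASE, 'm': re.MULTILINE,
             's': re.DOTALL, 'x': re.VERBOSE}


def compile_php_style(php_pattern):
    """Compile a PHP-style '/body/flags' pattern with Python's re module."""
    delimiter = php_pattern[0]
    end = php_pattern.rindex(delimiter)
    if end == 0:
        raise ValueError('missing closing delimiter')
    body = php_pattern[1:end]
    flags = 0
    for letter in php_pattern[end + 1:]:
        flags |= PHP_FLAGS[letter]  # KeyError for unsupported modifiers
    return re.compile(body, flags)


subject = 'PHP is the web scripting language of choice.'
regex = compile_php_style('/php/i')
print(regex.pattern)
print(bool(regex.search(subject)))
# Passing the delimiters straight to Python makes them literal characters,
# so nothing matches because the subject contains no slashes
print(re.search(r'/php/i', subject))
```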