Extracting names with a regular expression

Time: 2018-02-09 19:56:23

Tags: regex algorithm regular-language

I have strings of this form:

"""00.000000 00.000000; X-XX000-0000-0; France; Paris; Street 12a;   
00.000000 00.000000; X-XX000-0000-0; Spain; Barcelona; Street 123;"""

From the strings above I want to extract one specific field: the towns. How can I get this data?

3 answers:

Answer 0 (score: 1)

Assuming Python (because of the triple-quoted string):

string = """00.000000 00.000000; X-XX000-0000-0; France; Paris; Street 12a;   
00.000000 00.000000; X-XX000-0000-0; Spain; Barcelona; Street 123;"""

# Split each line on "; " and take the 4th field (index 3).
towns = [line.split("; ")[3] for line in string.split("\n")]
print(towns)

which yields

['Paris', 'Barcelona']

There is really no need for a regex here.

Answer 1 (score: 1)

If you only want to get the cities for the given example, you can use a positive lookahead:

\b[^;]+(?=;[^;]+;$)

Explanation

\b        # Word boundary
[^;]+     # One or more characters other than ;
(?=       # Positive lookahead asserting that what follows is
   ;      # a semicolon
   [^;]+  # one or more characters other than ;
   ;      # a semicolon
   $      # the end of the string
)         # Close lookahead
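
A minimal sketch of applying this pattern in Python, under a few assumptions that are not part of the answer itself. It uses `re.MULTILINE` so that `$` matches at the end of every line rather than only at the end of the whole string. The first sample line ends in trailing spaces, which would keep `;$` from matching there, so the sketch strips each line first. The names `text`, `cleaned`, and `pattern` are illustrative.

```python
import re

text = (
    "00.000000 00.000000; X-XX000-0000-0; France; Paris; Street 12a;   \n"
    "00.000000 00.000000; X-XX000-0000-0; Spain; Barcelona; Street 123;"
)

# Assumption: trailing whitespace is noise, so remove it from every line.
cleaned = "\n".join(line.rstrip() for line in text.splitlines())

# re.MULTILINE lets `$` match at the end of each line, not just the string.
pattern = re.compile(r"\b[^;]+(?=;[^;]+;$)", re.MULTILINE)

towns = pattern.findall(cleaned)
print(towns)  # ['Paris', 'Barcelona']
```

Without the stripping step only `Barcelona` would be found, because the lookahead requires the final semicolon to sit directly at the end of the line.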

Answer 2 (score: 0)

If the city is always in the 4th field, you can match it with this pattern:

 /(?:[^;]*;){3}([^;]*);/

[^;]*; matches one field: any run of non-semicolon characters, terminated by a semicolon

(?:...){3} matches such a field 3 times, without capturing it

([^;]*); then captures the contents of the 4th column (anything but a semicolon)
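
The pattern above could be used from Python roughly as follows. This is a sketch that assumes each record sits on its own line. It matches line by line with `match()`, which anchors at the start of the line, so the field count cannot drift from one record into the next. The captured field keeps the space that follows the separator, so it is stripped. The names `text`, `pattern`, and `towns` are illustrative.

```python
import re

text = (
    "00.000000 00.000000; X-XX000-0000-0; France; Paris; Street 12a;   \n"
    "00.000000 00.000000; X-XX000-0000-0; Spain; Barcelona; Street 123;"
)

# Skip three semicolon-terminated fields, then capture the fourth.
pattern = re.compile(r"(?:[^;]*;){3}([^;]*);")

towns = []
for line in text.splitlines():
    m = pattern.match(line)  # match() anchors at the start of the line
    if m:
        # The capture includes the space after "; ", so strip it.
        towns.append(m.group(1).strip())

print(towns)  # ['Paris', 'Barcelona']
```

Running an unanchored `findall` over the whole string instead would start counting the second record's fields from the leftover `Street 12a;` field of the first, which is why the sketch processes one line at a time.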