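How do you split a numbered list using a regular expression?

Time: 2020-05-20 09:05:44

Tags: python regex

I am trying to split a large number of strings in the following format into a Python list of dictionaries: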

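1) Qianfeigong 钱妃宫 was originally called the Zhenhuimiao 贞惠庙, and later the Qianlinggong 钱灵宫. The temple was built during the Northern Song in Yuanfeng 7 (1083). The temple was renovated during the early Ming. In 1967 the temple was demolished, but it was rebuilt in 1985. The main god is Qianshi shengfei 钱氏圣妃. Secondary gods are Guangping Zhouwang 广平周王 and Taishan Kongwang 泰山孔王. The stone inscription composed in the Xianchun period (1265–1274) by Liu Kezhuang 刘克庄 entitled 协应钱夫人庙记 (Record of the Temple to Lady Qian of Beneficial Assistance) (Epigraphical Materials, 1995:54, No. 48) is about this temple (stele no longer extant). 2) Xinglongshê 兴隆社: The main gods are Zunzhu mingwang 尊主明王 and Houtu furen 后土夫人.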

I tried the following, but it also breaks the string at "48)".

re.split(r"\d+\)", string)

Result: 1), 48), 2)

48) should not be in the result.
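The behaviour described here can be reproduced on a shortened sample. This is only a sketch: the `sample` string is a made-up abbreviation of the text above, not the real data.

```python
import re

# Hypothetical shortened sample; the real strings are much longer.
sample = "1) Alpha temple (Materials, 1995:54, No. 48) is old. 2) Beta shrine."

# Every "digits followed by )" is treated as a marker, including the
# "48)" that merely closes a parenthesised citation.
markers = re.findall(r"\d+\)", sample)
pieces = re.split(r"\d+\)", sample)

print(markers)  # ['1)', '48)', '2)']
print(pieces)   # the first item is cut in two at "48)"
```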

I was thinking of excluding matches that come after an opening parenthesis "(", but I am not sure how to do that.

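2 answers:

Answer 0: (score: 1)

When parsing long strings, the PyPI regex module has proven to offer faster, more stable performance. It also accepts the variable-width lookbehind used below, which the standard re module rejects because it requires lookbehinds to have a fixed width.

I suggest installing it with pip install regex (or pip3 install regex) and then running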

import regex
text="1) Qianfeigong 钱妃宫 was originally called the Zhenhuimiao 贞惠庙, and later the Qianlinggong 钱灵宫. The temple was built during the Northern Song in Yuanfeng 7 (1083). The temple was renovated during the early Ming. In 1967 the temple was demolished, but it was rebuilt in 1985. The main god is Qianshi shengfei 钱氏圣妃. Secondary gods are Guangping Zhouwang 广平周王 and Taishan Kongwang 泰山孔王. The stone inscription composed in the Xianchun period (1265–1274) by Liu Kezhuang 刘克庄 entitled 协应钱夫人庙记 (Record of the Temple to Lady Qian of Beneficial Assistance) (Epigraphical Materials, 1995:54, No. 48) is about this temple (stele no longer extant). 2) Xinglongshê 兴隆社: The main gods are Zunzhu mingwang 尊主明王 and Houtu furen 后土夫人."
print(regex.split(r'(?<!\([^()]*)(?!^)(?=\d+\))', text))

See the Python 3 demo; output:

['1) Qianfeigong 钱妃宫 was originally called the Zhenhuimiao 贞惠庙, and later the Qianlinggong 钱灵宫. The temple was built during the Northern Song in Yuanfeng 7 (1083). The temple was renovated during the early Ming. In 1967 the temple was demolished, but it was rebuilt in 1985. The main god is Qianshi shengfei 钱氏圣妃. Secondary gods are Guangping Zhouwang 广平周王 and Taishan Kongwang 泰山孔王. The stone inscription composed in the Xianchun period (1265–1274) by Liu Kezhuang 刘克庄 entitled 协应钱夫人庙记 (Record of the Temple to Lady Qian of Beneficial Assistance) (Epigraphical Materials, 1995:54, No. 48) is about this temple (stele no longer extant). ', '2) Xinglongshê 兴隆社: The main gods are Zunzhu mingwang 尊主明王 and Houtu furen 后土夫人.']

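Details

  • (?<!\([^()]*) - a negative lookbehind that fails the match if, immediately to the left of the current position, there is a ( followed by zero or more characters other than ( and )
  • (?!^) - the current position must not be the start of the string
  • (?=\d+\)) - immediately to the right of the current position there must be one or more digits followed by )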
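If installing a third-party module is not an option, the same idea (ignore any "N)" that sits inside parentheses) can be approximated with the standard re module by tracking parenthesis depth. The following is a minimal sketch, not part of the answer above: the function names and the "number" and "text" dictionary keys are hypothetical, and it assumes parentheses in the text are not nested across item boundaries.

```python
import re

# Tokens of interest: "digits)" or a lone parenthesis.
TOKEN = re.compile(r"\d+\)|[()]")

def split_numbered_items(text):
    """Split text at 'N)' markers that occur outside parentheses."""
    starts = []
    depth = 0
    for m in TOKEN.finditer(text):
        tok = m.group()
        if tok == "(":
            depth += 1
        elif depth > 0:
            # A ")" or "48)" seen inside parentheses just closes them.
            depth -= 1
        elif tok != ")":
            # "N)" at depth 0 starts a new item.
            starts.append(m.start())
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]

def to_records(text):
    """Turn each item into a dict (key names are illustrative only)."""
    records = []
    for item in split_numbered_items(text):
        number, _, body = item.partition(")")
        records.append({"number": int(number), "text": body.strip()})
    return records

# Hypothetical shortened sample, not the real data.
sample = ("1) Alpha temple, built in Yuanfeng 7 (1083) "
          "(Materials, 1995:54, No. 48) is old. 2) Beta shrine: main gods.")
records = to_records(sample)
print(records)
```

Any text before the first marker is dropped, and an unmatched ")" at depth 0 is ignored; both choices are assumptions that may need adjusting for real input.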

Answer 1: (score: 1)

Try this regex:

(?:^|\.\s)\d+\)(?=\s[A-Z])

Explanation:

(?:^|\.\s)(?#start of line or end of sentence)\d+\)(?#Number followed by bracket)(?=\s[A-Z])(?#whitespace then a capital at the start of the sentence)

Regex101:https://regex101.com/r/Fierhb/1
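A small sketch of how this pattern might be used with re.split from the standard library. The `sample` string is a made-up abbreviation of the question's text. Note that the pattern consumes the marker itself and the preceding ". ", so the item numbers and the final period of each non-last item are not kept in the pieces.

```python
import re

PATTERN = re.compile(r"(?:^|\.\s)\d+\)(?=\s[A-Z])")

# Hypothetical shortened sample, not the real data.
sample = "1) Alpha temple (Materials, 1995:54, No. 48) is old. 2) Beta shrine."

# "No. 48)" is followed by a lowercase word, so the lookahead rejects it.
pieces = [p.strip() for p in PATTERN.split(sample) if p.strip()]
print(pieces)
# ['Alpha temple (Materials, 1995:54, No. 48) is old', 'Beta shrine.']
```

This relies on every item starting with a capital letter and on in-text citations not doing so, which holds for the sample but may not hold for all of the data.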
