Creating a nested list from a string

Time: 2020-12-21 09:04:41

Tags: python regex list nested-lists

Here is a list of zones in Singapore, each followed by its subzones.

Bishan[1]
Bishan East
Marymount
Upper Thomson
Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)
Alexandra Hill
Alexandra North
Bukit Ho Swee
Bukit Merah (Not to be confused with Bukit Merah planning area.)
City Terminals (Formerly called "Tanjong Pagar" subzone.)
Depot Road
Everton Park
Henderson Hill
Kampong Tiong Bahru
Maritime Square (Formerly called "HarbourFront" subzone.)
Redhill
Singapore General Hospital
Telok Blangah Drive
Telok Blangah Rise
Telok Blangah Way
Tiong Bahru
Tiong Bahru Station
Bukit Timah[3]
Anak Bukit
Coronation Road
Farrer Court
Hillcrest
Holland Road
Leedon Park
Swiss Club
Ulu Pandan
Downtown Core[4]
Anson
Bayfront
Bugis
Cecil
Central
City Hall
Clifford Pier
Marina Centre
Maxwell
Phillip
Raffles Place
Tanjong Pagar
Geylang[5]
Aljunied
Geylang East
Kallang Way
MacPherson
Kampong Ubi
Kallang[6]
Bendemeer
Boon Keng
Crawford
Geylang Bahru
Kallang Bahru
Kampong Bugis
Kampong Java
Lavender
Tanjong Rhu

Or, as a Python string:

data = 'Bishan[1]\nBishan East\nMarymount\nUpper Thomson\nBukit Merah[2] (Not to be confused with Bukit Merah subzone.)\nAlexandra Hill\nAlexandra North\nBukit Ho Swee\nBukit Merah (Not to be confused with Bukit Merah planning area.)\nCity Terminals (Formerly called "Tanjong Pagar" subzone.)\nDepot Road\nEverton Park\nHenderson Hill\nKampong Tiong Bahru\nMaritime Square (Formerly called "HarbourFront" subzone.)\nRedhill\nSingapore General Hospital\nTelok Blangah Drive\nTelok Blangah Rise\nTelok Blangah Way\nTiong Bahru\nTiong Bahru Station\nBukit Timah[3]\nAnak Bukit\nCoronation Road\nFarrer Court\nHillcrest\nHolland Road\nLeedon Park\nSwiss Club\nUlu Pandan\nDowntown Core[4]\nAnson\nBayfront\nBugis\nCecil\nCentral\nCity Hall\nClifford Pier\nMarina Centre\nMaxwell\nPhillip\nRaffles Place\nTanjong Pagar\nGeylang[5]\nAljunied\nGeylang East\nKallang Way\nMacPherson\nKampong Ubi\nKallang[6]\nBendemeer\nBoon Keng\nCrawford\nGeylang Bahru\nKallang Bahru\nKampong Bugis\nKampong Java\nLavender\nTanjong Rhu\n'

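Names with square brackets [] are zones; each is followed by its subzones, separated by newline characters (\n). What I want is a list of zones, each containing a sublist of its subzones, as shown below (later I will remove the square brackets and parentheses along with their contents):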

1.) Bishan[1]

- Bishan East
- Marymount
- Upper Thomson

2.) Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)

- Alexandra Hill
- Alexandra North
- Bukit Ho Swee
- Bukit Merah (Not to be confused with Bukit Merah planning area.)
- City Terminals (Formerly called "Tanjong Pagar" subzone.)

...

So far, I have only managed to extract the zones, using split() and a regular expression.

import re
zones_and_subzones = data.split('\n')
# A zone is any line that contains an opening square bracket.
zones = [zone for zone in zones_and_subzones if re.match(r'(.*?)\[', zone)]

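This is where I am stuck: I am having trouble extracting the subzones of each zone. I tried using

regex = r'(\].*?\[)'

to extract the text between a closing square bracket and the next opening one, but the results are incomplete. I have been at this for a while and would really appreciate your help. If there is a better approach than what I have now, please share. Thanks.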

2 Answers:

Answer 0: (score: 0)

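Split on newlines as you are already doing, then go through the lines one by one and decide whether each is a "header" or "content". Use a dictionary to look up the content by header.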

s = data  # the string from the question
result = {}
for item in s.splitlines():
    if '[' in item:
        key = item
        result[key] = []
    else:
        result[key].append(item)

The result is a dictionary like {'Bishan[1]': ['Bishan East', 'Marymount', 'Upper Thomson'], ...}.
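The question asks for a list of zones with sublists, and mentions stripping the square brackets and parentheses afterwards. The same line-by-line idea can be sketched for that shape. The helper names `clean` and `to_nested_list` and the shortened `sample` string are made up for this illustration, and the sketch assumes every zone line contains a `[` while no subzone line does:

```python
import re

def clean(name):
    """Strip footnote markers like "[1]" and parenthetical notes from a name."""
    name = re.sub(r'\[\d+\]', '', name)        # drop "[1]", "[2]", ...
    name = re.sub(r'\s*\([^)]*\)', '', name)   # drop "(...)" notes
    return name.strip()

def to_nested_list(text):
    """Return [[zone, [subzone, ...]], ...] in the order the lines appear."""
    nested = []
    for line in text.splitlines():
        if not line.strip():
            continue                            # ignore blank lines
        if '[' in line:
            nested.append([clean(line), []])    # a new zone starts here
        elif nested:
            nested[-1][1].append(clean(line))   # subzone of the latest zone
    return nested

# Shortened sample in the same format as the question's data (assumption).
sample = (
    'Bishan[1]\nBishan East\nMarymount\nUpper Thomson\n'
    'Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)\n'
    'Alexandra Hill\n'
    'City Terminals (Formerly called "Tanjong Pagar" subzone.)\n'
)
nested = to_nested_list(sample)
print(nested)
```

A list of pairs keeps the zones in their original order and allows duplicate cleaned names; a dictionary would work equally well if lookup by zone name matters more.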

Answer 1: (score: 0)

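A dictionary is the better fit here; in particular, I would use a defaultdict to keep the implementation short:

from collections import defaultdict

dicti = defaultdict(list)  # missing keys start out as an empty list
for word in data.splitlines():  # splitlines() leaves no trailing empty entry
    if '[' in word and ']' in word:
        name = word  # zone header line
    else:
        dicti[name].append(word) # or alternatively -> `dicti[name] += [word]`
>>> dict(dicti)
{'Bishan[1]': ['Bishan East', 'Marymount', 'Upper Thomson'],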
             'Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)': ['Alexandra Hill',
              'Alexandra North',
              'Bukit Ho Swee',
              'Bukit Merah (Not to be confused with Bukit Merah planning area.)',
              'City Terminals (Formerly called "Tanjong Pagar" subzone.)',
              'Depot Road',
              'Everton Park',
              'Henderson Hill',
              'Kampong Tiong Bahru',
              'Maritime Square (Formerly called "HarbourFront" subzone.)',
              'Redhill',
              'Singapore General Hospital',
              'Telok Blangah Drive',
              'Telok Blangah Rise',
              'Telok Blangah Way',
              'Tiong Bahru',
              'Tiong Bahru Station'],
   #...
}
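Since the question started from a regular expression, here is a hedged sketch of a regex-only variant. One likely reason the pattern in the question falls short is that `.` does not match a newline unless `re.DOTALL` is set; even with that flag, the last zone has no following `[` to end its match. The sketch instead splits the text on header lines. The shortened `sample` string is made up for illustration, and header lines are assumed to contain a footnote marker of the form `[n]`:

```python
import re

# Shortened sample in the same format as the question's data (assumption).
sample = 'Bishan[1]\nBishan East\nMarymount\nKallang[6]\nBendemeer\nBoon Keng\n'

# The pattern from the question: '.' stops at '\n', so nothing matches.
pattern_from_question = r'(\].*?\[)'
no_flag = re.findall(pattern_from_question, sample)
# With re.DOTALL it spans lines, but the last zone is never closed by a '['.
with_dotall = re.findall(pattern_from_question, sample, flags=re.DOTALL)

# Split on whole header lines. Because the header is a capturing group,
# split() keeps it: ['', header1, body1, header2, body2, ...].
header = re.compile(r'^(.*\[\d+\].*)$\n?', flags=re.MULTILINE)
parts = header.split(sample)
zones = {h: body.splitlines() for h, body in zip(parts[1::2], parts[2::2])}
print(zones)
```

For data this regular, the plain loop in the answers is probably easier to read; the regex version mainly shows what the original pattern was missing.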