Question

I want to split a string into one string per field: name, city, points, score, cards.

I have these strings:

Paul Grid - Hong Kong 56  663 0
Anna Grid - Tokyo 16  363 0
Greg H.Johs - Hong Kong -6  363 4
Jessy Holm Smith - Jakarta 8  261 0

The format is:

Name[SPACE]-[SPACE]City[SPACE]-Points[SPACE][SPACE]Score[SPACE]Cards

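Name can contain spaces and '.'
City can contain spaces
There are sometimes double spaces, e.g. between Points and Score
Points, Score and Cards can be negative

The rules I want to implement in Python are as follows: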

Name : From beginning, until you see "-" - and then strip trailing space from that string.
Cards: From end and back, until you meet the first space
Score: From the space you hit when you made card, go back until next space.
Points:From the space you hit when you made Score, go back until next space.
City : where Name ended and where the Points stopped after seeing the space.

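My problem is that I can't simply replace spaces with a delimiter, because spaces can occur inside Name and City, and " - " is what separates Name from City.

I could do this the brute-force way and walk through the string character by character, but I wonder whether Python has a smarter way of doing it?

My end result should be every line split into fields, so that I can address e.g. scorerecord.name, scorerecord.city, etc.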

Answer 1

Use the re.match() function with a specific regex pattern:

import re

data = '''Paul Grid - Hong Kong 56  663 0
Anna Grid - Tokyo 16  363 0
Greg H.Johs - Hong Kong -6  363 4
Jessy Holm Smith - Jakarta 8  261 0'''

data = data.split('\n')
# '-?' on every number, since points, score and cards may all be negative
pat = re.compile(r'(?P<name>[^-]+) +- *(?P<city>[^0-9]+) +(?P<points>-?[0-9]+) +'
                 r'(?P<score>-?[0-9]+) +(?P<cards>-?[0-9]+)')

result = [pat.match(s).groupdict() for s in data]

print(result)

Output:

[{'name': 'Paul Grid', 'city': 'Hong Kong', 'points': '56', 'score': '663', 'cards': '0'}, {'name': 'Anna Grid', 'city': 'Tokyo', 'points': '16', 'score': '363', 'cards': '0'}, {'name': 'Greg H.Johs', 'city': 'Hong Kong', 'points': '-6', 'score': '363', 'cards': '4'}, {'name': 'Jessy Holm Smith', 'city': 'Jakarta', 'points': '8', 'score': '261', 'cards': '0'}]

https://docs.python.org/3/library/re.html#re.match.groupdict
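
Because groupdict() returns every field as a string, a follow-up step can convert the numeric fields and give attribute access such as record.name. The following is a minimal sketch, not part of the answer above: the ScoreRecord dataclass and the parse_line helper are hypothetical names, and the pattern is a variant that accepts a minus sign on all three numbers.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Variant of the pattern above; the named groups map onto the dataclass fields.
PATTERN = re.compile(
    r'(?P<name>[^-]+) +- *(?P<city>[^0-9]+) +(?P<points>-?[0-9]+) +'
    r'(?P<score>-?[0-9]+) +(?P<cards>-?[0-9]+)'
)


@dataclass
class ScoreRecord:  # hypothetical container, chosen for illustration
    name: str
    city: str
    points: int
    score: int
    cards: int


def parse_line(line: str) -> Optional[ScoreRecord]:
    """Return a ScoreRecord, or None when the line does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    return ScoreRecord(
        name=fields['name'],
        city=fields['city'],
        points=int(fields['points']),
        score=int(fields['score']),
        cards=int(fields['cards']),
    )


record = parse_line('Greg H.Johs - Hong Kong -6  363 4')
print(record.name, record.city, record.points)
```

Returning None for a non-matching line avoids the AttributeError that pat.match(s).groupdict() would raise on malformed input.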

Answer 2

You can use a regular expression. I think this covers your rules:

import re
# [A-Za-z ] rather than [A-z ]: the range A-z also matches [ \ ] ^ _ and `
r = re.compile(r"([\w. ]+?)\s?-\s?([A-Za-z ]+?)\s+(-?\d+?)\s+?(-?\d+?)\s+?(-?\d+)")
r.match("Paul Grid - Hong Kong 56  663 0").groups()

Returns:

('Paul Grid', 'Hong Kong', '56', '663', '0')

You can paste it into https://regex101.com/ to see in detail how it works.
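
As an offline alternative to an online explainer, the same kind of pattern can be written with re.VERBOSE so that each part carries its own comment. This is only a sketch: the name FIELDS is made up for illustration, and the lazy quantifiers on the numbers are simplified to greedy ones, which gives the same groups on the sample data.

```python
import re

# In VERBOSE mode, whitespace outside character classes is ignored and
# '#' starts a comment, so the pattern can be laid out field by field.
FIELDS = re.compile(r"""
    ([\w. ]+?)      # name: word characters, dots and spaces (lazy)
    \s?-\s?         # the separating dash with optional whitespace around it
    ([A-Za-z ]+?)   # city: letters and spaces (lazy)
    \s+(-?\d+)      # points, optionally negative
    \s+(-?\d+)      # score, optionally negative
    \s+(-?\d+)      # cards, optionally negative
""", re.VERBOSE)

groups = FIELDS.match("Jessy Holm Smith - Jakarta 8  261 0").groups()
print(groups)  # ('Jessy Holm Smith', 'Jakarta', '8', '261', '0')
```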

Answer 3

The regular expression r'(.*) - (.*) (-?\d+) +(-?\d+) +(-?\d+)' performs the matching you describe in a very simple way:

import re

lines = '''Paul Grid - Hong Kong 56  663 0
Anna Grid - Tokyo 16  363 0
Greg H.Johs - Hong Kong -6  363 4
Jessy Holm Smith - Jakarta 8  261 0'''.split('\n')

# ' +' tolerates the occasional double space; '-?' allows negative numbers
p = re.compile(r'(.*) - (.*) (-?\d+) +(-?\d+) +(-?\d+)')
for line in lines:
    m = p.match(line)
    print(m.groups())

# ('Paul Grid', 'Hong Kong', '56', '663', '0')
# ('Anna Grid', 'Tokyo', '16', '363', '0')
# ('Greg H.Johs', 'Hong Kong', '-6', '363', '4')
# ('Jessy Holm Smith', 'Jakarta', '8', '261', '0')

Answer 4

You can split on the first "-" like this:

name, rest = s.strip().split("-", 1)

You can then split the rest on any amount of whitespace, from the right and only three times, so that spaces in the city name are preserved:

city, points, score, cards = rest.rsplit(None, 3)

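Now all that is left is to remove the extra whitespace from the name and the city, which you can do with strip(), and to put everything into some kind of structure: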

parts = [name.strip(), city.strip(), points, score, cards]
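
The three steps above can be combined into one function that yields the scorerecord.name style of access the question asks for. This is a minimal sketch under assumptions: ScoreRecord and parse_record are hypothetical names, and the numeric fields are converted to int, which the answer itself does not do.

```python
from typing import NamedTuple


class ScoreRecord(NamedTuple):  # hypothetical name, chosen for illustration
    name: str
    city: str
    points: int
    score: int
    cards: int


def parse_record(s: str) -> ScoreRecord:
    # Step 1: everything before the first "-" is the name.
    name, rest = s.strip().split("-", 1)
    # Step 2: split from the right on runs of whitespace, three times only,
    # so spaces inside the city survive and double spaces are harmless.
    city, points, score, cards = rest.rsplit(None, 3)
    # Step 3: strip the text fields and convert the numbers.
    return ScoreRecord(name.strip(), city.strip(),
                       int(points), int(score), int(cards))


lines = [
    "Paul Grid - Hong Kong 56  663 0",
    "Anna Grid - Tokyo 16  363 0",
    "Greg H.Johs - Hong Kong -6  363 4",
    "Jessy Holm Smith - Jakarta 8  261 0",
]
records = [parse_record(line) for line in lines]
print(records[2].name, records[2].city, records[2].points)
```

Note that splitting on a bare "-" assumes the name itself contains no hyphen, as the rules in the question state; splitting on " - " instead would be one way to relax that.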

Answer 5

Another regex approach:

import re

text = """Paul Grid - Hong Kong 56  663 0
Anna Grid - Tokyo 16  363 0
Greg H.Johs - Hong Kong -6  363 4
Jessy Holm Smith - Jakarta 8  261 0"""


pat = r'^([^-]+) - ?([^-]+?)(?= -?\d+) (-?\d+) +(-?\d+) +(-?\d+)$'

for k in re.findall(pat,text,re.MULTILINE):
    print(k)

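Output:

('Paul Grid', 'Hong Kong', '56', '663', '0')
('Anna Grid', 'Tokyo', '16', '363', '0')
('Greg H.Johs', 'Hong Kong', '-6', '363', '4')
('Jessy Holm Smith', 'Jakarta', '8', '261', '0')

Explanation:

The text parts are captured using "one or more of anything other than -", with ' - ' between them: '([^-]+) - ?([^-]+?)'.
The second text part must be followed by an (optional) '-' and digits, which is checked with a positive lookahead: '(?= -?\d+)'.
The numbers are then captured using ' (-?\d+)', again with an optional sign. Everything must lie within one line (the '^' and '$' anchors), with multiline mode activated.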
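
To see what the lookahead contributes, it can be tried in isolation. The sketch below uses only the city part of the pattern (city_pat is a name made up for this illustration) and shows that the lookahead checks for the following number without consuming it:

```python
import re

# Lazy city group followed by the positive lookahead from the pattern above.
city_pat = re.compile(r'([^-]+?)(?= -?\d+)')

sample = 'Hong Kong -6  363 4'
m = city_pat.match(sample)

print(m.group(1))        # Hong Kong
print(m.end())           # 9: the match stops right after the city
print(sample[m.end():])  # the numbers, starting with ' -6', are still unconsumed
```

Because the space and the number remain available after the lookahead, the following ' (-?\d+)' group can capture them, sign included.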

Split a string into substrings

5 answers: