Question

I have a file named test.txt:

dog;cat;mouse;bird;turtle;# just some animals
dog;cat;mouse;bird;turtle;horse cow # just some animals

I need help splitting up the second line so that it looks like the first:

dog;cat;mouse;bird;turtle;horse;cow;# just some animals

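The hard part is that there is no set limit on how many animals can be inserted after the 5th element and before the '#' symbol. There could be 2, as in this example, or 10.

I am able to break everything up into a two-dimensional array, but I'm not sure how to split the second string.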

with open(file) as f:
    lines = list(f)
    temp = [line.strip().split(';') for line in lines]

Output:

for i in temp:
    print(i)

['dog', 'cat', 'mouse', 'bird', 'turtle', '# just some animals']
['dog', 'cat', 'mouse', 'bird', 'turtle', 'horse cow # just some animals']

Desired output:

['dog', 'cat', 'mouse', 'bird', 'turtle', '# just some animals']
['dog', 'cat', 'mouse', 'bird', 'turtle', 'horse', 'cow', '# just some animals']

Any help is appreciated.

- Update -

My actual data follows this pattern:

10-2-2015;10:02;LOCATION;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;somename1 # more alphanumeric text with caps and lower case
10-2-2015;10:02;LOCATION;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;somename1; somename2 somename3 # more,alphanumeric,text,with,caps,and,lower,case

The x's stand for IP addresses and subnets. The commas after the '#' should be left untouched.

Answer 1

You can try using a regular expression:

>>> import re
>>> my_expression = r'[a-z]+|#.+'
>>> f = 'dog;cat;mouse;bird;turtle;# just some animals'
>>> s = 'dog;cat;mouse;bird;turtle;horse cow # just some animals'
>>> re.findall(my_expression, f)
['dog', 'cat', 'mouse', 'bird', 'turtle', '# just some animals']
>>> re.findall(my_expression, s)
['dog', 'cat', 'mouse', 'bird', 'turtle', 'horse', 'cow', '# just some animals']

The above finds either one or more lowercase letters ([a-z]+) or (|) a single hash/pound sign followed by one or more characters (#.+).
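As a rough sketch of how that pattern could be applied to every line of the file rather than to hand-typed strings: the `io.StringIO` object below is only a stand-in for `open('test.txt')`, and the name `ANIMAL_OR_COMMENT` is made up for illustration. No `strip()` is needed here, because `.` does not match a newline by default and `[a-z]+` never matches one either.

```python
import io
import re

# Same pattern as in the answer, compiled once for reuse.
ANIMAL_OR_COMMENT = re.compile(r'[a-z]+|#.+')

# Stand-in for open('test.txt'); the two lines are copied from the question.
sample = io.StringIO(
    'dog;cat;mouse;bird;turtle;# just some animals\n'
    'dog;cat;mouse;bird;turtle;horse cow # just some animals\n'
)

# One list of fields per line; the trailing '#...' comment stays in one piece.
rows = [ANIMAL_OR_COMMENT.findall(line) for line in sample]

for row in rows:
    print(row)
```

Keep in mind that `[a-z]+` only matches lowercase letters, so fields containing digits, capitals, dots, or dashes would be broken up or skipped by this version of the pattern.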

For the updated sample data:

>>> my_expression = r'#.+|[^ ;]+'
>>> f = '10-2-2015;10:02;LOCATION;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;somename1 # more alphanumeric text with caps and lower case'
>>> s = '10-2-2015;10:02;LOCATION;xxx.xxx.xxx.xxx;xxx.xxx.xxx.xxx;somename1; somename2 somename3 # more,alphanumeric,text,with,caps,and,lower,case'
>>> re.findall(my_expression, f)
['10-2-2015', '10:02', 'LOCATION', 'xxx.xxx.xxx.xxx', 'xxx.xxx.xxx.xxx', 'somename1', '# more alphanumeric text with caps and lower case']
>>> re.findall(my_expression, s)
['10-2-2015', '10:02', 'LOCATION', 'xxx.xxx.xxx.xxx', 'xxx.xxx.xxx.xxx', 'somename1', 'somename2', 'somename3', '# more,alphanumeric,text,with,caps,and,lower,case']

This expression finds either a hash/pound sign followed by one or more characters (#.+) or (|) a run of one or more characters that are neither a space nor a semicolon ([^ ;]+).
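One thing to watch for when the lines come straight from a file: `.` stops at the newline, but `[^ ;]+` happily matches it, so a line that still ends in '\n' produces an extra '\n' field. A minimal sketch of a wrapper that strips the newline first (the names `TOKEN` and `split_record` and the shortened sample line are invented for this example):

```python
import re

# '#.+' is tried first so the comment is captured whole, commas and all.
TOKEN = re.compile(r'#.+|[^ ;]+')

def split_record(line):
    """Split one record into fields, keeping the '#' comment as a single field."""
    return TOKEN.findall(line.rstrip('\n'))

raw = 'a;b c # x,y\n'        # shortened stand-in for a line read from the file
unstripped = TOKEN.findall(raw)
stripped = split_record(raw)

print(unstripped)            # ['a', 'b', 'c', '# x,y', '\n']
print(stripped)              # ['a', 'b', 'c', '# x,y']
```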

Answer 2

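Find the index of the # and strip off everything from that point on, including the # itself. Then find any spaces (and/or any other characters you want to treat as separators) and turn them into semicolons.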
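A minimal sketch of that idea without regular expressions, under a couple of assumptions: the function name `split_with_comment` and its `keep_comment` flag are made up here, and the comment is appended back as the last field by default because the desired output in the question keeps it. Empty fields are dropped, which tidies up sequences like '; ' and a trailing ';' but would also discard any fields that are empty on purpose.

```python
def split_with_comment(line, keep_comment=True):
    """Split a ';'/space separated record, treating '#...' as one trailing field."""
    line = line.rstrip('\n')
    idx = line.find('#')
    if idx == -1:
        head, comment = line, ''          # no comment on this line
    else:
        head, comment = line[:idx], line[idx:]

    # Turn spaces into semicolons, split, and drop the empty pieces.
    fields = [field for field in head.replace(' ', ';').split(';') if field]

    if keep_comment and comment:
        fields.append(comment)
    return fields


result = split_with_comment('dog;cat;mouse;bird;turtle;horse cow # just some animals')
print(result)
```

Calling it with `keep_comment=False` gives the literal "strip everything after #" behaviour described above.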

How to split a string using delimiters
