Question

I am trying to use re twice, first to search the data and then to split it:

I find all substrings inside [].

Then I try to split them on the space.


My input data is:

[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"

But this gives me an error: it does not let me use it a second time.

Any suggestions would be great! Thanks in advance.

Answer 1

>>> import re
>>> sum([date.split() for date in re.findall(r'\[(.*?)\]', file)], [])
['2018-07-10', '15:04:11', '2018-07-10', '15:04:12', '2018-07-10', '15:04:42', '2018-07-10', '15:04:42']

Or use itertools.chain:

>>> from itertools import chain
>>> list(chain(*re.findall(r'\[(\S+) (\S+)\]', file)))
['2018-07-10', '15:04:11', '2018-07-10', '15:04:12', '2018-07-10', '15:04:42', '2018-07-10', '15:04:42']
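Both snippets above assume `file` already holds the whole log as one string (the question does not show how it was read). Here is a self-contained sketch that defines that string and checks that the two approaches agree. The Python documentation for `sum` recommends `itertools.chain` for concatenating iterables, because `sum(..., [])` builds a new list at every step.

```python
import re
from itertools import chain

# Assumption: the log has already been read into a single string.
file = """[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"
"""

# Approach 1: capture the text inside [], split each match, then concatenate the lists.
via_sum = sum([date.split() for date in re.findall(r'\[(.*?)\]', file)], [])

# Approach 2: capture date and time as two groups, then flatten the (date, time) tuples.
via_chain = list(chain(*re.findall(r'\[(\S+) (\S+)\]', file)))

print(via_sum == via_chain)  # True
print(via_chain[:2])         # ['2018-07-10', '15:04:11']
```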

Answer 2

Update your regex so that the first pass captures each part as its own group. Then you do not need split at all:

import re
re.findall(r'\[(.*?)\s(.*?)\]', s)

[('2018-07-10', '15:04:11'),
 ('2018-07-10', '15:04:12'),
 ('2018-07-10', '15:04:42'),
 ('2018-07-10', '15:04:42')]
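Each match comes back as a `(date, time)` tuple, so you can unpack the pairs directly while iterating. This is a minimal sketch that keeps the answer's pattern and assumes `s` holds the log text from the question:

```python
import re

# Assumption: s holds the log text shown in the question.
s = """[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"
"""

# With two capture groups, findall returns a list of 2-tuples instead of strings.
pairs = re.findall(r'\[(.*?)\s(.*?)\]', s)

# Each tuple can be unpacked in the loop header.
for date, time in pairs:
    print(date, time)
```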

If you need it as a flat list:

[elem for grp in re.findall(r'\[(.*?)\s(.*?)\]', s) for elem in grp]

['2018-07-10',
 '15:04:11',
 '2018-07-10',
 '15:04:12',
 '2018-07-10',
 '15:04:42',
 '2018-07-10',
 '15:04:42']

Answer 3

import re

data = """[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"
"""

new_data = []
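# re.sub is used only for its side effect: the callback appends the split date and
# time to new_data, and the substituted string that re.sub returns is discarded.
re.sub(r'\[(.*?)\].*', lambda g: new_data.extend(g[1].split()), data)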
print(','.join(new_data))

Output:

2018-07-10,15:04:11,2018-07-10,15:04:12,2018-07-10,15:04:42,2018-07-10,15:04:42
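The `re.sub` call above is run only for its callback's side effect. The sketch below produces the same comma-separated output with `re.finditer`, which yields match objects without building a substituted string. This is a swapped-in approach, not the answer's own:

```python
import re

data = """[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"
"""

# finditer yields one Match object per [...] group; group(1) is the text inside the brackets.
parts = []
for match in re.finditer(r'\[(.*?)\]', data):
    parts.extend(match.group(1).split())

print(','.join(parts))
# 2018-07-10,15:04:11,2018-07-10,15:04:12,2018-07-10,15:04:42,2018-07-10,15:04:42
```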

Answer 4

Use re.findall() together with str.split(). There is no need to use a regular expression twice.

import re
a = '''[2018-07-10 15:04:11] USER INPUT "hello"
[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"
[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"
[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"'''


[item for sublist in [n.split() for n in re.findall(r'\[(.*?)\]',a)] for item in sublist]
['2018-07-10',
 '15:04:11',
 '2018-07-10',
 '15:04:12',
 '2018-07-10',
 '15:04:42',
 '2018-07-10',
 '15:04:42']
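The question's variable name `file` suggests the log is being read from a file, though that is an assumption. If so, you can apply the same findall-then-split idea one line at a time, so the whole log never has to be held as a single string. In this sketch `io.StringIO` stands in for an open file, and `timestamps_from_lines` is a hypothetical helper name:

```python
import io
import re

# Compiled once and reused for every line.
BRACKETED = re.compile(r'\[(.*?)\]')


def timestamps_from_lines(lines):
    """Hypothetical helper: yield the space-separated parts of each [...] group, line by line."""
    for line in lines:
        for inner in BRACKETED.findall(line):
            yield from inner.split()


# io.StringIO stands in for an open log file; any iterable of lines works.
log = io.StringIO(
    '[2018-07-10 15:04:11] USER INPUT "hello"\n'
    '[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"\n'
    '[2018-07-10 15:04:42] USER INPUT "I am doing good thank you"\n'
    '[2018-07-10 15:04:42] SYSTEM RESPONSE: "Good to know"\n'
)

result = list(timestamps_from_lines(log))
print(result)
```

The result is the same flat list as the comprehension above. A generator keeps memory use proportional to one line rather than the whole file.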

Answer 5

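After re.findall, your file variable holds a list of matches rather than a string, so you have to split each element separately.

Try: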

import re

file = re.findall(r'\[(.*?)\]', file)
m = [re.split(r'\ +', i) for i in file]
print(m)

Output:

[['2018-07-10', '15:04:11'], ['2018-07-10', '15:04:12'], ['2018-07-10', '15:04:42'], ['2018-07-10', '15:04:42']]
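This answer's diagnosis is that `re.findall` returns a list, and a list has no `split` method. The question's exact code is not shown. This sketch only assumes that it called `.split()` on the `findall` result. It reproduces that error and then splits each element instead:

```python
import re

file = ('[2018-07-10 15:04:11] USER INPUT "hello"\n'
        '[2018-07-10 15:04:12] SYSTEM RESPONSE: "Hello! How are you doing today"\n')

matches = re.findall(r'\[(.*?)\]', file)  # a list of strings, not a string

try:
    matches.split()  # assumption: roughly what the question's second step did
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'split'

# Splitting each element works, because each element is a str.
split_matches = [m.split() for m in matches]
print(split_matches)  # [['2018-07-10', '15:04:11'], ['2018-07-10', '15:04:12']]
```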
