Question

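I am trying (and so far failing) to extract a timestamp and two measurements from lines of text read from a file.

The lines have the following format: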

"2013-08-07-21-25   26.0   1015.81"

I have tried (among other things):

>>> re.findall(r"([0-9,-]+)|(\d+.\d+)", "2013-08-07-21-25   26.0   1015.81")
[('2013-08-07-21-25', ''), ('26', ''), ('0', ''), ('1015', ''), ('81', '')]

The result is interesting, but not what I want.

I would like to find a solution along these lines:

date, temp, press = re.findall(r"The_right_stuff", "2013-08-07-21-25   26.0   1015.81")
print date + '\n' + temp + '\n' + press + '\n'
2013-08-07-21-25
26.0
1015.81

It would be even better if the assignment could be combined with a test that checks whether the number of matches is correct:

if len(date, temp, press = re.findall(r"The_right_stuff", "2013-08-07-21-25   26.0   1015.81")) == 3:
    print 'Got good data.'
    print date + '\n' + temp + '\n' + press + '\n'

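The lines have been transmitted over a serial connection and may be interspersed with bad (i.e. unexpected) characters, so they cannot be taken apart by string indexing.

See also Prevent datetime.strptime from exit in case of format mismatch.

Edit @hjpotter92

I mentioned that the serial transmission produces corrupted lines. The following examples are not handled by the split solution.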

2013-08-1q-07-15   23.8   1014.92
2013-08-11-07-20   23.8   101$96
6113-p8-11-0-25   23.8   1015*04

Converting the list of measurements to a numpy array then fails.

>>> p_arr= np.asfarray(p_list, dtype='float')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/lib/type_check.py", line 105, in asfarray
    return asarray(a, dtype=dtype)
  File "/usr/lib/python2.7/dist-packages/numpy/core/numeric.py", line 460, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: invalid literal for float(): 101$96

I have put a set of data here.

Answer 1

Use re.split, since the fields are separated by horizontal whitespace characters:

date, temp, press = re.split(r'\s+', "2013-08-07-21-25   26.0   1015.81")

>>> import re
>>> date, temp, press = re.split(r'\s+', "2013-08-07-21-25   26.0   1015.81")
>>> print date
2013-08-07-21-25
>>> print temp
26.0
>>> print press
1015.81
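
Splitting alone does not catch the corrupted lines mentioned in the question's edit. One possible way to handle them is to validate each field after the split. The following is only a minimal sketch: it assumes the timestamp format is `%Y-%m-%d-%H-%M` (inferred from the sample line) and that both measurements should parse as floats. `parse_line` is a hypothetical helper name, not something from the question.

```python
import re
from datetime import datetime

def parse_line(line):
    """Return (date, temp, press) or None if the line looks corrupted.

    Hypothetical helper; the timestamp format is an assumption
    based on the sample data in the question.
    """
    parts = re.split(r'\s+', line.strip())
    if len(parts) != 3:
        return None  # wrong number of fields
    date, temp, press = parts
    try:
        # Raises ValueError for timestamps such as '2013-08-1q-07-15'.
        datetime.strptime(date, '%Y-%m-%d-%H-%M')
        # Raises ValueError for values such as '101$96' or '1015*04'.
        return date, float(temp), float(press)
    except ValueError:
        return None
```

Lines for which `parse_line` returns `None` can simply be skipped before the values are collected into a list for numpy. Note that this sketch returns the measurements as floats rather than strings.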

Answer 2

print [i+j for i,j in re.findall(r"\b(\d+(?!\.)(?:[,-]\d+)*)\b|\b(\d+\.\d+)\b", "2013-08-07-21-25   26.0   1015.81")]

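You have to prevent the first group from consuming anything that belongs to the second group. The negative lookahead (?!\.) does that: the first alternative cannot match the integer part of a decimal number, so values such as 26.0 are left for the second alternative.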

Output: ['2013-08-07-21-25', '26.0', '1015.81']
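
If corrupted lines should also be rejected, an alternative is a single anchored pattern with three capture groups, so that a line either matches as a whole or not at all. This is a sketch under assumptions, not part of the answer above: it assumes the timestamp always has the form YYYY-MM-DD-HH-MM and that both measurements contain a decimal point. `LINE_RE` and `extract` are hypothetical names.

```python
import re

# Assumed line layout: timestamp, temperature, pressure, separated by whitespace.
LINE_RE = re.compile(
    r'^(\d{4}(?:-\d{2}){4})'   # timestamp, e.g. 2013-08-07-21-25
    r'\s+(\d+\.\d+)'           # temperature, e.g. 26.0
    r'\s+(\d+\.\d+)\s*$'       # pressure, e.g. 1015.81
)

def extract(line):
    """Return a (date, temp, press) tuple of strings, or None on mismatch."""
    m = LINE_RE.match(line)
    return m.groups() if m else None
```

A caller can test the result before unpacking, for example `fields = extract(line)` followed by `if fields is not None: date, temp, press = fields`, which covers the "check that the match count is right" part of the question.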
