Question

I am building a script to pull data from a server log. The data comes out in the following format, showing a timestamp and a frequency of occurrence:

20:52:37 - 3

20:52:38 - 8

20:52:39 - 28

20:52:40 - 58

20:52:41 - 59

20:52:42 - 51

20:52:43 - 37

20:52:44 - 22

20:52:45 - 4

20:52:47 - 14

20:52:48 - 15

20:52:49 - 12

20:52:50 - 4

20:52:51 - 5

20:52:52 - 12

20:52:53 - 5

I am trying to create a list that contains only the numbers after the dash:

[3,8,28,etc.,etc.]

I tried splitting the output and then adding only the element I need, but I keep getting errors. My approach was to split on the dash and on the newline character, and then pick the correct position for each number.

Answer 1

You can use re.findall:

import re
s = """
 20:52:37 - 3

 20:52:38 - 8

 20:52:39 - 28

 20:52:40 - 58

 20:52:41 - 59
 ....
 """

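# Raw string so that \s and \d reach the regex engine unchanged;
# list() because map() returns a lazy iterator on Python 3.
data = list(map(int, re.findall(r'(?<=\s-\s)\d+', s)))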

Output:

[3, 8, 28, 58, 59...]
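
The lookbehind `(?<=\s-\s)` only lets `\d+` match digits that directly follow a whitespace-dash-whitespace sequence, which is why the timestamps are skipped. As a minimal sketch (the name `sample` and the anchored capture-group pattern are illustrative assumptions, not part of the answer above), the same idea can be compared with a slightly stricter variant that also requires the number to end its line:

```python
import re

# Hypothetical sample in the same format as the log output in the question.
sample = """
20:52:37 - 3

20:52:38 - 8

20:52:39 - 28
"""

# Lookbehind version: digits preceded by whitespace, a dash, whitespace.
via_lookbehind = [int(n) for n in re.findall(r'(?<=\s-\s)\d+', sample)]

# Capture-group version: with re.MULTILINE, $ matches at the end of each line,
# so only a number that finishes a line is captured.
via_group = [
    int(n)
    for n in re.findall(r'-\s(\d+)[ \t]*$', sample, flags=re.MULTILINE)
]

print(via_lookbehind)  # [3, 8, 28]
print(via_group)       # [3, 8, 28]
```

Both patterns return the same list on well-formed input; the anchored one is simply less likely to pick up a stray ` - 12` in the middle of an unrelated line.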

Answer 2

To remove the trailing newlines, you can use rstrip():

res = []

with open('server.log') as f:
    lines = (line.rstrip() for line in f)  # to remove trailing newlines
    lines = (line for line in lines if line)  # to remove blank lines
    res = [int(line.split(' - ')[-1]) for line in lines]

Output:

>>> res
[3, 8, 28, 58, 59, 51, 37, 22, 4, 14, 15, 12, 4, 5, 12, 5]
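
The errors mentioned in the question may come from blank or malformed lines reaching `int()`; that is an assumption, since the failing code is not shown. Under that assumption, here is a hedged sketch of a more tolerant variant of the split approach. The helper name `parse_counts` and the in-memory `log_text` are hypothetical stand-ins for the real log file:

```python
def parse_counts(lines):
    """Return the integer after the last ' - ' on every well-formed line."""
    counts = []
    for line in lines:
        line = line.strip()
        # rpartition splits on the last ' - '; sep is '' when it is absent.
        _, sep, value = line.rpartition(' - ')
        # Skip blank lines and lines whose tail is not a plain number.
        if sep and value.isdecimal():
            counts.append(int(value))
    return counts


# Hypothetical in-memory stand-in for the contents of the log file.
log_text = "20:52:37 - 3\n\n20:52:38 - 8\nnot a log line\n20:52:39 - 28\n"

print(parse_counts(log_text.splitlines()))  # [3, 8, 28]
```

Because `parse_counts` accepts any iterable of strings, an open file object could be passed in place of `log_text.splitlines()`.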

Making a list from specific elements of an output
