Question

import csv
import urllib.request
from pylab import *

eventurl = "http://data.hisparc.nl/show/source/eventtime/501/2017/1/1/"
data = urllib.request.urlopen(eventurl)
print(data.read())

When I run the code and print the data from eventurl, this is what the terminal shows:

b"# HiSPARC eventtime histogram source\n#\n# Station: 501\n# Data from 2017-1-1\n#\n# HiSPARC data is licensed under Creative Commons Attribution-ShareAlike 4.0.\n#\n#\n# Please note: the 'bin' is the left bin edge. The width of the bin is 1\n# hour.  So bin 0 means between 0:00 and 1:00. Value means the number of\n# events which were measured during 1 hour.\n#\n# This data contains the following columns:\n#\n# bin:   time [hour of day]\n# value: number of events [counts]\n#\n#\n# bin\tvalue\n0\t2265\n1\t2354\n2\t2302\n3\t2353\n4\t2369\n5\t2378\n6\t2280\n7\t2411\n8\t2340\n9\t2431\n10\t2353\n11\t2394\n12\t2412\n13\t2470\n14\t2404\n15\t2540\n16\t2492\n17\t2390\n18\t2454\n19\t2404\n20\t2451\n21\t2467\n22\t2471\n23\t2371\n\n"

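All we want are the two columns, bin and value. Is there code that can strip out everything else, so that the numbers can easily be used for a histogram?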

Answer 1

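# Decode the bytes returned by urlopen() into text.
data = data.read().decode()
# Split on the column header line; "bin" and "value" are separated by a tab.
values = data.split("# bin\tvalue")

# Keep the second tab-separated field (the event count) of each data row.
result = [d.split("\t")[1] for d in values[1].strip().split("\n")]
print(result)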

Output:

['2265', '2354', '2302', '2353', ...., '2471', '2371']
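The list above holds strings and drops the bin column. As a minimal sketch, assuming the response keeps the tab-separated layout shown in the question, the comment lines can be skipped by their leading "#" and both columns converted to integers. The function name parse_histogram and the shortened sample text are illustrative, not part of the original answer.

```python
def parse_histogram(text):
    """Return (bins, counts) as two lists of ints, ignoring '#' comment lines."""
    bins, counts = [], []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the commented header.
        if not line or line.startswith("#"):
            continue
        bin_edge, count = line.split("\t")
        bins.append(int(bin_edge))
        counts.append(int(count))
    return bins, counts


# Shortened sample in the same layout as the response in the question.
sample = "# HiSPARC eventtime histogram source\n#\n# bin\tvalue\n0\t2265\n1\t2354\n2\t2302\n\n"
bins, counts = parse_histogram(sample)
print(bins)    # [0, 1, 2]
print(counts)  # [2265, 2354, 2302]
```

With the real response, the text would come from data.read().decode(), and the two lists could then be handed to a plotting call such as pylab's bar(bins, counts).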

Answer 2

Using Python's regular expression module, re:

import urllib.request
import re

l = []
eventurl = "http://data.hisparc.nl/show/source/eventtime/501/2017/1/1/"
data = urllib.request.urlopen(eventurl)
for line in data.readlines():
    # Each line arrives as bytes; decode it and trim the trailing newline.
    line = str(line, 'utf-8').strip()
    # Data rows look like "<digits><tab><digits>"; the header lines do not.
    if re.search(r'\d+\t\d+', line):
        l.append(line.split()[1])

print(l)

Output:

['2265', '2354', '2302', '2353', '2369', '2378', '2280', '2411', '2340', '2431', '2353', '2394', '2412', '2470', '2404', '2540', '2492', '2390', '2454', '2404', '2451', '2467', '2471', '2371']
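Since the question already imports csv, another option is to let csv.reader do the tab splitting after the comment lines are filtered out. This is only a sketch under the same layout assumption as above; read_counts and the in-memory sample are hypothetical names used for illustration.

```python
import csv
import io


def read_counts(stream):
    """Yield (bin, count) integer pairs from a tab-separated text stream."""
    # Drop comment and blank lines before handing the rest to csv.reader.
    rows = (line for line in stream if line.strip() and not line.startswith("#"))
    for bin_edge, count in csv.reader(rows, delimiter="\t"):
        yield int(bin_edge), int(count)


# In-memory stand-in for the decoded HTTP response.
sample = io.StringIO("# bin\tvalue\n0\t2265\n1\t2354\n\n")
pairs = list(read_counts(sample))
print(pairs)  # [(0, 2265), (1, 2354)]
```

For the live URL, wrapping the response as io.TextIOWrapper(urllib.request.urlopen(eventurl), encoding="utf-8") should give a text stream that can be passed in the same way.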

Remove certain lines from urllib url
