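I have many log files from which I need to extract and format data. Some of these log files are very large, more than 10,000 lines.
Can anyone suggest code samples to help me read a text file, remove the unwanted lines, and then edit the remaining lines into a specific format? I have not been able to find a previous thread that covers what I'm after.
An example of the data I need to edit is shown below: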
136: add student 50000000 35011 / Y01T :Unknown id in field 3 - ignoring line
137: add student 50000000 5031 / Y01S :Unknown id in field 3 - ignoring line
138: add student 50000000 881 / Y01S :Unknown course idnumber in field 4 - ignoring line
139: add student 50000000 5732 / Y01S :Unknown id in field 3 - ignoring line
134: add student 50000000 W250 / Y02S :OK
135: add student 50000000 35033 / Y01T :OK
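I need to search the file and remove every line that ends with the suffix :OK. I then need to edit the remaining lines into CSV format, for example: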
add,student,50000000,1234 / abcd
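Any tips, code snippets, etc. would be very useful and I would be very grateful. I would normally have a go myself before asking, but I don't have the time to teach myself Python file access and string formatting. So please allow me to apologise in advance for not trying before asking.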
Answer 0 (score: 0)
This may be a solution:
import sys

# Expect exactly one argument: the path of the log file to convert.
if len(sys.argv) != 2:
    print('Add an input file as parameter')
    sys.exit(1)

print('opening file: %s' % sys.argv[1])

with open(sys.argv[1]) as infile, open('output', 'w') as outfile:
    for line in infile:
        text = line.strip()
        # Skip blank lines and lines whose status suffix is ':OK'.
        if not text or text.endswith(':OK'):
            continue
        # e.g. ['136:', 'add', 'student', '50000000', '35011', '/', 'Y01T', ':Unknown id ...']
        fields = text.split(' ', 7)
        row = '%s,%s,%s,%s / %s' % (fields[1], fields[2], fields[3], fields[4], fields[6])
        outfile.write(row + '\n')
        # just for checking purposes let's print the lines
        print(row)
Note the output file name!
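The split-based approach above assumes every record has at least seven space-separated fields. As a minimal sketch (the helper name `to_csv_row` and the choice to skip short lines are assumptions, not part of the answer), the per-line logic can be isolated and tried on the sample data from the question:

```python
def to_csv_row(line):
    """Return the CSV text for one log line, or None if the line is dropped.

    Hypothetical helper: the name and the skip-on-short-line rule are assumptions.
    """
    text = line.strip()
    # Drop blank lines and lines whose status suffix is ':OK'.
    if not text or text.endswith(':OK'):
        return None
    parts = text.split(' ', 7)
    # A well-formed record yields at least 7 parts: number, 3 fields, id, '/', course.
    if len(parts) < 7:
        return None
    return '%s,%s,%s,%s / %s' % (parts[1], parts[2], parts[3], parts[4], parts[6])


sample = [
    '136: add student 50000000 35011 / Y01T :Unknown id in field 3 - ignoring line\n',
    '134: add student 50000000 W250 / Y02S :OK\n',
    '\n',
]
rows = [row for row in map(to_csv_row, sample) if row is not None]
print(rows)  # ['add,student,50000000,35011 / Y01T']
```

Returning None for lines that do not fit the layout is only one option; raising an error instead may be preferable if malformed lines should not pass silently.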
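Answer 1 (score: 0)
If your lines differ from the sample, you can change the regular expression to suit your needs, and you can also modify the csv.writer parameters if you need a different delimiter: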
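import csv
import re

# Groups: 1 line number, 2-4 the first three fields, 5 the "id / course" field, 6 the status message.
regex = re.compile(r"(\d+)\s*:\s*(\w+)\s+(\w+)\s+(\w+)\s+([\w/ ]+?)\s*:\s*(.+)")

# newline="" lets the csv module control line endings (Python 3).
with open("out.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile, delimiter=',', quotechar='"')
    with open("log.txt") as f:
        for line in f:
            m = regex.match(line)
            # Keep only lines whose status message is not "OK".
            if m and m.group(6).strip() != "OK":
                # Drop the line number (group 1) and the message (group 6).
                writer.writerow(m.groups()[1:-1])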
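As an illustration of the delimiter remark, here is a small sketch that wraps the same idea in a function taking the delimiter as a parameter and writes to an in-memory buffer rather than a file. The names `convert` and `LINE_RE` are hypothetical, and the regular expression is written so that the space before the second colon is not captured, so treat the details as assumptions:

```python
import csv
import io
import re

# Hypothetical name; the pattern follows the layout of the sample log lines.
LINE_RE = re.compile(r"(\d+)\s*:\s*(\w+)\s+(\w+)\s+(\w+)\s+([\w/ ]+?)\s*:\s*(.+)")


def convert(lines, out, delimiter=","):
    """Write every non-OK record in `lines` to the file-like object `out`."""
    writer = csv.writer(out, delimiter=delimiter, quotechar='"', lineterminator="\n")
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(6).strip() != "OK":
            # Drop the line number (first group) and the message (last group).
            writer.writerow(m.groups()[1:-1])


sample = [
    "136: add student 50000000 35011 / Y01T :Unknown id in field 3 - ignoring line\n",
    "135: add student 50000000 35033 / Y01T :OK\n",
]
buf = io.StringIO()
convert(sample, buf, delimiter=";")
print(buf.getvalue())  # add;student;50000000;35011 / Y01T
```

Passing an open file instead of the `io.StringIO` buffer should work the same way, since `csv.writer` only needs an object with a `write` method.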
Answer 2 (score: 0)
Thanks for the help, guys. As a newbie, the code I ended up with isn't that elegant, but it still gets the job done :).
# Open the file and create the CSV after filtering the input file.
def openFile(filename, keyword):  # the caller passes the log file name and the keyword to filter on
    matches = []
    output = ''
    with open(filename, 'r') as f:  # open the file for reading; closed automatically
        for line in f:
            if keyword in line:  # keep only lines that contain the keyword
                matches.append(line)
    print(matches)  # test
    for each in matches:  # filter and clean the info, format it as CSV
        chunk = each.partition(': ')[2]  # remove the line-number prefix
        chunk = chunk.partition(' :')[0]  # remove the status-message suffix
        strsplit = chunk.split(' ')  # split the string by spaces
        # Join the first three fields with commas; keep 'id / course' as one field.
        line = ','.join(strsplit[:3]) + ',' + ' '.join(strsplit[3:6]) + '\n'
        output = output + line  # concatenate each line to create a single string
    print(output)  # test
    with open(keyword + '.csv', 'w') as f:  # open a file to write
        f.write(output)  # write the string to the file

openFile('russtest.txt', 'cat')
openFile('CRON ENROL LOG 200913.txt', 'field 4')
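Since some of the logs run past 10,000 lines, it may be worth avoiding the intermediate list and the growing string. A possible sketch, assuming the same line layout as the sample data, streams the file one line at a time; `filter_lines` and `stream_to_csv` are hypothetical names, not part of the code above:

```python
def filter_lines(lines, keyword):
    """Yield the CSV text for each line that contains `keyword`."""
    for line in lines:
        if keyword not in line:
            continue
        # Strip the 'NNN: ' prefix and the ' :message' suffix.
        body = line.partition(': ')[2].partition(' :')[0]
        fields = body.split(' ')
        # First three fields are comma-separated; the rest stay as 'id / course'.
        yield ','.join(fields[:3]) + ',' + ' '.join(fields[3:])


def stream_to_csv(in_path, out_path, keyword):
    """Copy matching records from in_path to out_path without loading the whole file."""
    with open(in_path) as src, open(out_path, 'w') as dst:
        for row in filter_lines(src, keyword):
            dst.write(row + '\n')
```

For example, `stream_to_csv('CRON ENROL LOG 200913.txt', 'field 4.csv', 'field 4')` would write the same rows as the last call above while reading only one line at a time.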
Thanks :).