I'm trying to write a regular expression to parse my log files. They look like this:
I, [2018-03-23T13:30:10.076546 #3107] INFO -- : method='HEAD' path='/healthcheck' format='*/*' ip= status=200 duration=0.03
I, [2018-03-23T13:31:23.488928 #3107] INFO -- : method='GET' path='/feed/bc822bc19.csv' format= ip='127.0.0.0' status=200 duration=0.04 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:31:30.956484 #3107] INFO -- : method='GET' path='/feed/ad4d93bee.csv' format= ip='127.0.0.0' status=200 duration=0.05 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:32:10.123399 #3107] INFO -- : method='HEAD' path='/healthcheck' format='*/*' ip= status=200 duration=0.03 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:33:46.362908 #3107] INFO -- : method='GET' path='/feed/e9cbe2f42e0a6.xml' format= ip='127.0.0.0' status=200 duration=0.02 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:34:10.060682 #3107] INFO -- : method='HEAD' path='/healthcheck' format='*/*' ip= status=200 duration=0.03 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:35:01.445029 #3107] INFO -- : method='GET' path='/feed/85b91d6f7.xml' format= ip='127.0.0.0' status=200 duration=0.02 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:35:04.486874 #3107] INFO -- : method='GET' path='/feed/34bda5b6f.csv' format= ip='127.0.0.0' status=200 duration=0.33 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:35:04.609879 #3107] INFO -- : method='GET' path='/feed/0b4dbb477.xml' format= ip='127.0.0.0' status=200 duration=0.00 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:35:07.441873 #3107] INFO -- : method='GET' path='/feed/4b494e658.xml' format= ip='127.0.0.0' status=200 duration=0.00 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:35:34.640805 #3107] INFO -- : method='GET' path='/feed/dbde9d8c5.xml' format= ip='127.0.0.0' status=200 duration=0.02 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:36:09.232026 #3107] INFO -- : method='HEAD' path='/healthcheck' format='*/*' ip= status=200 duration=0.03 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:36:11.494500 #3107] INFO -- : method='GET' path='/feed/d42267d54.xml' format= ip='127.0.0.0' status=200 duration=0.00 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:38:09.878287 #3107] INFO -- : method='HEAD' path='/healthcheck' format='*/*' ip= status=200 duration=0.01 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:38:32.595255 #3107] INFO -- : method='GET' path='/feed/4b9badc64.csv' format= ip='127.0.0.0' status=200 duration=0.00 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:38:34.941950 #3107] INFO -- : method='GET' path='/feed/212ddc50f.csv' format= ip='127.0.0.0' status=200 duration=0.00 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:38:36.658162 #3107] INFO -- : method='GET' path='/feed/34bcd9d0e.csv' format= ip='127.0.0.0' status=200 duration=0.00 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:38:38.223703 #3107] INFO -- : method='GET' path='/feed/fe286b188.csv' format= ip='127.0.0.0' status=200 duration=0.00 host='feeds' user='-' params={} agent='' protocol='http'
I, [2018-03-23T13:56:29.026273 #3107] INFO -- : method='GET' path='/feed/c1684e144.csv' format='text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' ip='127.0.0.0' status=200 duration=0.49 host='feeds' user='-' params={} agent='Mozilla/5.0 (X11; Linux x86_64; rv:29.0) Gecko/20100101 Firefox/29.0' protocol='http'
I'm trying to parse them to extract the following fields:
timestamp, method, path, format, ip, status, duration, host, user, params, agent and protocol.
I have almost zero regex knowledge, so this task is hard for me. I've been trying to write something, but I haven't really gotten anywhere.
Here is my attempt:
"no-clue-what-to-write + method=%{WORD:message_method}[]+path=%{WORD:message_path}[]+format=%{WORD:message_format}[]+ip=%{WORD:message_ip}[]+status=%{BASE10NUM:message_status_integer}[ ]+duration=%{BASE10NUM:message_duration_float}[ ]+host=%{WORD:message_host}[]+.*user=%{USERDASH:message_user}[ ]+ip=%{IP:message_ip}[ ]+params=%{WORD:message_params}[]+agent=%{WORD:message_agent}[]+protocol=%{WORD:message_protocol}[]+"
How can I write this so that it actually works?
I'd like to test it here: http://grokconstructor.appspot.com/do/match. Is that even possible?
Answer 0 (score: 1)
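Your timestamp is in ISO8601 format and can be matched with %{TIMESTAMP_ISO8601}.
I used predefined grok patterns such as WORD to match the remaining fields. Since some fields are blank, I used the ? operator to denote "zero or one occurrence of the preceding token".
This custom grok pattern should work for the log format you provided:
I, \[%{TIMESTAMP_ISO8601} %{DATA} method='%{WORD:method}' path='%{URIPATH:path}' format='(?:%{DATA:format})?' ip='(?:%{IP:ip})?' status=%{INT:status} duration=%{NUMBER:duration:float} host='(?:%{WORD:host})?' user='(?:%{USERNAME})?' params=%{DATA:params} agent='(?:%{DATA:agent})?' protocol='%{URIPROTO}'
Note that the pattern expects the quotes around each value to be present, so a line with a bare format= or ip= will not match as written. Moving the quotes inside the optional group, for example format=(?:'%{DATA:format}')?, covers that case. Likewise, a line that ends after duration= (like your first sample) only matches if the trailing fields are wrapped in an optional group.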
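To make the ? operator concrete, here is a minimal sketch in plain Python (not grok) of an optional group that contains the quotes. The names FORMAT_FIELD and extract_format are hypothetical and exist only for this illustration; the character class [^'] assumes values never contain an apostrophe.

```python
import re

# Hypothetical stand-in for a single grok fragment. The quotes sit inside
# the optional group, so both format='*/*' and a bare format= are accepted.
# Assumption: a value never contains an apostrophe.
FORMAT_FIELD = re.compile(r"format=(?:'(?P<format>[^']*)')?(?= |$)")


def extract_format(fragment):
    """Return the quoted format value, or None when the field is bare or absent."""
    match = FORMAT_FIELD.search(fragment)
    return match.group("format") if match else None


print(extract_format("format='*/*' ip="))
print(extract_format("format= ip='127.0.0.0'"))
```

When the group is skipped, the named capture is simply unset, which is how the blank field shows up as missing rather than as a failed match.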
{
  "TIMESTAMP_ISO8601": [["2018-03-23T13:31:30.956484"]],
  "YEAR": [["2018"]],
  "MONTHNUM": [["03"]],
  "MONTHDAY": [["23"]],
  "HOUR": [["13", null]],
  "MINUTE": [["31", null]],
  "SECOND": [["30.956484"]],
  "ISO8601_TIMEZONE": [[null]],
  "DATA": [["#3107] INFO -- :"]],
  "method": [["GET"]],
  "path": [["/feed/ad4d93bee.csv"]],
  "format": [["a"]],
  "ip": [["127.0.0.0"]],
  "IPV6": [[null]],
  "IPV4": [["127.0.0.0"]],
  "status": [["200"]],
  "BASE10NUM": [["0.05"]],
  "host": [["feeds"]],
  "USERNAME": [["-"]],
  "params": [["{}"]],
  "agent": [["saddas"]],
  "URIPROTO": [["http"]]
}
The JSON above is the output I got when testing this grok pattern.
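If you want to sanity-check the field extraction outside of grok, the same idea can be sketched with Python's re module. This is not the grok pattern itself but a rough equivalent under a few assumptions: every line starts with the "I, [timestamp #pid] INFO -- :" header shown in the question, values are either single-quoted or bare, and quoted values never contain an apostrophe. The names HEADER, PAIR and parse_line are hypothetical.

```python
import re

# Header as seen in the sample lines: "I, [<timestamp> #<pid>] INFO -- : <rest>"
HEADER = re.compile(r"^I, \[(?P<timestamp>\S+) #\d+\]\s+INFO -- : (?P<rest>.*)$")

# One key=value pair: the value is either '...' (no embedded apostrophes,
# an assumption) or a bare run of non-space characters, possibly empty.
PAIR = re.compile(r"(?P<key>\w+)=(?:'(?P<quoted>[^']*)'|(?P<bare>\S*))")


def parse_line(line):
    """Parse one log line into a dict, or return None if the header is missing."""
    header = HEADER.match(line.strip())
    if header is None:
        return None
    fields = {"timestamp": header.group("timestamp")}
    for pair in PAIR.finditer(header.group("rest")):
        quoted = pair.group("quoted")
        value = quoted if quoted is not None else pair.group("bare")
        # Blank values (format= or agent='') are stored as None.
        fields[pair.group("key")] = value or None
    # Mirror the :float conversion that the grok pattern applies to duration.
    if fields.get("duration") is not None:
        fields["duration"] = float(fields["duration"])
    return fields


sample = (
    "I, [2018-03-23T13:31:23.488928 #3107] INFO -- : method='GET' "
    "path='/feed/bc822bc19.csv' format= ip='127.0.0.0' status=200 "
    "duration=0.04 host='feeds' user='-' params={} agent='' protocol='http'"
)
parsed = parse_line(sample)
for key, value in parsed.items():
    print(key, "=", value)
```

Because the pairs are collected one at a time instead of in a fixed order, this sketch also accepts the shorter lines that stop after duration; fields that are absent from a line are simply absent from the resulting dict.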
Hope it helps.