Read a log file and dump multi-line JSON onto one line

Time: 2017-08-02 09:53:46

Tags: python json logging

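I have a log file in which ordinary single-line entries alternate with entries whose JSON payload spans multiple lines. The real file has many more lines than the excerpt shown below.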

What I want is a script that reads the file and, whenever an entry contains JSON, collapses that JSON onto a single line.

An excerpt of the file looks like this:

>>> 2017-08-02 08:51:45 +0200 [INFO] from com.sun.metro.assembler in application-siaServiceImplPort-context-362552 - MASM0007: No application metro.xml configuration file found.
>>> 2017-08-02 08:53:06 +0200 [INFO] from application in application-akka.actor.default-dispatcher-362046 - LOG_EVENT: {
  "event" : "sxxxxxdd",
  "ts" : "2017-xx
  "svc" : "dxx.tlc-1",
  "rexxxt" : {
    "ts" : "2017-xxxx2:00",
    "xx" : "73478c0f-dc70-46b7-a388-d12f7b8aa91e",
    "xxxx" : "/xxx/xxx",
    "xxx" : "POST",
    "user_agent" : "xxx/6.2.1 xxxx/7.38.0 xxx/7.0xx16-1~xxx+8.1",
    "user_id" : 39,
    "xxx_ip" : "xxxx.1",
    "xxxx" : "xxxxx",
    "xx" : "xx",
    "app_id" : "d4da4385a8204be2949ed62323231443",
    "axxe" : "POxxkout"
  },
  "operation" : {
    "scxe" : "checkout",
    "rxxxlt" : {
      "xxxus" : 2x0
    }
  },
  "xx" : {
    "xxx_id" : "CHTO06MLKXP9N",
    "xxx_attributes" : {
      "xx" : "2017xx6+02:00",
      "date_xxxxx" : "2xx7-08xx53:06+02:00",
      "xus" : "WAxING",
      "dexxion" : "numx0",
      "chaxxmount" : 2,
      "chaxx_start" : "20x8xx+02:00",
      "charge_max_count" : 1,
      "merchant" : {
        "xxx" : "xxxx",
        "xxx" : "xxxxxxx",
        "xx" : "xx-x xxxxxl.",
        "logo" : "httxxxff0/258xxxjpeg",
        "account_type" : "B"
      },
      "xx_xxx" : "xxxx",
      "xxxx_xxx_url" : "https://xxx.xxx.xxx-pay.xx/xxx",
      "xxx" : "xxxx",
      "xxx" : "xx://dp.xx/uxx10/xxxx"
    }
  },
  "cxx" : "xxxx"
}

I have tried the following in Python:

infile = "/hoxxxx/application.log"

important = []
keep_phrases = "LOG_EVENT"

The result I am after would look like this:

>>> 2017-08-02 08:51:45 +0200 [INFO] from com.sun.metro.assembler in application-siaServiceImplPort-context-362552 - MASM0007: No application metro.xml configuration file found.
>>> 2017-08-02 08:53:06 +0200 [INFO] from application in application-akka.actor.default-dispatcher-362046 - LOG_EVENT: {the json here in 1 line}

But this only returns the matching lines; of course it has no idea where the JSON ends...

Any help? Thanks.

1 answer:

Answer 0 (score: 0)

You can try a regular expression, optionally combined with the json module.

Regular expression:

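import re

# Read the whole log into memory (infile is the path from the question).
with open(infile) as f:
    text = f.read()

# Remove every newline that is not followed by '>', i.e. join continuation
# lines onto the '>>>' entry they belong to.
print(re.sub(r'\n([^>])', r'\1', text))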

Output:

>>> 2017-08-02 08:51:45 +0200 [INFO] from com.sun.metro.assembler in application-siaServiceImplPort-context-362552 - MASM0007: No application metro.xml configuration file found.
>>> 2017-08-02 08:53:06 +0200 [INFO] from application in application-akka.actor.default-dispatcher-362046 - LOG_EVENT: {"event" : "sxxxxxdd","ts" : "2017-xx"svc" : "dxx.tlc-1","rexxxt" : {"ts" : "2017-xxxx2:00","xx" : "73478c0f-dc70-46b7-a388-d12f7b8aa91e","xxxx" : "/xxx/xxx","xxx" : "POST","user_agent" : "xxx/6.2.1 xxxx/7.38.0 xxx/7.0xx16-1~xxx+8.1","user_id" : 39,"xxx_ip" : "xxxx.1","xxxx" : "xxxxx","xx" : "xx","app_id" : "d4da4385a8204be2949ed62323231443","axxe" : "POxxkout"},"operation" : {"scxe" : "checkout","rxxxlt" : {"xxxus" : 2x0}},"xx" : {"xxx_id" : "CHTO06MLKXP9N","xxx_attributes" : {"xx" : "2017xx6+02:00","date_xxxxx" : "2xx7-08xx53:06+02:00","xus" : "WAxING","dexxion" : "numx0","chaxxmount" : 2,"chaxx_start" : "20x8xx+02:00","charge_max_count" : 1,"merchant" : {"xxx" : "xxxx","xxx" : "xxxxxxx","xx" : "xx-x xxxxxl.","logo" : "httxxxff0/258xxxjpeg","account_type" : "B"},"xx_xxx" : "xxxx","xxxx_xxx_url" : "https://xxx.xxx.xxx-pay.xx/xxx","xxx" : "xxxx","xxx" : "xx://dp.xx/uxx10/xxxx"}},"cxx" : "xxxx"}
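One detail worth knowing: the pattern `\n([^>])` puts back the first character of each continuation line, so any leading indentation survives in the joined text, and the whole file has to fit in memory. A minimal line-by-line sketch that strips the indentation instead, assuming (as in the sample) that every new entry starts with `>>>`. The helper name `collapse_entries` and the shortened sample data are made up for illustration and are not part of the original answer:

```python
def collapse_entries(lines, marker=">>>"):
    """Yield one string per log entry, joining continuation lines.

    Assumption: every new entry starts with `marker`; any other non-blank
    line is treated as a continuation of the previous entry.
    """
    current = None
    for raw in lines:
        line = raw.strip()          # drop indentation and the trailing newline
        if not line:
            continue                # skip blank lines
        if line.startswith(marker) or current is None:
            if current is not None:
                yield current       # previous entry is complete
            current = line
        else:
            current += line         # continuation: glue onto current entry
    if current is not None:
        yield current               # flush the last entry


# Hypothetical, shortened sample in the same shape as the log above.
sample = [
    ">>> 08:51:45 [INFO] plain message\n",
    ">>> 08:53:06 [INFO] LOG_EVENT: {\n",
    '  "event" : "checkout",\n',
    '  "user_id" : 39\n',
    "}\n",
]
for entry in collapse_entries(sample):
    print(entry)
```

Because the function only iterates over its argument, an open file object can be passed in place of the list, e.g. `collapse_entries(open(infile))`, which avoids reading the whole log at once.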

If you want the JSON payloads as Python objects, you can also do this:

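import json

# Collapse continuation lines first, so each payload sits on a single line.
text2 = re.sub(r'\n([^>])', r'\1', text)
# Grab each {...} span (greedy, one per line) and parse it.
js = [json.loads(x) for x in re.findall(r'\{.*\}', text2)]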
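The question's sticking point was knowing where the JSON ends. An alternative to the regex is to let the JSON parser decide: `json.JSONDecoder.raw_decode` parses one value starting at a given index and returns the index where it stopped. The following is only a sketch under some assumptions: each `LOG_EVENT:` marker is followed by a JSON object, and the function name `compact_json_payloads` and the sample log are hypothetical. Payloads that are not valid JSON (such as the redacted sample above) are left untouched.

```python
import json

_decoder = json.JSONDecoder()


def compact_json_payloads(text, keyword="LOG_EVENT:"):
    """Return `text` with each JSON object after `keyword` re-dumped on one line.

    Assumption: `keyword` is followed by a JSON object starting at the next '{'.
    """
    out = []
    pos = 0
    while True:
        hit = text.find(keyword, pos)
        if hit == -1:
            break
        start = text.find("{", hit)
        if start == -1:
            break
        try:
            # raw_decode returns the parsed object and the index just past it.
            obj, end = _decoder.raw_decode(text, start)
        except ValueError:
            # Not valid JSON: copy the text through unchanged and move on.
            out.append(text[pos:start + 1])
            pos = start + 1
            continue
        out.append(text[pos:start])
        out.append(json.dumps(obj, separators=(",", ":")))
        pos = end
    out.append(text[pos:])
    return "".join(out)


# Hypothetical, shortened sample in the same shape as the log above.
log = (
    ">>> 08:51:45 [INFO] plain message\n"
    ">>> 08:53:06 [INFO] LOG_EVENT: {\n"
    '  "event" : "checkout",\n'
    '  "result" : {\n'
    '    "status" : 200\n'
    "  }\n"
    "}\n"
)
print(compact_json_payloads(log))
```

Compared with the regex, this does not depend on lines starting with `>`, and the output is normalized by `json.dumps` rather than being the original text with newlines removed. Key order is preserved on Python 3.7 and later, where dicts keep insertion order.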