Converting a single-line log into the proper format using awk

Time: 2019-04-25 08:50:51

Tags: awk sed

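I have a program's log in a string (the whole log is on a single line), and I want to convert it into multiple lines. awk should be able to do this, but how do I loop within a single line?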

I have the code below in bash (where str contains just one line: the entire log string generated by the program):

str="2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - Start of job execution 2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - exec(0, 0, START.0) 2019/04/24 23:26:42 - START - Starting job entry 2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - Starting entry [Call_Param_File] 2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - exec(1, 0, Call_Param_File.0) 2019/04/24 23:26:42 - Call_Param_File - Starting job entry - blah blah blah..."
echo $str|awk 'BEGIN { ORS=" \n "}; { printf "%s %s %s", $1,$2,$3}'

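The awk command above only prints the first three fields of the log text (the date, the time and the first "-" separator). It needs to repeat for every entry, because I expect output like the following, where each line holds the date/timestamp and a short message, followed by the long message string.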

2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - Start of job execution 
2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - exec(0, 0, START.0) 
2019/04/24 23:26:42 - START - Starting job entry 
2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - Starting entry [Call_Param_File] 
2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - exec(1, 0, Call_Param_File.0) 
2019/04/24 23:26:42 - Call_Param_File - Starting job entry - blah blah blah...

How can we do this with awk?

4 answers:

Answer 0 (score: 0)

Could you please try the following (tested only with the provided sample).

echo "$str" | awk '{val=$1;$1="";gsub(/[0-9]+\/[0-9]+\/[0-9]+/,ORS "&");print val $0}'

EDIT: Also adding @Corentin's version from the comments here (it needs GNU awk, because gensub is a gawk extension):

echo "$str" | awk '{print gensub(/.([0-9\/]{10})/, "\n\\1", "g")}'

Answer 1 (score: 0)

Try this with GNU awk:

awk -v RS='([0-9]{2,4}/?){3}' '{printf "%s\n%s", $0, RT}' <<<"$str"

Try this with GNU sed:

sed -E 's/([0-9]{2,4}\/?){3}/\n&/g' <<<"$str"
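Both commands above also put a newline in front of the very first timestamp, so the output starts with an empty line, and each entry keeps the space that preceded the next timestamp. A minimal variation, assuming GNU sed (the \n in the replacement is a GNU extension) and that every entry starts with a full YYYY/MM/DD HH:MM:SS timestamp, anchors on the space plus the whole timestamp and swallows that space. split_log_sed is a hypothetical helper name used only for illustration:

```sh
#!/bin/sh
# Hypothetical helper: split a one-line log read from stdin at every timestamp.
# Assumes GNU sed ("\n" in the replacement is a GNU extension).
split_log_sed() {
    # " YYYY/MM/DD HH:MM:SS" -> newline + timestamp. The space before it is
    # consumed, and the first timestamp (nothing before it) is left alone.
    sed -E 's| ([0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})|\n\1|g'
}

# Shortened excerpt of the log from the question
str='2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - Start of job execution 2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - exec(0, 0, START.0) 2019/04/24 23:26:42 - START - Starting job entry'

printf '%s\n' "$str" | split_log_sed
```

Using | as the s delimiter avoids escaping the slashes inside the date.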

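Answer 2 (score: 0)

Since it's April (so every timestamp still starts with 2019) and the log is already a bash string, a bash pattern substitution may be enough: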

echo "${str// 2019/$'\n'2019}"

Output:

2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - Start of job execution
2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - exec(0, 0, START.0)
2019/04/24 23:26:42 - START - Starting job entry
2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - Starting entry [Call_Param_File]
2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - exec(1, 0, Call_Param_File.0)
2019/04/24 23:26:42 - Call_Param_File - Starting job entry

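Note: because bash's string substitution is not as flexible as sed or awk, this code will fail at midnight on New Year's Eve: the substitution will miss lines that begin with 2020/01/01. If the log lines never contain the string " 20" (note the leading space; this also matches any time whose hour is 20, such as 20:15:00), the following may be fine for the next 80 years: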

echo "${str// 20/$'\n'20}"
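If the year dependence is a concern but you want to stay inside bash, a loop over bash's =~ operator can match the whole timestamp pattern instead of a literal year. This is a sketch assuming every entry starts with a YYYY/MM/DD HH:MM:SS timestamp that is preceded by a space and followed by a space; split_log_bash is a hypothetical name. Each pass peels the last entry off the end of the string, so the cost grows quadratically with the number of entries, which is fine for short logs but not for huge ones:

```bash
#!/usr/bin/env bash
# Hypothetical helper: year-independent split in pure bash (no sed/awk).
split_log_bash() {
    local rest=$1
    local -a entries=()
    # POSIX ERE is leftmost-longest, so the greedy first group makes the match
    # land on the LAST " YYYY/MM/DD HH:MM:SS " still left in $rest.
    local re='^(.*) ([0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} .*)$'
    while [[ $rest =~ $re ]]; do
        entries=("${BASH_REMATCH[2]}" "${entries[@]}")  # prepend the peeled-off entry
        rest=${BASH_REMATCH[1]}
    done
    entries=("$rest" "${entries[@]}")                    # what remains is the first entry
    printf '%s\n' "${entries[@]}"
}

# Shortened excerpt of the log from the question
str='2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - Start of job execution 2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - exec(0, 0, START.0) 2019/04/24 23:26:42 - START - Starting job entry'

split_log_bash "$str"
```

Because the pattern is the timestamp shape rather than "2019" or " 20", a 2019/12/31 entry followed by a 2020/01/01 entry is split correctly too.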

Answer 3 (score: 0)

Given this input:

$ str='2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - Start of job execution 2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - exec(0, 0, START.0) 2019/04/24 23:26:42 - START - Starting job entry 2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - Starting entry [Call_Param_File] 2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - exec(1, 0, Call_Param_File.0) 2019/04/24 23:26:42 - Call_Param_File - Starting job entry - blah blah blah...'

Using GNU awk for multi-char RS and RT:

$ echo "$str" | awk -v RS='[0-9/]{10} [0-9:]{8} |\n' 'NR>1{print p $0} {p=RT}'
2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - Start of job execution
2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - exec(0, 0, START.0)
2019/04/24 23:26:42 - START - Starting job entry
2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - Starting entry [Call_Param_File]
2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - exec(1, 0, Call_Param_File.0)
2019/04/24 23:26:42 - Call_Param_File - Starting job entry - blah blah blah...
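Answer 3 and the awk command in answer 1 rely on GNU awk's RT variable, and the gensub version in answer 0 is gawk-only as well. For a non-GNU awk (for example mawk or BSD awk), the loop the question asks about can be written with match(), RSTART and substr(). A sketch, assuming every entry starts with a YYYY/MM/DD HH:MM:SS timestamp preceded by a space; split_log_awk is a hypothetical helper name:

```sh
#!/bin/sh
# Hypothetical helper: split a one-line log read from stdin using only POSIX awk.
split_log_awk() {
    awk '
    {
        line = $0
        # A timestamp preceded by a space marks the start of the next entry.
        # Bracket runs instead of intervals like [0-9]{4}, for older awks.
        ts = " [0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]"
        while (match(line, ts)) {
            print substr(line, 1, RSTART - 1)   # entry before the separating space
            line = substr(line, RSTART + 1)     # continue from the timestamp itself
        }
        print line                              # last (or only) entry
    }'
}

# Shortened excerpt of the log from the question
str='2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - Start of job execution 2019/04/24 23:26:42 - Main_Cons_Job_edw_cc_sf_accts_assets_feed - exec(0, 0, START.0) 2019/04/24 23:26:42 - START - Starting job entry'

printf '%s\n' "$str" | split_log_awk
```

The first timestamp has no space in front of it, so it never matches, and the loop needs no special case for the start of the string.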