Bash: Extract data from the last x hours of a log file, counting back from the last log entry

Time: 2013-09-15 10:21:04

Tags: bash awk timestamp logfile

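I'm new to Bash and have a task: extract data from a log file based on its timestamps. I want to see the entries from the last few hours of the log file, counting back from the most recent entry in the file. I have some code, but it doesn't work: it prints everything in the log file.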

Part of my log file looks like this:

213.64.56.208 - - [01/Jan/2003:10:14:34 +0100] "GET
213.64.56.208 - - [01/Jan/2003:10:14:36 +0100] "GET
213.64.56.208 - - [01/Jan/2003:10:14:39 +0100] "GET
213.64.56.208 - - [01/Jan/2003:10:14:42 +0100] "GET
213.64.56.208 - - [01/Jan/2003:10:14:47 +0100] "GET
213.64.56.208 - - [01/Jan/2003:10:14:49 +0100] "GET
213.64.56.208 - - [01/Jan/2003:10:14:52 +0100] "GET
213.64.56.208 - - [01/Jan/2003:10:14:57 +0100] "GET
213.67.145.223 - - [01/Jan/2003:11:00:06 +0100] "HEAD
213.46.27.204 - - [01/Jan/2003:12:55:15 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:15 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:16 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:16 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:16 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:17 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:17 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:18 +0100] "GET

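My code is supposed to get the timestamp of the last entry and compare it with the other entries, but the comparison doesn't seem to work. Here is the code: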

if [ $h -gt 0 ]
then    
    echo " A specified time is set! "
    TimeInSeconds=$((h*60*60)) # set to seconds instead of hours
    last=$(tail -n1 thttpd.log |awk -F'[][]' '{ gsub(/\//," ",$2); sub(/:/," ",$2); "date +%s -d \""$2"\""| getline d; print d;}')
    awk -F'[][]' -v last=$last -v x=$TimeInSeconds '{ gsub(/\//," ",$2); sub(/:/," ",$2); "date +%s -d \""$2"\""|getline d; if (last-date<=x)print $1 "[" $2 "]"  }' thttpd.log 

As I said, it doesn't print the correct time span. I'm sure there is a simple fix, but I can't see it.

Does anyone see the error?

3 Answers:

Answer 0: (Score: 1)

Your problem is very similar to this thread. I'm reposting the solution here, with only minor modifications to fit your requirements.

#!/bin/bash

H=1  ## Hours
LOGFILE=/path/to/logfile.txt

X=$(( H * 60 * 60 )) ## Hours converted to seconds

function get_ts {
    DATE="${1%%\]*}"; DATE="${DATE##*\[}"; DATE=${DATE/:/ }; DATE=${DATE//\// }
    TS=$(date -d "$DATE" '+%s')
}

get_ts "$(tail -n 1 "$LOGFILE")"
LAST=$TS

while read -r LINE; do
    get_ts "$LINE"
    (( (LAST - TS) <= X )) && echo "$LINE"
done < "$LOGFILE"

Run this script with bash script.sh.

Sample output:

213.46.27.204 - - [01/Jan/2003:12:55:15 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:15 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:16 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:16 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:16 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:17 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:17 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:18 +0100] "GET

If you like, you can make it accept arguments:

#!/bin/bash

H=$1
LOGFILE=$2

...

Then run bash script.sh h logfile, where h is the number of hours and logfile is the path to the log file.
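To see what get_ts does, its parameter expansions can be traced one at a time on a single log line. This is only an illustrative sketch: the variable names line and stamp are hypothetical, and the final date -d call assumes GNU date.

```shell
line='213.46.27.204 - - [01/Jan/2003:12:55:18 +0100] "GET'

stamp="${line%%\]*}"      # drop everything from the first "]" onward
stamp="${stamp##*\[}"     # drop everything up to the last "["  -> 01/Jan/2003:12:55:18 +0100
stamp=${stamp/:/ }        # first ":" becomes a space            -> 01/Jan/2003 12:55:18 +0100
stamp=${stamp//\// }      # every "/" becomes a space            -> 01 Jan 2003 12:55:18 +0100
echo "$stamp"

# GNU date can parse the cleaned-up string and print it as epoch seconds.
date -d "$stamp" '+%s'
```

Only the first colon is replaced because the remaining colons belong to the time of day, which date needs intact.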

Answer 1: (Score: 1)

The problem is on the last line: getline reads into the variable d, but the comparison uses the variable date, which is never assigned.
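A minimal sketch of the question's two awk commands with that fix applied, assuming GNU date (for -d) and the log format shown in the question. The wrapper function last_hours, the awk helper epoch, and the variable names are hypothetical and exist only to keep the example self-contained. Besides comparing against the value that getline actually read, the sketch works on a copy of $2 so the original line can be printed unchanged, and it calls close() on each date pipe as an extra precaution.

```shell
# Hypothetical wrapper: last_hours HOURS LOGFILE
# Prints every line whose timestamp lies within HOURS of the last line's timestamp.
last_hours() {
    local hours=$1 logfile=$2 last
    local span=$(( hours * 60 * 60 ))    # hours converted to seconds

    # awk helper shared by both passes: bracketed timestamp -> epoch seconds
    local to_epoch='
        function epoch(ts,    cmd, d) {
            gsub(/\//, " ", ts)          # 01/Jan/2003 -> 01 Jan 2003
            sub(/:/, " ", ts)            # separate the date from the time
            cmd = "date +%s -d \"" ts "\""
            cmd | getline d              # read the result into d ...
            close(cmd)                   # ... and release the pipe
            return d                     # return d, not an unset variable
        }'

    # Epoch seconds of the last entry in the file.
    last=$(tail -n 1 "$logfile" | awk -F'[][]' "$to_epoch"'{ print epoch($2) }')

    # Print each line that is at most span seconds older than the last entry.
    awk -F'[][]' -v last="$last" -v x="$span" \
        "$to_epoch"'{ if (last - epoch($2) <= x) print }' "$logfile"
}
```

It could be called as last_hours 2 thttpd.log. Note that this still starts one date process per log line, so it may be slow on large files.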

Answer 2: (Score: 1)

Using GNU awk's time functions:

$ cat tst.awk
function time2secs(time,        t) {
    split(time,t,/[/:]/)
    t[2] = (match("JanFebMarAprMayJunJulAugSepOctNovDec",t[2])+2)/3
    return mktime(t[3]" "t[2]" "t[1]" "t[4]" "t[5]" "t[6])
}
BEGIN{ FS="[[ ]"; ARGV[ARGC++] = ARGV[ARGC-1]; xs= x * 60 * 60 }
FNR == NR { lasttime = $5; next }
FNR ==  1 { tstamp = time2secs(lasttime) - xs }
time2secs($5) >= tstamp
$
$ awk -v x=1 -f tst.awk file
213.46.27.204 - - [01/Jan/2003:12:55:15 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:15 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:16 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:16 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:16 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:17 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:17 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:18 +0100] "GET
$
$ awk -v x=2 -f tst.awk file
213.67.145.223 - - [01/Jan/2003:11:00:06 +0100] "HEAD
213.46.27.204 - - [01/Jan/2003:12:55:15 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:15 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:16 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:16 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:16 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:17 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:17 +0100] "GET
213.46.27.204 - - [01/Jan/2003:12:55:18 +0100] "GET
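
The month lookup in time2secs relies on every month abbreviation being exactly three characters long: match() returns the 1-based position of the name inside the concatenated string, and (position + 2) / 3 turns that position into the month number. A small sketch to check the arithmetic follows; the helper name month_num is hypothetical, and since it does not call mktime it should not need GNU awk.

```shell
# Hypothetical helper: print the month number for a three-letter English abbreviation.
month_num() {
    awk -v m="$1" 'BEGIN {
        # Example: "Sep" starts at position 25, and (25 + 2) / 3 = 9
        print (match("JanFebMarAprMayJunJulAugSepOctNovDec", m) + 2) / 3
    }'
}

month_num Jan   # prints 1
month_num Sep   # prints 9
month_num Dec   # prints 12
```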