Cygwin按日期排序文本

时间:2012-04-30 17:38:01

标签: unix sorting cygwin timestamp

将这个问题归结为25年前我所知道的事情,并忘了......

我有来自Windows事件日志的日志输出,并且无法控制时间戳格式(如果我这样做,我会选择像YYYYMMDD HH24MMSS那样合理的东西,这样当它被视为字符串时它很容易排序。
我确信使用sed或某种排序参数可以轻松实现此目的。有没有人有这方面的快速解决方案?

示例数据:

SERVER01,1/1/2013 12:00:01 AM,8,FOO,TOO
SERVER01,4/10/2012 4:43:06 PM,8,FOO,TOO
SERVER01,4/11/2012 4:43:06 PM,8,FOO,TOO
SERVER01,4/9/2012 4:43:06 PM,8,FOO,TOO
SERVER02,12/31/2012 11:59:59 PM,8,FOO,TOO
SERVER02,4/10/2012 4:43:06 PM,8,FOO,TOO
SERVER02,4/9/2012 4:43:06 PM,8,FOO,TOO

所需订单:

SERVER01,4/9/2012 4:43:06 PM,8,FOO,TOO
SERVER02,4/9/2012 4:43:06 PM,8,FOO,TOO
SERVER01,4/10/2012 4:43:06 PM,8,FOO,TOO
SERVER02,4/10/2012 4:43:06 PM,8,FOO,TOO
SERVER01,4/11/2012 4:43:06 PM,8,FOO,TOO
SERVER02,12/31/2012 11:59:59 PM,8,FOO,TOO
SERVER01,1/1/2013 12:00:01 AM,8,FOO,TOO

重新格式化时间戳是正常的,甚至是可取的。我只是不知道如何。 这需要在Windows上运行,我可以使用Cygwin(我已经在同一个文件上使用它进行grep过滤)。

5 个答案:

答案 0 :(得分:2)

这是一个Perl脚本,它为每行添加了一个可排序的时间戳:

#!/usr/bin/perl

use strict;
use warnings;

while (<>) {
    my $timestamp = (split /,/)[1];
    my($mon, $mday, $year, $hour, $min, $sec, $ampm) =
        $timestamp =~ m{^(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+)\s+(AM|PM)};
    die "Can't parse timestamp on line $.\n" if not defined $ampm;
    if ($ampm eq 'AM') {
        $hour = 0 if $hour == 12;
    }
    else {
        $hour += 12 if $hour != 12;
    }

    printf "%04d-%02d-%02d %02d:%02d:%02d,%s",
           $year, $mday, $mon, $hour, $min, $sec, $_;
}

对于样本数据,它会产生以下输出:

2013-01-01 00:00:01,SERVER01,1/1/2013 12:00:01 AM,8,FOO,TOO
2012-10-04 16:43:06,SERVER01,4/10/2012 4:43:06 PM,8,FOO,TOO
2012-11-04 16:43:06,SERVER01,4/11/2012 4:43:06 PM,8,FOO,TOO
2012-09-04 16:43:06,SERVER01,4/9/2012 4:43:06 PM,8,FOO,TOO
2012-31-12 23:59:59,SERVER02,12/31/2012 11:59:59 PM,8,FOO,TOO
2012-10-04 16:43:06,SERVER02,4/10/2012 4:43:06 PM,8,FOO,TOO
2012-09-04 16:43:06,SERVER02,4/9/2012 4:43:06 PM,8,FOO,TOO

要按日期对样本数据进行排序,假设上述Perl脚本为foo.pl

./foo.pl in.txt | sort | sed 's/^[^,]*,//'

这会产生与您问题中指定输出相同的输出。

如果您愿意,可以对Perl脚本进行一些小调整,以避免sortsed命令,但代价是在整个内存中存储,修改和排序,这可能是输入非常大的问题:

#!/usr/bin/perl

use strict;
use warnings;

my @lines = ();

while (<>) {
    my $timestamp = (split /,/)[1];
    my($mon, $mday, $year, $hour, $min, $sec, $ampm) =
        $timestamp =~ m{^(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+)\s+(AM|PM)};
    die "Can't parse timestamp on line $.\n" if not defined $ampm;
    if ($ampm eq 'AM') {
        $hour = 0 if $hour == 12;
    }
    else {
        $hour += 12 if $hour != 12;
    }

    push @lines, sprintf "%04d-%02d-%02d %02d:%02d:%02d,%s",
                         $year, $mday, $mon, $hour, $min, $sec, $_;
}

@lines = sort @lines;
foreach (@lines) {
    s/^[^,]*,//;
}
print @lines;

答案 1 :(得分:2)

awk绝对带有cygwin,它可以将日期转换为可排序的格式到行的前面(我要退出新手到awk,所以这很难看我确定但是它有效),所以那么您可以将日志记录到此脚本中,然后进入简单的排序

#! /bin/awk -f
BEGIN {
   FS=",";
}
{
   linedate=$2;
   split(linedate,datetime," ");
   split(datetime[1],datepieces,"/");
   date=sprintf( "%d/%02d/%02d", datepieces[3], datepieces[1], datepieces[2]);
   split(datetime[2],timepieces,":");
   time=sprintf( "%02d:%02d:%02d", timepieces[1], timepieces[2], timepieces[3] );
   print date " " time " " datetime[3] "," $1 "," $3 "," $4 "," $5;
}

答案 2 :(得分:1)

我不得不做这样的事情 - 我有多个日志文件需要合并log4j时间戳。

我决定采用的解决方案是使用gawk将时间戳转换为毫秒 - 从纪元开始,并为所有行添加前缀。之后使用sort很简单。

我转换为以上格式,因为我还想对t9imestamp值进行一些算术运算。您可以使用快捷方式转换为yymmddXhhmmss中的sedX用于am/pm 0 am 1 pm

进一步考虑,使用gawk而不是sed也会更好,这样您就可以使用printf获得零填充数字。

答案 3 :(得分:1)

试试这个unix命令。

我只为时间戳部分做过。

输入

1/1/2013 12:00:01 AM
4/10/2012 4:43:06 PM
4/9/2012 4:43:06 PM
12/31/2012 11:59:59 PM
4/10/2012 4:43:06 PM
4/9/2012 4:43:06 PM

Unix命令

$>sort -t "/"  -k 1.8,1.4 Input| sort -t ":" -r -k 1 -k 2.1,2.2 -k 3.1,3.2 | sort -t " " -r -k 3.1

<强>输出

4/9/2012 4:43:06 PM
4/9/2012 4:43:06 PM
4/10/2012 4:43:06 PM
4/10/2012 4:43:06 PM
12/31/2012 11:59:59 PM
1/1/2013 12:00:01 AM

您可以根据自己的要求修改脚本。

答案 4 :(得分:1)

克里斯,

您可能需要使用下面提供的代码,特别是查看sort命令。

我编写的awk脚本清理Windows Server 2003“timestamp”,以便用0填充单个数字。可以很容易地改变生成的理智时间戳的格式。

应该使用默认的cygwin安装。

让我知道您的想法,可能需要一些推文,我很乐意根据您的反馈做。

罗布

$ gawk -f foo.awk event_log.txt | sort  -n -k2
SERVER01,04/09/2012 04:43:06 PM,8,FOO,TOO
SERVER01,04/10/2012 04:43:06 PM,8,FOO,TOO
SERVER01,04/11/2012 04:43:06 PM,8,FOO,TOO
SERVER02,04/09/2012 04:43:06 PM,8,FOO,TOO
SERVER02,04/10/2012 04:43:06 PM,8,FOO,TOO
SERVER02,12/31/2012 11:59:59 PM,8,FOO,TOO
SERVER01,01/01/2013 12:00:01 AM,8,FOO,TOO

foo.awk在哪里

BEGIN { FS = "," }
{ print $1"," prepadDate($2) "," $3 "," $4 "," $5 }

#
# Returnes a useful timestamp given the timestamp received in event logs on Windows Server 2003
#
function prepadDate(winSrvr2003ts) {

        padded_day = ""
        padded_month = ""
        year = ""

        padded_date = ""
        split(winSrvr2003ts,numbers," ")

        split(numbers[1], date, "/")
        split(numbers[2], time, ":")
        antePostMeridian = numbers[3]

        padded_day = prePadAZero(date[1])
        padded_month = prePadAZero(date[2])
        year = date[3]
        padded_hour = prePadAZero(time[1])
        minute = time[2]
        seconds = time[3]

        #
        # Alter the return statememnt to format the timestamp according to your needs
        # rememebering that string concatenation in gawk is simply a space.
        #
        return padded_day "/" padded_month "/" year " " padded_hour ":" minute ":" seconds " " antePostMeridian
}

#
# Prepend a zero to number if it is a single digit
#
function prePadAZero(number){

        if (length(number) == 1)
                padded = "0" number
        else
                padded = number

        return padded
}