How can I sort by an unusual date format?

Asked: 2015-12-29 12:59:28

Tags: bash perl sorting awk sed

I have a log file like this:

December 20, 2015, 11:00pm
November 18, 2014, 12:00am
October 05, 2012, 11:30pm
October 02, 2012, 5:30pm
October 01, 2012, 12:30am
October 01, 2010, 11:30am
October 01, 2011, 9:30pm
October 01, 2011, 7:30am
...

I can use sort with a simple date format like this:

Mar  4 07:45
Mar  8 06:45
Mar  8 05:45

sort -k1M -k2 -k3 text.txt

Mar  4 07:45
Mar  8 05:45
Mar  8 06:45

But I can't use sort directly on my log file. What should I do? How can I do this with sort, awk, or some other tool?

6 answers:

Answer 0 (score: 3)

You can use Bash tools to convert each date to a timestamp, prepend it to the line, sort, and then remove it again:

while IFS=, read -r day year hour; do
   printf "%s %s, %s, %s\n" "$(date -d"$day $year $hour" +"%s")" "$day" "$year" "$hour"
done < file  | sort -n | cut -d' ' -f2-

This assumes every line has the form day, year, hour.

Step by step

Let's convert the dates to timestamps:

while IFS=, read -r day year hour;
do
printf "%s %s, %s, %s\n" "$(date -d"$day $year $hour" +"%s")" "$day" "$year" "$hour"
done < a                            
1450648800 December 20,  2015,  11:00pm
1416265200 November 18,  2014,  12:00am
1349472600 October 05,  2012,  11:30pm
1349191800 October 02,  2012,  5:30pm
1349044200 October 01,  2012,  12:30am
1285925400 October 01,  2010,  11:30am
1317497400 October 01,  2011,  9:30pm

Let's sort:

while IFS=, read -r day year hour;
do
printf "%s %s, %s, %s\n" "$(date -d"$day $year $hour" +"%s")" "$day" "$year" "$hour"
done < a  | sort -n                 
1285925400 October 01,  2010,  11:30am
1317497400 October 01,  2011,  9:30pm
1349044200 October 01,  2012,  12:30am
1349191800 October 02,  2012,  5:30pm
1349472600 October 05,  2012,  11:30pm
1416265200 November 18,  2014,  12:00am
1450648800 December 20,  2015,  11:00pm

Let's remove the temporary timestamp:

$ while IFS=, read -r day year hour;
do
printf "%s %s, %s, %s\n" "$(date -d"$day $year $hour" +"%s")" "$day" "$year" "$hour"
done < a  | sort -n | cut -d' ' -f2-
October 01,  2010,  11:30am
October 01,  2011,  9:30pm
October 01,  2012,  12:30am
October 02,  2012,  5:30pm
October 05,  2012,  11:30pm
November 18,  2014,  12:00am
December 20,  2015,  11:00pm
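
Because read splits each line on the commas and printf puts them back, the lines above come out with a doubled space after every comma. A minimal sketch of a variant that returns each line unchanged, assuming bash and GNU date (for -d). The function name sort_log_by_date is made up for illustration, and TZ=UTC is set only so the timestamps do not depend on the local time zone:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads log lines on stdin, prints them oldest first.
sort_log_by_date() {
    local line ts
    while IFS= read -r line; do
        # Ignore empty lines.
        [[ -n $line ]] || continue
        # Drop the commas only for parsing, so GNU date sees e.g.
        # "October 02 2012 5:30pm". Lines that date cannot parse are skipped.
        ts=$(TZ=UTC date -d "${line//,/}" +%s 2>/dev/null) || continue
        # Decorate with the timestamp, using a tab as separator.
        printf '%s\t%s\n' "$ts" "$line"
    done | sort -n | cut -f2-
}
```

It would be called as sort_log_by_date < file. Note that this sketch silently drops lines it cannot parse, which may or may not be what you want for a real log.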

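Answer 1 (score: 3)

Just use awk to build a YYYYMMDDHHMM string from each input line and prepend it to that line, then pipe the result to sort, and finally to cut to strip the string that awk added: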

$ cat tst.awk
BEGIN { FS="(,? +|:)" }
{
    mthAbbr = substr($1,1,3)
    mthNr = (match("JanFebMarAprMayJunJulAugSepOctNovDec",mthAbbr)+2)/3
    ampm = $NF; sub(/.*[0-9]/,"",ampm)
    hour = $4 + ( (ampm=="pm") && ($4<12) ? 12 : 0 )
    printf "%04d%02d%02d%02d%02d\t%s\n", $3, mthNr, $2, hour, $5, $0
}

$ awk -f tst.awk file | sort | cut -f2-
October 01, 2010, 11:30am
October 01, 2011, 7:30am
October 01, 2011, 9:30pm
October 01, 2012, 12:30am
October 02, 2012, 5:30pm
October 05, 2012, 11:30pm
November 18, 2014, 12:00am
December 20, 2015, 11:00pm

To help you understand what is happening, here are the intermediate steps:

$ awk -f tst.awk file
201512202300    December 20, 2015, 11:00pm
201411181200    November 18, 2014, 12:00am
201210052330    October 05, 2012, 11:30pm
201210021730    October 02, 2012, 5:30pm
201210011230    October 01, 2012, 12:30am
201010011130    October 01, 2010, 11:30am
201110012130    October 01, 2011, 9:30pm
201110010730    October 01, 2011, 7:30am

$ awk -f tst.awk file | sort
201010011130    October 01, 2010, 11:30am
201110010730    October 01, 2011, 7:30am
201110012130    October 01, 2011, 9:30pm
201210011230    October 01, 2012, 12:30am
201210021730    October 02, 2012, 5:30pm
201210052330    October 05, 2012, 11:30pm
201411181200    November 18, 2014, 12:00am
201512202300    December 20, 2015, 11:00pm
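
One detail visible in the intermediate output: 12:00am and 12:30am are given hour 12, so a 12:30am entry would sort after an 11:30am entry on the same day. The sample has no such pair, so its order is unaffected. A sketch of a variant that maps 12am to hour 00 by taking $4 % 12; the wrapper name sort_log_awk is hypothetical, and the rest follows tst.awk:

```shell
# Hypothetical wrapper around the decorate / sort / undecorate pipeline.
sort_log_awk() {
    awk '
    BEGIN { FS = "(,? +|:)" }
    {
        # Month number from the first three letters of the month name.
        mthNr = (match("JanFebMarAprMayJunJulAugSepOctNovDec", substr($1, 1, 3)) + 2) / 3
        # Strip everything up to the last digit, leaving "am" or "pm".
        ampm = $NF; sub(/.*[0-9]/, "", ampm)
        # 12am -> 0, 1am..11am unchanged, 12pm -> 12, 1pm..11pm -> 13..23.
        hour = ($4 % 12) + (ampm == "pm" ? 12 : 0)
        printf "%04d%02d%02d%02d%02d\t%s\n", $3, mthNr, $2, hour, $5, $0
    }' | sort | cut -f2-
}
```

It would be called as sort_log_awk < file.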

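Answer 2 (score: 2)

I remember posting an answer to a similar question, but I could not find it when I searched.

So, the idea is to compute the number of seconds since 1970-01-01 and add it as a prefix to the original line, then sort, and finally remove the prefix field.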

awk -v cmd='date -d"%s" +%s' '{o=$0;gsub(/,/,"");cc=sprintf(cmd,$0,"%s"); cc|getline d; close(cc);print d"\x99"o}' file|sort -n|sed 's/.*\x99//'

\x99 is an invisible character, used only to make sure the separator does not clash with any character already in the file.

For the sample input, the output is the original lines in chronological order, oldest first.

Answer 3 (score: 2)

Another, similar approach, using Perl:

perl -MTime::Piece -lpe '$_ = Time::Piece->strptime($_, "%B %d, %Y, %l:%M%p")->strftime("%s") . "\t" . $_' file | 
sort -n | 
cut -f2-

Answer 4 (score: 1)

You can still do it field by field with sort, by splitting the compound fields apart first:

$ sed 's/[ap]m/ &/;s/:/ : /' log \
   | sort -k3,3 -k1,1M -k2,2 -k7 -k4,4n -k6,6 \
   | sed -r 's/ : /:/;s/ ([ap]m)/\1/'

October 01, 2010, 11:30am
October 01, 2011, 7:30am
October 01, 2011, 9:30pm
October 01, 2012, 12:30am
October 02, 2012, 5:30pm
October 05, 2012, 11:30pm
November 18, 2014, 12:00am
December 20, 2015, 11:00pm

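Update: thanks to the Romans not having a zero, we have 12 < 1 < 2 < ... within each meridiem (am/pm). The fix is to replace 12 with 00 before sorting and change it back afterwards.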

$ sed 's/[ap]m/ &/;s/12:/00:/;s/:/ : /' log \
    | sort -k3,3 -k1,1M -k2,2 -k7 -k4,4n -k6 \
    | sed -r 's/ : /:/;s/ ([ap]m)/\1/;s/00:/12:/' 

October 01, 2010, 11:30am
October 01, 2011, 7:30am
October 01, 2011, 9:30pm
October 01, 2012, 12:30am
October 02, 2012, 5:30pm
October 05, 2012, 11:30pm
November 18, 2014, 12:00am
November 18, 2015, 12:00am
November 18, 2015, 1:00am
November 18, 2015, 12:00pm
November 18, 2015, 1:00pm
December 20, 2015, 11:00pm

P.S. Now question the choice of log format.

Answer 5 (score: 0)

A pure Perl solution based on the Schwartzian transform:

use feature 'say'; use Time::Piece;   # say needs the feature; strptime comes from Time::Piece
say $_->[1] for sort {$a->[0] <=> $b->[0]}
map [Time::Piece->strptime($_, "%B %d, %Y, %l:%M%p")->strftime("%s"), $_], @_;

This assumes that the array @_ contains the lines of the log file. It uses {{3}}.