Convert time from alphanumeric to numeric

Time: 2018-09-25 17:53:50

Tags: awk

I have a text file:

ifile.txt
x       y       z       t              value
1       1       5       01hr01Jan2018   3
1       1       5       02hr01Jan2018   3.1
1       1       5       03hr01Jan2018   3.2
1       3.4     3       01hr01Jan2018   4.1
1       3.4     3       02hr01Jan2018   6.1
1       3.4     3       03hr01Jan2018   1.1
1       4.2     6       01hr01Jan2018   6.33
1       4.2     6       02hr01Jan2018   8.33
1       4.2     6       03hr01Jan2018   5.33
3.4     1       2       01hr01Jan2018   3.5
3.4     1       2       02hr01Jan2018   5.65
3.4     1       2       03hr01Jan2018   3.66
3.4     3.4     4       01hr01Jan2018   6.32
3.4     3.4     4       02hr01Jan2018   9.32
3.4     3.4     4       03hr01Jan2018   12.32
3.4     4.2     8.1     01hr01Jan2018   7.43
3.4     4.2     8.1     02hr01Jan2018   7.93
3.4     4.2     8.1     03hr01Jan2018   5.43
4.2     1       3.4     01hr01Jan2018   6.12
4.2     1       3.4     02hr01Jan2018   7.15
4.2     1       3.4     03hr01Jan2018   9.12
4.2     3.4     5.5     01hr01Jan2018   2.2
4.2     3.4     5.5     02hr01Jan2018   3.42
4.2     3.4     5.5     03hr01Jan2018   3.21
4.2     4.2     6.2     01hr01Jan2018   1.3
4.2     4.2     6.2     02hr01Jan2018   3.4
4.2     4.2     6.2     03hr01Jan2018   1

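Explanation: each coordinate (x, y) has one z value and three time values. The separators are not tabs; they are runs of spaces.

I want to convert the t column from alphanumeric to numeric format and then convert the file to CSV. My expected output is: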

ofile.txt
x,y,z,201801010100,201801010200,201801010300
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1
The desired time format is YYYYMMDDHHMM (for example, 01hr01Jan2018 becomes 201801010100).
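
To make the target format concrete, here is a minimal sketch of the mapping applied to a single value. The helper name to_numeric is hypothetical, and the sketch assumes every timestamp has the fixed layout HHhrDDMonYYYY with an English three-letter month name. It finds the month number from the position of the name in a lookup string, which is just one possible way to do it.

```shell
# Hypothetical helper: convert one HHhrDDMonYYYY timestamp to YYYYMMDDHHMM.
to_numeric() {
    printf '%s\n' "$1" | awk '{
        # Month number = position of the 3-letter name in this string, as (pos + 2) / 3.
        m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", substr($0, 7, 3)) + 2) / 3
        # Year, zero-padded month, day, hour, then "00" for the minutes.
        printf "%s%02d%s%s00\n", substr($0, 10, 4), m, substr($0, 5, 2), substr($0, 1, 2)
    }'
}

to_numeric 03hr01Jan2018
```

Run as written, the last line prints 201801010300. The input only carries whole hours, so the minutes are always written as 00.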

I asked part of this question before; see "Format and then convert txt to csv using shell script and awk". However, I could not work out how to change the time format in the following script.

awk -v OFS=, '{k=$1 OFS $2 OFS $3}
!($4 in hdr){hn[++h]=$4; hdr[$4]}
k in row{row[k]=row[k] OFS $5; next}
{rn[++n]=k; row[k]=$5}
END {
   printf "%s", rn[1]
   for(i=1; i<=h; i++)
      printf "%s", OFS hn[i]
   print ""
   for (i=2; i<=n; i++)
      print rn[i], row[rn[i]]
}' ifile.txt

1 Answer:

Answer 0 (score: 3):

Expanding on my answer to your previous question:

gawk '
    BEGIN {
        SUBSEP = OFS = ","
        month["Jan"] = "01"; month["Feb"] = "02"; month["Mar"] = "03";
        month["Apr"] = "04"; month["May"] = "05"; month["Jun"] = "06";
        month["Jul"] = "07"; month["Aug"] = "08"; month["Sep"] = "09";
        month["Oct"] = "10"; month["Nov"] = "11"; month["Dec"] = "12"; 
    }
    function timestamp_to_numeric(s) {
        # 03hr31Jan2001 => 200101310300
        return substr(s,10,4) month[substr(s,7,3)] substr(s,5,2) substr(s,1,2) "00"
    }
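    NR==1 {next}    # skip the header line
    # value is an array of arrays (needs gawk 4+): first index "x,y,z", second index the numeric timestamp
    {g = timestamp_to_numeric($4); groups[g]; value[$1,$2,$3][g] = $5}
    END {
        # gawk only: make "for (i in array)" visit indices in ascending string order
        PROCINFO["sorted_in"] = "@ind_str_asc"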
        printf "x,y,z"; for (g in groups) printf ",%s", g; printf "\n"
        for (a in value) {
            printf "%s", a
            for (g in groups) printf "%s%s", OFS, 0+value[a][g]
            printf "\n"
        }
    }
' ifile.txt

Output:

x,y,z,201801010100,201801010200,201801010300
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1

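You need to create a mapping from month names to month numbers, and then a function that converts the timestamp to the new format. Apart from that, the code is the same as in my previous answer.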
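
The same two ingredients (a month map and a conversion function) can also be sketched without gawk-specific features, in case only a POSIX awk is available. This is an alternative under stated assumptions, not the code above: the wrapper name pivot_csv is hypothetical, rows and columns come out in first-seen input order instead of sorted order, and a missing (x, y, z, t) combination is left as an empty cell rather than printed as 0.

```shell
# Hypothetical wrapper: pivot_csv [FILE] prints the CSV; reads stdin if no FILE is given.
pivot_csv() {
    awk '
    function to_numeric(s) {
        # 03hr31Jan2001 => 200101310300
        return substr(s, 10, 4) month[substr(s, 7, 3)] substr(s, 5, 2) substr(s, 1, 2) "00"
    }
    BEGIN {
        OFS = ","
        # Build the month-name => two-digit-number map.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
        for (i = 1; i <= n; i++) month[name[i]] = sprintf("%02d", i)
    }
    NR == 1 { next }    # skip the header line
    {
        k = $1 OFS $2 OFS $3
        t = to_numeric($4)
        if (!(t in seen_t)) { seen_t[t] = 1; cols[++nc] = t }   # remember column order
        if (!(k in seen_k)) { seen_k[k] = 1; rows[++nr] = k }   # remember row order
        val[k, t] = $5
    }
    END {
        printf "x,y,z"
        for (c = 1; c <= nc; c++) printf "%s%s", OFS, cols[c]
        printf "\n"
        for (r = 1; r <= nr; r++) {
            printf "%s", rows[r]
            for (c = 1; c <= nc; c++) printf "%s%s", OFS, val[rows[r], cols[c]]
            printf "\n"
        }
    }' "$@"
}

# Small sample in the same layout as ifile.txt (runs of spaces as separators).
pivot_csv <<'EOF'
x       y       z       t              value
1       1       5       01hr01Jan2018   3
1       1       5       02hr01Jan2018   3.1
1       3.4     3       01hr01Jan2018   4.1
1       3.4     3       02hr01Jan2018   6.1
EOF
```

For the four sample rows this prints:

x,y,z,201801010100,201801010200
1,1,5,3,3.1
1,3.4,3,4.1,6.1

On the real data it would be called as pivot_csv ifile.txt > ofile.txt. Because ifile.txt is already ordered by coordinate and time, first-seen order should match the sorted order there.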