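Extracting data from data files

Time: 2019-04-11 09:27:34

Tags: bash awk sed

I have 31 files and want to extract specific data from them, either writing it to a text file or editing each file in place. An example file: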

Please download 'codg0010.18i.Z' file

The data looks like this:

   2018     1     1     0     0     0                        EPOCH OF CURRENT MAP
   ...
   45.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   59   63   69   76   83   90   96  100  100   93   81   68   55   46   39   34
   31   29   28   28   26   25   24   26   32   40   48   54   56   54   50   46
   43   42   42   44   46   48   51   54   57   59   59   58   55   51   48   47
   48   50   53   56   58   61   63   65   66   66   65   65   66   68   72   76
   82   86   88   87   81   72   64   59   59
    42.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   63   67   74   80   88   97  107  115  116  109   95   79   64   53   45   40
   37   36   36   38   39   39   40   43   48   54   60   63   62   59   54   50
   47   45   45   46   47   49   51   54   57   60   60   59   57   54   53   54
   56   60   62   64   65   67   69   72   74   74   74   73   72   72   74   77
   82   86   90   89   84   77   68   63   63
    40.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   71   75   80   84   90  100  112  123  127  122  108   91   75   64   56   50
   46   45   47   51   54   57   59   61   65   70   72   72   69   64   58   53
   50   48   47   46   46   47   49   52   56   59   61   61   59   58   58   60
   63   66   68   68   68   70   73   77   80   82   82   81   80   79   79   81
   84   89   93   94   91   84   76   72   71
    37.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   82   84   86   87   89   96  108  122  130  128  116  101   87   77   70   63
   58   56   59   64   69   73   76   79   82   84   83   80   74   67   60   55
   53   51   49   47   45   44   47   51   55   59   62   63   63   62   62   64
   67   69   69   68   67   69   74   81   86   89   90   89   88   88   88   89
   92   96  100  103  100   94   87   83   82
    35.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   94   95   94   89   84   86   96  111  122  125  118  108   98   92   85   77
   70   67   69   74   81   86   89   92   93   93   91   85   77   68   61   57
   55   53   51   48   45   44   46   51   56   61   64   66   66   66   66   67
   68   68   67   64   64   68   75   83   90   95   97   98   98   99   99  101
  103  107  112  115  113  108  100   95   94
    32.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
  109  109  104   94   81   75   80   93  107  114  113  109  106  104   98   90
   80   75   76   82   88   93   95   97   98   97   93   86   77   68   61   58
   57   56   55   51   48   47   49   54   59   63   67   69   70   70   68   67
   65   64   61   59   61   67   76   86   94   99  103  105  108  111  114  116
  119  123  128  130  129  123  116  110  109
    30.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
  127  127  121  106   85   69   66   75   88   98  103  106  110  112  108   98
   87   80   80   84   90   93   94   95   95   93   89   82   74   67   61   58
   58   59   59   57   54   52   54   58   63   66   68   71   72   70   67   63
   59   56   54   54   58   67   78   89   97  102  107  111  117  123  129  135
  139  143  147  149  147  141  133  127  127
  ...
  1                                                      END OF TEC MAP

The data starts at "START OF TEC MAP" and ends at "END OF RMS MAP". To skip the file header, I use:

sed -n -i '/START OF TEC MAP/,/END OF RMS MAP/p' FILE
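As a small illustration of that sed range address, here is a sketch run on a made-up miniature file. The header, trailer and data values are hypothetical, as are the variable names `sample` and `trimmed`. It omits -i so the file is not modified and the result can be inspected first.

```shell
# Hypothetical miniature file: two header lines, one TEC map, one RMS map, a trailer.
sample=$(mktemp)
cat > "$sample" <<'EOF'
HEADER LINE A
HEADER LINE B
     1                      START OF TEC MAP
   59   63   69
     1                      END OF TEC MAP
     1                      START OF RMS MAP
   10   11   12
     1                      END OF RMS MAP
TRAILER
EOF

# -n suppresses automatic printing; p prints only the lines inside the range.
# Without -i the file is left unchanged and the result goes to stdout.
trimmed=$(sed -n '/START OF TEC MAP/,/END OF RMS MAP/p' "$sample")
printf '%s\n' "$trimmed"
rm -f "$sample"
```

Only the six lines from the first START OF TEC MAP through END OF RMS MAP are printed; the header and trailer lines are dropped.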

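For each block, I am trying to get the last five values of the second data line, for the blocks that start at 45.0-180.0 and end at 25.0-180.0. So the result should look like this: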

   2018     1     1     0     0     0                        EPOCH OF CURRENT MAP
   ...
   45.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   54   56   54   50   46     
    42.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H       
   63   62   59   54   50
    40.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H       
   72   69   64   58   53
    37.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   80   74   67   60   55
    35.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   85   77   68   61   57
    32.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   86   77   68   61   58
    30.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   82   74   67   61   58
  ...
  1                                                      END OF TEC MAP

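Regular expressions are complicated for a beginner like me.

  • ^ to match the start of a line
  • [0-9] for floating-point numbers?
  • $ to match the end of a line
  • \ to escape regex characters (LAT/LON1/LON2/DLON/H)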
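The pieces listed above can be combined into one pattern. The following is a minimal sketch using grep -E on two lines from the sample data; the function name `is_header` is hypothetical, and the pattern only covers non-negative latitudes like the ones shown in the question.

```shell
# Two sample lines taken from the data above.
header='    42.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H'
values='   63   67   74   80   88'

# ^ anchors the start of the line and $ the end. [0-9]+\.[0-9] matches a
# number such as 42.5, where \. is a literal dot. "/" is not special to grep,
# so it needs no escaping here.
is_header() {
  printf '%s\n' "$1" | grep -Eq '^ *[0-9]+\.[0-9]-180\.0 .*LAT/LON1/LON2/DLON/H$'
}

is_header "$header" && echo "header line"
is_header "$values" || echo "data line"
```

In AWK the slash does matter inside a /.../ regex literal, where it would have to be written as \/.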

1 answer:

Answer 0 (score: 0)

The following AWK should solve this:

awk '
  /(START|END|EPOCH) OF (TEC|RMS|CURRENT) MAP/
  $1 == "45.0-180.0" {p=1}
  $1 == "25.0-180.0" {p=0}
  p && $0 ~ "LAT/LON1/LON2/DLON/H" {
    print; getline; getline
    print $(NF-4)" "$(NF-3)" "$(NF-2)" "$(NF-1)" "$NF
  }
' < FILE

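Explanation:

  • The first line prints every line that matches the regex /(START|END|EPOCH) OF (TEC|RMS|CURRENT) MAP/, because a pattern with no action block prints the matching line by default. This covers all the header lines you always want to keep.

  • The next two lines set the flag p to true or false (1 or 0) depending on the content of the first field, $1.

  • Writing $0 ~ "LAT/LON1/LON2/DLON/H" with a string instead of a /.../ literal lets me match the / characters in an AWK regex without escaping them. The full rule p && $0 ~ "LAT/LON1/LON2/DLON/H" { ... } means: if p is true and the line contains the pattern, run the steps in the block { ... }.

  • Inside the block, I print the line and then call getline twice to read 2 more lines.

  • Then I print the fifth-, fourth-, third- and second-to-last fields and the last field, all addressed relative to AWK's special NF variable, which holds the number of fields in the current line.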
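To see the getline and NF steps in isolation, here is a small sketch that runs them on a single block copied from the question's sample data. The variable names `block` and `last_five` are only for illustration, and the flag p is left out to keep the example short.

```shell
# A single block copied from the sample data in the question.
block='    42.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   63   67   74   80   88   97  107  115  116  109   95   79   64   53   45   40
   37   36   36   38   39   39   40   43   48   54   60   63   62   59   54   50
   47   45   45   46   47   49   51   54   57   60   60   59   57   54   53   54'

last_five=$(printf '%s\n' "$block" | awk '
  $0 ~ "LAT/LON1/LON2/DLON/H" {
    getline            # $0 is now the first data line
    getline            # $0 is now the second data line; NF is recomputed
    print $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF
  }')
echo "$last_five"
```

This prints 63 62 59 54 50, the same values the question lists for the 42.5-180.0 block. Separating the fields with commas joins them with AWK's default output separator, a single space.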

Testing:

▶ awk '
    /(START|END|EPOCH) OF (TEC|RMS|CURRENT) MAP/
    $1 == "45.0-180.0" {p=1}
    $1 == "25.0-180.0" {p=0}
    p && $0 ~ "LAT/LON1/LON2/DLON/H" {
      print; getline; getline
      print $(NF-4)" "$(NF-3)" "$(NF-2)" "$(NF-1)" "$NF
    }
  ' < c1pg0010.18i

I get:

     1                                                      START OF TEC MAP
  2018     1     1     0     0     0                        EPOCH OF CURRENT MAP
    45.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
44 45 44 43 42
    42.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
53 54 54 52 51
    40.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
61 62 61 59 57
    37.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
67 66 65 63 61
    35.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
71 70 68 67 65
    32.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
75 72 71 71 69
    30.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
79 76 75 76 75
    27.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
85 82 81 82 83
     1                                                      END OF TEC MAP
     2                                                      START OF TEC MAP
...
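Note that the output stops at 27.5-180.0: the flag is cleared as soon as the 25.0-180.0 header is read, so that block is not printed. If the 25.0-180.0 block is wanted as well (an assumption about the intent, not part of the answer above), one possible variant clears the flag only after the block has been processed. The sample values and the names `sample`, `inclusive` and `last` below are made up for illustration.

```shell
# Hypothetical three-block sample; the numbers are made up for illustration.
sample='    45.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
    1    2    3    4    5    6
   11   12   13   14   15   16
    25.0-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   21   22   23   24   25   26
   31   32   33   34   35   36
    22.5-180.0 180.0   5.0 450.0                            LAT/LON1/LON2/DLON/H
   41   42   43   44   45   46
   51   52   53   54   55   56'

inclusive=$(printf '%s\n' "$sample" | awk '
  $1 == "45.0-180.0" {p=1}
  p && $0 ~ "LAT/LON1/LON2/DLON/H" {
    last = ($1 == "25.0-180.0")   # remember this before getline replaces $0
    print; getline; getline
    print $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF
    if (last) p = 0               # stop only after the 25.0 block is printed
  }')
printf '%s\n' "$inclusive"
```

With this sample, the 45.0 and 25.0 blocks are printed and the 22.5 block is skipped.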

If you then need to process all 31 files from the FTP directory, wrap the AWK in this Bash code:

for f in *.Z ; do
  gunzip "$f"                 # replaces FILE.Z with the decompressed FILE
  decompressed=${f%.Z}        # strip the .Z suffix to get the new file name
  awk '
    /(START|END|EPOCH) OF (TEC|RMS|CURRENT) MAP/
    $1 == "45.0-180.0" {p=1}
    $1 == "25.0-180.0" {p=0}
    p && $0 ~ "LAT/LON1/LON2/DLON/H" {
      print; getline; getline
      print $(NF-4)" "$(NF-3)" "$(NF-2)" "$(NF-1)" "$NF
    }
  ' < "$decompressed" > "$decompressed.edited"
done

I'm assuming you run this script from the directory that contains all the data files with the .Z extension.
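If the compressed originals should be kept, a possible variant streams each file through gzip -dc instead of unpacking it on disk. This is a sketch, assuming your gzip can read .Z files (as the gunzip call above already requires); the function name `extract_all` is hypothetical, and the .edited output name follows the loop above.

```shell
# Sketch: extract from every .Z file without deleting the compressed originals.
# gzip -dc writes the decompressed data to stdout and leaves the input alone.
extract_all() {
  for f in *.Z; do
    [ -e "$f" ] || continue          # no .Z files: the glob stays literal
    gzip -dc < "$f" | awk '
      /(START|END|EPOCH) OF (TEC|RMS|CURRENT) MAP/
      $1 == "45.0-180.0" {p=1}
      $1 == "25.0-180.0" {p=0}
      p && $0 ~ "LAT/LON1/LON2/DLON/H" {
        print; getline; getline
        print $(NF-4)" "$(NF-3)" "$(NF-2)" "$(NF-1)" "$NF
      }
    ' > "${f%.Z}.edited"
  done
}
```

Call extract_all from the data directory; each FILE.Z then gets a FILE.edited next to it, and the .Z files stay in place.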