Extract columns from the previous and next lines by pattern matching

Time: 2015-08-11 18:05:10

Tags: regex awk

I have the following file, extract_info.txt:

 ABC
 PNG
 CHNS

and to_extractfrom.txt, from which I need to retrieve information:

 ABC  123 234 TCHSL
 NBV  234 23764 DHG
 CHNS 123 347 CGJKS
 CVS  233 4747 JSHGD
 PNG  122 324 HGH
 SJDH 373 3487 JHG

I am running the following code:

 while read line
 do
  gene=$(echo $line | awk -F' ' '{print $1}')
  app1=$(awk -v comp1="$gene" '(comp1==$1) {print $1 }' to_extractfrom.txt)
 done < extract_info.txt

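However, this only prints the matching key itself. What I want is, for each entry in extract_info.txt, to find the line in to_extractfrom.txt whose first column matches it and to output three fields: the first column of the next line, the matched key, and the first column of the previous line (or "-" if there is no previous line). For the entries in the first file, the output would be: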

NBV ABC -
SJDH PNG CVS
CVS CHNS NBV

1 Answer:

Answer 0: (score: 3)

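awk '
  # prev holds the first column of the previous line; "-" means there is none
  BEGIN         {prev = "-"}
  # first file: remember each key to look up
  NR == FNR     {extract[$1] = 1; next}
  # the line after a match: print next, matched key, previous
  is_match      {print $1, m1, m2; is_match = 0}
  # a matching line: remember the key and the line before it
  $1 in extract {is_match = 1; m1 = $1; m2 = prev}
  {prev = $1}
' extract_info.txt to_extractfrom.txt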
NBV ABC -
CVS CHNS NBV
SJDH PNG CVS
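
One edge case the script above does not cover: if a key sits on the last line of to_extractfrom.txt, there is no following line, so nothing is printed for it. A minimal sketch of one way to handle that, assuming you want "-" as the placeholder for a missing next line (mirroring the "-" already used for a missing previous line). The function name `extract_neighbors` is made up for illustration.

```shell
# Hypothetical helper: same one-pass logic as the answer, plus an END rule.
# Usage: extract_neighbors KEYS_FILE DATA_FILE
extract_neighbors() {
    awk '
      BEGIN         {prev = "-"}
      NR == FNR     {extract[$1] = 1; next}
      is_match      {print $1, m1, m2; is_match = 0}
      $1 in extract {is_match = 1; m1 = $1; m2 = prev}
      {prev = $1}
      # Assumption: a match on the last line gets "-" in place of the next line
      END           {if (is_match) print "-", m1, m2}
    ' "$1" "$2"
}
```

Call it as `extract_neighbors extract_info.txt to_extractfrom.txt`. With the sample files from the question the END rule never fires, so the output is unchanged.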

If your output must be in the same order as the extract_info file, and you are using GNU awk, you can do:

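gawk '
    BEGIN         {prev = "-"}
    # first file: store each key with its line number, used for sorting in END
    NR == FNR     {extract[$1] = FNR; next}
    # buffer the result per key instead of printing it immediately
    is_match      {output[m1] = $1 FS m1 FS m2; is_match = 0}
    $1 in extract {is_match = 1; m1 = $1; m2 = prev}
    {prev = $1}
    END {
        # gawk only: traverse extract in ascending order of its numeric values
        PROCINFO["sorted_in"] = "@val_num_asc"
        for (key in extract) print output[key]
    }
' extract_info.txt to_extractfrom.txt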
NBV ABC -
SJDH PNG CVS
CVS CHNS NBV
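
If GNU awk is not available, the same ordering can probably be had without `PROCINFO["sorted_in"]` by recording the keys in a numerically indexed array while reading the first file. The following is a sketch under that assumption, not part of the original answer. The function name `extract_in_order` and the `order` array are illustrative, and keys that never appear in the second file are skipped rather than printed as blank lines.

```shell
# Hypothetical helper: portable awk variant that keeps the order of KEYS_FILE.
# Usage: extract_in_order KEYS_FILE DATA_FILE
extract_in_order() {
    awk '
        BEGIN         {prev = "-"}
        # first file: remember each key and the position it appeared in
        NR == FNR     {order[++n] = $1; extract[$1] = 1; next}
        # buffer results per key; they are printed in END
        is_match      {output[m1] = $1 FS m1 FS m2; is_match = 0}
        $1 in extract {is_match = 1; m1 = $1; m2 = prev}
        {prev = $1}
        END {
            # walk the keys in their original order, skipping keys with no result
            for (i = 1; i <= n; i++)
                if (order[i] in output) print output[order[i]]
        }
    ' "$1" "$2"
}
```

Running `extract_in_order extract_info.txt to_extractfrom.txt` on the sample files should give the same three lines as the gawk version, in the order ABC, PNG, CHNS.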