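Split lines into columns

Time: 2014-06-18 01:10:00

Tags: string awk

In the example below, I want to split these lines into two columns, where column 1 is the letter string and column 2 is the number after the "-" sign.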

      >1-1112309
      GTTTCCGTAGTGTAGTGGTTATCACGTTCGCCT
      >2-787704
      TGAGGTAGTAGGTTGTATAGTT
      >3-736193
      GTTTCCGTAGTGTAGTGGTTATCACGTTCGCC
      >4-671373
      TGTAAACATCCTCGACTGGAAGCT

Desired output:

           GTTTCCGTAGTGTAGTGGTTATCACGTTCGCCT        1112309
           TGAGGTAGTAGGTTGTATAGTT                   787704
           GTTTCCGTAGTGTAGTGGTTATCACGTTCGCC         736193
           TGTAAACATCCTCGACTGGAAGCT                 671373

2 answers:

Answer 0 (score: 1)

awk -F- '/^>/ {n = $2; next} {printf "%-40s %d\n", $0, n}' file

Explanation:

-F-      # set field separator to a dash

/^>/     # if line begins with a >
  {n = $2; next}  # then save second field and go on to next line in file

         # empty pattern matches every line (that makes it here)
  {printf "%-40s %d\n", $0, n}   # print current line in 40 columns left-justified
                                 # then print saved number and a newline
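
The same logic can be wrapped in a small shell function so that the column width is adjustable. This is only a minimal sketch: the function name `seq_columns` and the awk variable `width` are hypothetical and not part of the original answer, and the number is printed with `%s` so that it is passed through as text.

```shell
# Hypothetical helper: reads header/sequence pairs on stdin and prints
# "sequence number" rows. The first argument is the column width (default 40).
seq_columns() {
    awk -F- -v width="${1:-40}" '
        BEGIN { fmt = "%-" width "s %s\n" }   # build the printf format once
        /^>/  { n = $2; next }                # header line: save the number after the dash
              { printf fmt, $0, n }           # sequence line: print it with the saved number
    '
}

# Example call with two of the records from the question.
printf '>1-1112309\nGTTTCCGTAGTGTAGTGGTTATCACGTTCGCCT\n>2-787704\nTGAGGTAGTAGGTTGTATAGTT\n' | seq_columns 40
```

Building the format string in `BEGIN` avoids relying on `%*s`, which not every awk implementation accepts.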

Answer 1 (score: 1)

Another awk command:

$ awk -v RS="\n>" '{gsub (/\n/," "); gsub (/^.*-/,"",$1); printf "%-40s %d\n", $2,$1}' file
GTTTCCGTAGTGTAGTGGTTATCACGTTCGCCT        1112309
TGAGGTAGTAGGTTGTATAGTT                   787704
GTTTCCGTAGTGTAGTGGTTATCACGTTCGCC         736193
TGTAAACATCCTCGACTGGAAGCT                 671373

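RS is set to \n>, so awk splits the input file into records wherever a newline is followed by >. Note that gawk and mawk treat a multi-character RS as a regular expression, while POSIX leaves the behavior unspecified.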

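gsub (/\n/," ") # Replaces every newline in the record with a space, so the header and the sequence become fields 1 and 2.

gsub (/^.*-/,"",$1) # Removes everything up to and including the last - in field 1, leaving only the number.

printf "%-40s %d\n", $2,$1  # Prints field 2 (the sequence) left-justified in 40 columns, then field 1 (the number).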
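
For an awk that does not accept a multi-character RS, a similar record-based approach can be sketched with a single-character separator. This is a variant under stated assumptions, not the command from the answer: it sets RS to > alone, relies on awk's default field splitting (which treats newlines as whitespace), and assumes that no sequence contains a > character. The function name `records_to_columns` is hypothetical.

```shell
# Hypothetical helper: reads the header/sequence input on stdin,
# treating each ">"-delimited chunk as one record.
records_to_columns() {
    awk -v RS='>' '
        NF >= 2 {                              # skip the empty record before the first ">"
            split($1, part, "-")               # $1 is the header, e.g. "1-1112309"
            printf "%-40s %s\n", $2, part[2]   # $2 is the sequence; part[2] is the number
        }
    '
}

# Example call with two of the records from the question.
printf '>3-736193\nGTTTCCGTAGTGTAGTGGTTATCACGTTCGCC\n>4-671373\nTGTAAACATCCTCGACTGGAAGCT\n' | records_to_columns
```

Because the header and the sequence already arrive as fields 1 and 2, no gsub on the record is needed here; `split` extracts the part after the dash.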