Split a file on a delimiter, then join each section into its own line

Time: 2018-01-17 02:08:43

Tags: bash awk sed

I have a file, example.txt

0
   A
   B
   C, C, C
   D, D
   E
   F
1
   A, A, A
   B
   C
2
   A
   B
   C
   D, D, D
   E

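I need to split the file at every number, take the content between those numbers and join it into a single line, repeating the process for each section of the file: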

A, B, C, C, C, D, D, E, F
A, A, A, B, C
A, B, C, D, D, D, E

The best I have come up with is:

cat example.txt | sed -e '1,/^[0-9]/d' -e '/^[0-9]/,$d' | paste -sd "," -

A, A, A,   B,   C
In this case, that is only the middle section. Either that, or I end up with all sections printed on a single line.

5 answers:

Answer 0: (score: 5)

A shorter, idiomatic awk alternative:

$ awk '$1=$1{printf "%s%s",$0,(RT==","?OFS:ORS)}' RS="[0-9]|," OFS=", " file1
A, B, C, C, C, D, D, E, F
A, A, A, B, C
A, B, C, D, D, D, E

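RS is the record separator. It defaults to a newline; here it is set to a digit or a comma
OFS is the output field separator, here a comma followed by a single space
RT is the text that actually matched RS for the current record (RT is a GNU awk extension)
ORS is the output record separator, which defaults to a newline
$1=$1 is an idiomatic assignment that forces awk to rebuild the record from its fields using OFS; used as the pattern, it also evaluates false for empty records, so those are skipped
(RT==","?OFS:ORS) is a ternary expression with the syntax (condition ? value if true : value if false)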
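As a small illustration of the $1=$1 idiom on its own, here is a minimal sketch. The two input lines are made up for the demo and are not taken from example.txt; unlike RT, nothing here depends on GNU awk.

```shell
# Demo of the $1=$1 idiom on two made-up input lines.
# Assigning to a field makes awk rebuild $0 from its fields joined by OFS,
# which also drops the original leading whitespace.
out=$(printf '   A\n   B   C\n' | awk -v OFS=', ' '{ $1 = $1; print }')
printf '%s\n' "$out"
```

The first line comes out as a bare A and the second as B, C: the indentation is gone and the run of spaces between fields has been replaced by OFS.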

Answer 1: (score: 2)

Try:

$ awk 'function prn(line) {if(line){gsub(/[[:space:]]+/, " ", line); print line}}  /^[0-9]/{prn(line); line=""; next} {if(line)line=line"," $0; else line=$0} END{prn(line)}' example.txt
 A, B, C, C, C, D, D, E, F
 A, A, A, B, C
 A, B, C, D, D, D, E

Or, for those who prefer their code spread over multiple lines:

awk 'function prn(line)
      {
          if(line){
              gsub(/[[:space:]]+/, " ", line)
              print line
           }
       }

       /^[0-9]/{
           prn(line)
           line=""
           next
       }

       {
           if(line)
               line=line"," $0
           else
               line=$0
       }

       END{
           prn(line)
       }' example.txt

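How it works

  • function prn(line) {if(line){gsub(/[[:space:]]+/, " ", line); print line}}

    This defines a function prn which squeezes excess whitespace and prints the line.

  • /^[0-9]/{prn(line); line=""; next}

    If the current line starts with a number, call prn on the contents of line, reset line to an empty string, then skip the rest of the commands and jump to the next line.

  • {if(line)line=line"," $0; else line=$0}

    Append the current line to the end of the variable line.

  • END{prn(line)}

    After we reach the end of the file, call prn on line.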
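The same collect-and-flush pattern can be sketched as a variant that strips the indentation before appending, so the output lines carry no leading space. This is an illustrative rewrite under stated assumptions, not the answerer's code: the temporary file and the ", " separator are choices made for the demo, and a line counts as a section header only if it consists entirely of digits.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
0
   A
   B
   C, C, C
   D, D
   E
   F
1
   A, A, A
   B
   C
2
   A
   B
   C
   D, D, D
   E
EOF

out=$(awk '
  # Header line (digits only): flush the collected section, then start over.
  /^[0-9]+$/ { if (line != "") print line; line = ""; next }
  # Any other line: strip indentation, then append with ", " as separator.
  { sub(/^[ \t]+/, ""); line = (line == "" ? $0 : line ", " $0) }
  # Flush the last section, which has no header line after it.
  END { if (line != "") print line }
' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Checking line != "" before printing means a header at the very top of the file, or two headers in a row, does not produce an empty output line.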

Answer 2: (score: 2)

The following awk may also help.

awk '/^[0-9]+/ && val{print val;val="";next} FNR>1{sub(/^ +/,"");val=val?val ", " $0:$0} END{print val}'  Input_file

Explanation: here is the same command again with comments, this time in non-one-liner form.

awk '
/^[0-9]+/ && val{        ##If the line starts with digit(s) and the variable val is non-empty, do the following:
  print val;             ##Print the value of val.
  val="";                ##Reset val to the empty string.
  next                   ##Skip the remaining rules and move on to the next line.
}
FNR>1{                   ##For every line after the first, do the following:
  sub(/^ +/,"");         ##Use the awk sub function to remove leading spaces from the current line.
  val=val?val ", " $0:$0 ##Append the current line to val, separated by ", " (or start val with it if val is empty).
}
END{                     ##The END block runs once the whole Input_file has been read.
  print val              ##Print whatever is left in val (the last section).
}
'  Input_file            ##The name of the input file.

Answer 3: (score: 2)

sed

 echo `sed 's:$:,:' example.txt` | sed -r 's:^:, :;s:,\s*[0-9]+,\s*:\n:g;s:^\s*::;s:,? *$::'

perl

 perl -p0777e 's:^:, :;s:\n\s*:, :g;s:,\s*[0-9]+,\s*:\n:g;s:^\s*::;s:,?\s*$:\n:' example.txt
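
  1. echo... / perl -p0777... - treat the whole file as one long line (joined by newlines in the perl version, by spaces in the echo version)
  2. s:^:, : - add an extra comma at the beginning
  3. s:\n\s*:, :g - replace every newline with a comma (perl version; the sed version appends the commas with s:$:,: before echo joins the lines)
  4. s:,\s*[0-9]+,\s*:\n:g - replace every number surrounded by commas with a newline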
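To make step 1 of the sed version concrete, here is a minimal sketch that prints the intermediate one-line string before any further substitution runs. It uses $(...) in place of backticks (equivalent here) and a temporary copy of the sample file; both are choices made for this demo.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
0
   A
   B
   C, C, C
   D, D
   E
   F
1
   A, A, A
   B
   C
2
   A
   B
   C
   D, D, D
   E
EOF

# sed appends a comma to every line; the unquoted inner substitution is then
# word-split by the shell, and echo rejoins the words with single spaces.
joined=$(echo $(sed 's:$:,:' "$tmp"))
printf '%s\n' "$joined"
rm -f "$tmp"
```

This prints 0, A, B, C, C, C, D, D, E, F, 1, A, A, A, B, C, 2, A, B, C, D, D, D, E, on a single line. Every section number now sits between commas (once step 2 has added the leading one), which is what the substitution in step 4 matches.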

Answer 4: (score: 1)

Another sed

sed -n '
N
:A
$bB
/\n[ ]*[0-9][0-9]*$/!{
N
bA
}
h
s/\n[^\n]*$//
:B
s/[^\n]*\n[ ]*//
s/\n[ ]*/, /g
p
$b
x
s/.*\n//
bA
' infile