Conditional replacement of newline characters with sed

Date: 2013-09-11 19:20:01

Tags: regex bash sed

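I have a corrupted text file in which I need to replace \x20*[\n\r]+ with \xa0 whenever the next line (if there is one) does not start with the specific pattern DATA\t. If such continuation lines start with spaces (\x20+), those spaces should be removed as well.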

Can I do this with sed? The text file is about 1 MB.


Sample data:

DATA     132942, "I love you", 2398, "Hi how are you"
DATA     78793, "It is 
me", 4322, "My name is Frank"
DATA     24121, "Where
   are
you", 52432, "I am

here"
DATA     43242, "End of story", 432432, "The end"

=>

DATA     132942, "I love you", 2398, "Hi how are you"
DATA     78793, "It is me", 4322, "My name is Frank"
DATA     24121, "Where are you", 52432, "I am here"
DATA     43242, "End of story", 432432, "The end"

3 answers:

Answer 0 (score: 1)

One way to do it in Ruby:

ruby -e 'puts File.read(ARGV.shift).gsub(/(?: *\r?\n)+ *(?!DATA[[:space:]])/, " ").gsub(/ +$/m, "")' file

Output:

DATA    132942, "I love you", 2398, "Hi how are you"
DATA    78793, "It is me", 4322, "My name is Frank"
DATA    24121, "Where are you", 52432, "I am here"
DATA    43242, "End of story", 432432, "The end"

Answer 1 (score: 1)

sed ':q;N;s/\x20*[\n\r]\+/\xa0/g;tq' input.txt | sed 's/\xa0DATA/\nDATA/g'
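
The two stages of this pipeline can be seen on the sample data. The following is only a sketch under some assumptions: GNU sed, a tab after DATA, LC_ALL=C so that \xa0 is treated as one raw byte, and a temporary file standing in for input.txt. The final tr is not part of the answer; it only turns the otherwise invisible \xa0 byte into ~ for display.

```shell
# Assumption: GNU sed; the C locale makes \xa0 a plain single byte.
export LC_ALL=C

# Hypothetical sample file reproducing the broken records from the question
# (assumes DATA is followed by a tab).
input=$(mktemp)
{
  printf 'DATA\t132942, "I love you", 2398, "Hi how are you"\n'
  printf 'DATA\t78793, "It is \nme", 4322, "My name is Frank"\n'
  printf 'DATA\t24121, "Where\n   are\nyou", 52432, "I am\n\nhere"\n'
  printf 'DATA\t43242, "End of story", 432432, "The end"\n'
} > "$input"

# Stage 1: slurp the whole file into the pattern space and turn every
#          run of "optional spaces + line breaks" into the \xa0 byte.
# Stage 2: put a real newline back wherever \xa0 is followed by DATA.
# tr:      display aid only, shows the remaining \xa0 bytes as "~".
visible=$(sed ':q;N;s/\x20*[\n\r]\+/\xa0/g;tq' "$input" \
  | sed 's/\xa0DATA/\nDATA/g' \
  | tr '\240' '~')

printf '%s\n' "$visible"
rm -f "$input"
```

Note that the pattern only matches spaces before a line break, so leading spaces on a continuation line (such as the ones before "are") are left in place, and a blank line produces two consecutive \xa0 bytes.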

Answer 2 (score: 1)

This might work for you (GNU sed):

sed ':a;$!N;/\nDATA/!s/\s*\n\s*/ /;ta;P;D' file
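
For readers unfamiliar with the N/P/D idiom, here is the same one-liner written out as a commented multi-line script. It is a sketch that assumes GNU sed (for \s and for \n in the regex) and a tab after DATA; the temporary file is a hypothetical stand-in for file.

```shell
# Hypothetical sample file reproducing the broken records from the question
# (assumes DATA is followed by a tab).
input=$(mktemp)
{
  printf 'DATA\t132942, "I love you", 2398, "Hi how are you"\n'
  printf 'DATA\t78793, "It is \nme", 4322, "My name is Frank"\n'
  printf 'DATA\t24121, "Where\n   are\nyou", 52432, "I am\n\nhere"\n'
  printf 'DATA\t43242, "End of story", 432432, "The end"\n'
} > "$input"

result=$(sed '
  # Loop label: execution returns here after every successful join.
  :a
  # Unless this is the last input line, append the next line.
  $!N
  # If the buffer does not contain a newline followed by DATA, join:
  # replace the newline and the whitespace around it with one space.
  /\nDATA/!s/\s*\n\s*/ /
  # A join happened, so go back and pull in another line.
  ta
  # Otherwise print the finished record (up to the first newline),
  P
  # delete it, and restart with the remainder still in the buffer.
  D
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Because only one record plus one lookahead line is held in the pattern space at a time, this approach does not need to load the whole file, which is comfortable for the roughly 1 MB input mentioned in the question.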