正则表达式 - Sed

时间:2012-03-24 21:38:04

标签: regex sed

如何使用|替换四和五之后的逗号但不是那些跟随一个和两个的人吗?

\"One,Two, Three\" Four, Five, Six

sed s'/,/|/'g

我很感激答案可以应用于转义引号中的任何逗号,而不仅仅是这个示例。

另一个例子是:

Mr ,Joe,Lish,,\"Acme, Inc.\",\"9599 Park Avenue, Suite 301\",Manhattan,NY,10022,\"\"\"6 A MAILING LIST MMBR GENERAL\"\"\"

4 个答案:

答案 0 :(得分:1)

使用sed的一种方式:

script.sed的内容:

## Substitute '\"' with '\n'.
s/\\\"/\n/g

## If there is an odd number of '\"' or the string doesn't end with '\"' I 
## will append some at the end. There is no danger, but it will be used to
## avoid an infinite loop.
## 1.- Save content to 'hold space'.
## 2.- Remove all characters except '\n'.
## 3.- Remove one of them because next command will add another one.
## 4.- Put content in 'pattern space' to begin working with it.
## So, if in original string there were 3 '\"', now there will be 6. ¡Fine!
h
s/[^\n]//g
s/\n//
H
g

## Label 'a'.
:a

## Save content to 'hold space'.
h

## Remove from first '\n' until end of line.
s/\(\n\).*$/\1/

## Substitute all commas with pipes.
s/,/|/g

## Delete first newline.
s/\n//

## Append content to print as final output to 'hold space'.
H

## Recover rest of line from 'hold space'.
g

## Remove content modified just before.
s/[^\n]*//

## Save content to 'hold space'.
h

## Get first content between '\n'.
s/\(\n[^\n]*\n\).*$/\1/
s/\n\{2,\}//

## Susbtitute '\n' with original '\"'.
s/\n/\\"/g

## Append content to print as final output to 'hold space'.
H

## Recover rest of line from 'hold space'.
g

## Remove content printed just before.
s/\n[^\n]*\n//

/^\n/ { 
    s/\n//g
    p   
    b   
}

ba

infile的内容:

\"One,Two, Three\" Four, Five, Six 
One \"Two\", Three, Four, Five
One \"Two, Three, Four, Five\"
One \"Two\" Three, Four \"Five, Six\"

像以下一样运行:

sed -nf script.sed infile

具有以下结果:

\"One,Two, Three\" Four| Five| Six
One \"Two\"| Three| Four| Five
One \"Two, Three, Four, Five\"
One \"Two\" Three| Four \"Five, Six\"

答案 1 :(得分:1)

这可能对您有用:

 sed 's/^/\n/;:a;s/\n\("[^"]*"\|[^,]\)/\1\n/;ta;s/\n,/|\n/;ta;s/.$//' file

说明:

  • 在图案空间前面添加换行符。 s/^/\n/
  • 制作标签:a
  • 在引号之间的字符串或不是逗号的字符上移动换行符。 s/\n\("[^"]*"\|[^,]\)/\1\n/
  • 如果替换是标签的成功循环。 ta
  • \n,代替|\ns/\n,/|\n/
  • 如果替换是标签的成功循环。 ta
  • 如果没有替换发生,所有这样做都会删除换行符。 s/.$//

编辑:

实际上,可以使用任何唯一字符或字符组合代替\n

echo 'Mr ,Joe,Lish,,\"Acme, Inc.\",\"9599 Park Avenue, Suite 301\",Manhattan,NY,10022,\"\"\"6 A MAILING LIST MMBR GENERAL\"\"\"' | 
sed 's/^/@@@/;:a;s/@@@\("[^"]*"\|[^,]\)/\1@@@/;ta;s/@@@,/|@@@/;ta;s/@@@$//'
Mr |Joe|Lish||\"Acme, Inc.\"|\"9599 Park Avenue, Suite 301\"|Manhattan|NY|10022|\"\"\"6 A MAILING LIST MMBR GENERAL\"\"\"

答案 2 :(得分:0)

正则表达式有lookahead和lookbehind运算符。例如,Javascript调用

  

bodyText = bodyText.replace(/ Aa(?= A)/ g,'AaB');

如果后面跟着另一个“A”,则将“Aa”替换为“Aa”,留下“AaBA”。它与“AaB”不匹配,因为“Aa”后面没有另一个“A”。这是一个前瞻性的呼吁。

我相信lookbehind的语法是?< =。

因此,如果您正在使用的包支持这些运算符,那么您可以使用它们来匹配“,”前面的“四”或“五”,并且只替换“,”。

答案 3 :(得分:0)

我想出了这个:

 echo '\"One,Two, Three\" Four, Five, Six' | sed 's/\(\("[^"]*"\)\?[^",]\+\),/\1 |/g'

假设一行就像

  [ ["someting"] word, ]* ["someting"] word