Replace all numbers in a text file with a single word

Asked: 2015-12-08 02:21:31

Tags: bash

Is there a simple and efficient one-line solution to replace all numbers, or sequences made up of digits and symbols (\/$&*#@)(-+!~.,:;"'`^%_][{}=), for example:

1 2 3 4 998898321321
0.2 1.2 32221.111. 1321321321.111
111.11212.21212
212323/12331/321312
121-12123-32131
121+12123+32131
1_212121_2320
12131!~~~323131
etc

with the single token NUMBER in a large (100 GB) text file? Sample input and output:

Input:

hello my friend 212323/12331/321312
hope you are fine 12131!~~~323131 in 33-years from now
happy face is important to maintaion by 98987 321321/32131

Output:

hello my friend NUMBER
hope you are fine NUMBER in 33-years from now
happy face is important to maintaion by NUMBER NUMBER

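Basically, anything between two spaces that consists only of digits and non-letter symbols must be replaced by NUMBER. The rest of the text should stay as it is.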

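2 answers:

Answer 0 (score: 2)

OK, I think I've got it:

I need three steps:

  1. Double every space
  2. Replace every run of non-letter characters that is surrounded by spaces or line boundaries with NUMBER (keeping the spaces)
  3. Collapse the doubled spaces back into single ones

Here is what it looks like: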

    $ cat test.txt
    hello my friend 212323/12331/321312
    hope you are fine 12131!~~~323131 in 33-years from now
    happy face is important to maintaion by 98987 321321/32131
    123 This is a line
    
    $ sed -r 's/ /  /g;s/(^| )[^[:alpha:] ]+( |$)/\1NUMBER\2/g;s/  / /g' test.txt
    hello my friend NUMBER
    hope you are fine NUMBER in 33-years from now
    happy face is important to maintaion by NUMBER NUMBER
    NUMBER This is a line
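
The reason for the doubling step is easiest to see by running the substitution with and without it. The following is a minimal sketch, not part of the original answer: it uses sed -E (the portable spelling of GNU's -r, assumed to be available in your sed), and the variable names line, without, and with are purely illustrative.

```shell
line='by 98987 321321/32131'

# Without doubling: the first match consumes the space after 98987,
# so the second token has no leading space left to match against.
without=$(printf '%s\n' "$line" |
  sed -E 's/(^| )[^[:alpha:] ]+( |$)/\1NUMBER\2/g')

# With doubling: every token gets its own leading and trailing space.
with=$(printf '%s\n' "$line" |
  sed -E 's/ /  /g;s/(^| )[^[:alpha:] ]+( |$)/\1NUMBER\2/g;s/  / /g')

printf '%s\n' "$without" "$with"
# prints:
# by NUMBER 321321/32131
# by NUMBER NUMBER
```

Two adjacent tokens share a single separating space, and a regex match cannot reuse a character that the previous match already consumed; doubling the spaces sidesteps that.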
    

Answer 1 (score: 0)

To complement chw21's helpful solution with a perl solution that handles not just spaces, but any mix of spaces and tabs between words:

    perl -ple 's/(^|(?<=[[:blank:]]))[^[:alpha:][:blank:]]+((?=[[:blank:]])|$)/NUMBER/g' file

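Using look-behind ((?<=...)) and look-ahead ((?=...)) assertions eliminates the need for capture groups, and with it the need to double the spaces as an intermediate step; using [[:blank:]] (space or tab) instead of a literal space allows any mix of spaces and tabs: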

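  • (^|(?<=[[:blank:]])) matches at the start of the line (^) or at any position preceded by a blank (space or tab)

  • [^[:alpha:][:blank:]]+ matches any non-empty run of characters that are neither letters nor blanks

  • ((?=[[:blank:]])|$) matches at the end of the line ($) or at any position followed by a blank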
