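How to remove specific repeated strings from a file in Linux

Time: 2013-09-13 19:52:19

Tags: linux sed awk uniq

I have a list in which each line of data is paired with an IP address. I want each IP address to be shown only once, and I don't want to change the order of the lines.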

192.168.0.100    fred is happy
192.168.0.100    fred likes pie
192.168.0.100    pie is good
192.168.0.110    tom like cake
192.168.0.110    cake is good
192.168.0.110    pie is better
192.168.0.112    bill like lettuce
192.168.0.112    lettuce is good for you
192.168.0.112    cake and pie are better tasting than lettuce

All I want to do is remove the repeated IP addresses while keeping everything else exactly the same.

I want it to look like this:

192.168.0.100    fred is happy
                 fred likes pie
                 pie is good
192.168.0.110    tom like cake
                 cake is good
                 pie is better
192.168.0.112    bill like lettuce
                 lettuce is good for you
                 cake and pie are better tasting than lettuce

I don't want to touch any of the repeated words, and I can't change the order.

Thanks for any help.

5 answers:

Answer 0: (score: 2)

This will work no matter what kind of spacing and/or RE metacharacters are in the file:

$ awk '
{ key = $1 }
key == prev { sub(/[^[:space:]]+/,sprintf("%*s",length(key),"")) }
{ prev = key; print }
' file
192.168.0.100    fred is happy
                 fred likes pie
                 pie is good
192.168.0.110    tom like cake
                 cake is good
                 pie is better
192.168.0.112    bill like lettuce
                 lettuce is good for you
                 cake and pie are better tasting than lettuce

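Beware of solutions that use $1 in an RE context: the "." in an IP address is an RE metacharacter meaning "any character", so such solutions may work on some sample data but can produce false matches given other input.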
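
To make that warning concrete, here is a minimal sketch of the difference between using a key as a regex and comparing it as a plain string. The helper names `regex_match`, `literal_match` and `sample`, and the bogus input line, are made up for illustration and do not come from any of the answers.

```shell
# Hypothetical helpers, for illustration only.

# Uses the key as a dynamic regex: each "." matches any character.
regex_match() {
  awk -v ip="$1" '$1 ~ ("^" ip "$")'
}

# Compares the key as a plain string: only the exact key matches.
literal_match() {
  awk -v ip="$1" '$1 == ip'
}

# Made-up sample data: one real address and one look-alike.
sample() {
  printf '%s\n' '192.168.0.100 real' '192x168y0z100 bogus'
}

sample | regex_match 192.168.0.100     # prints both lines
sample | literal_match 192.168.0.100   # prints only the first line
```

This is why the script above compares `key == prev` rather than matching one against the other.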

Answer 1: (score: 1)

I'm guessing the separator between the IP and the text is a tab, in which case this one-liner should work for you:

awk -F'\t' -v OFS='\t' 'a[$1]{gsub(/./," ",$1);print;next}{a[$1]=1}7' file

Test with your file:

kent$  awk -F'\t' -v OFS='\t' 'a[$1]{gsub(/./," ",$1);print;next}{a[$1]=1}7' f
192.168.0.100   fred is happy
                fred likes pie
                pie is good
192.168.0.110   tom like cake
                cake is good
                pie is better
192.168.0.112   bill like lettuce
                lettuce is good for you
                cake and pie are better tasting than lettuce
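
For readers who find the one-liner dense, the following is a rough expanded sketch of the same idea, still assuming tab-separated input. The function name `blank_seen_keys` and the array name `seen` are my own, and `($1 in seen)` stands in for the original `a[$1]` test. Note that this approach blanks any key that appeared earlier in the file, not only keys repeated on adjacent lines.

```shell
# Hypothetical wrapper around an expanded version of the one-liner.
blank_seen_keys() {
  awk -F'\t' -v OFS='\t' '
    ($1 in seen) {            # this key was already printed earlier
      gsub(/./, " ", $1)      # overwrite every character of the key with a space
      print                   # $0 was rebuilt with OFS after $1 changed
      next
    }
    { seen[$1] = 1; print }   # first occurrence: remember the key, print as-is
  '
}

# Made-up three-line sample, tab-separated.
printf '192.168.0.100\tfred is happy\n192.168.0.100\tfred likes pie\n192.168.0.110\ttom like cake\n' |
  blank_seen_keys
```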

Answer 2: (score: 1)

Using awk:

awk 'BEGIN{FS=OFS="    "}{t=$1;if(t in a){gsub(/./," ",$1);a[t]=a[t]RS$0}else{a[t]=$0;k[++n]=t}}END{for(i=1;i<=n;i++)print a[k[i]]}' file

Output:

192.168.0.100    fred is happy
                 fred likes pie
                 pie is good
192.168.0.110    tom like cake
                 cake is good
                 pie is better
192.168.0.112    bill like lettuce
                 lettuce is good for you
                 cake and pie are better tasting than lettuce

Answer 3: (score: 1)

One more:

awk 'A[$1]++{s=$1; gsub(/./,FS,s); sub($1,s)}1' file

Answer 4: (score: 0)

This might work for you (GNU sed):

sed -r '1{:a;p;h;s/\s.*//;s/./ /g;H;d};G;s/^(\S+)(\s.*)\n\1.*\n(.*)/\3\2/;t;s/\n.*//;ba' file

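This prints the first record and every record where the key changes, and stores that record, followed by a run of spaces as long as its key, in the hold space. For each subsequent record, the stored key is compared with the current key; if they match, the current key is replaced by the stored run of spaces. If they do not match, the stored key and spaces are discarded and the process repeats as from the start.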
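
The command can be easier to follow when spread over several lines. The sketch below is the same script wrapped in a hypothetical function, `blank_repeats_sed` (my name), with comments added on their own lines. It assumes GNU sed, since it relies on GNU features such as `\s` and `\S`.

```shell
# Hypothetical wrapper; the sed commands are those of the one-liner above.
blank_repeats_sed() {
  sed -r '
    1 {
      :a
      # Print the record that introduces a new key.
      p
      # Save the whole record in the hold space.
      h
      # Reduce the pattern space to the key, then to spaces of equal length.
      s/\s.*//
      s/./ /g
      # Hold space becomes: record, newline, spaces.
      H
      # Start the next cycle without auto-printing.
      d
    }
    # Append the hold space to the current record.
    G
    # Same key as the stored record: replace the key with the stored spaces.
    s/^(\S+)(\s.*)\n\1.*\n(.*)/\3\2/
    # If that substitution succeeded, end the cycle (the line is auto-printed).
    t
    # Otherwise the key changed: drop the appended text and restart at label a.
    s/\n.*//
    ba
  '
}

# Made-up three-line sample in the format used in the question.
printf '%s\n' \
  '192.168.0.100    fred is happy' \
  '192.168.0.100    fred likes pie' \
  '192.168.0.110    tom like cake' | blank_repeats_sed
```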
