Question

I have a file whose fields are separated by ";", as shown below:

test;group;10.10.10.10;action2
test2;group;10.10.13.11;action1
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test5;group2;10.10.10.12;action5
test6;group4;10.10.13.11;action8

I want to identify all non-unique IP addresses (column 3). For the example above, the extracted lines should be:

test;group;10.10.10.10;action2
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test2;group;10.10.13.11;action1
test6;group4;10.10.13.11;action8

The output should be sorted by IP address (column 3).

I would like to use simple commands such as cat, uniq, sort, and awk (no Perl, no Python, shell only).

Any ideas?

Answer 1

$ awk -F';' 'NR==FNR{a[$3]++;next}a[$3]>1' file file|sort -t";" -k3
test;group;10.10.10.10;action2
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test2;group;10.10.13.11;action1
test6;group4;10.10.13.11;action8

awk selects every line whose third field ($3) occurs more than once.
sort then orders those lines by IP address.
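
The two-pass idiom above can be spelled out as a small script. This is only a sketch: the temporary sample file, the function name dup_ips, and the array name seen are assumptions made for illustration, and LC_ALL=C is added just to keep the sort order predictable.

```shell
#!/bin/sh
# Hypothetical sample file holding the data from the question.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
test;group;10.10.10.10;action2
test2;group;10.10.13.11;action1
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test5;group2;10.10.10.12;action5
test6;group4;10.10.13.11;action8
EOF

# dup_ips FILE: print lines whose third field occurs more than once,
# ordered by that field. The file is read twice by awk.
dup_ips() {
    awk -F';' '
        NR == FNR { seen[$3]++; next }   # first pass: count each IP
        seen[$3] > 1                     # second pass: print lines with a repeated IP
    ' "$1" "$1" | LC_ALL=C sort -t';' -k3
}

dup_ips "$sample"
```

Because the file name is passed twice, NR == FNR is true only while awk reads the first copy, which is what separates the counting pass from the printing pass.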

Answer 2

You can also try this solution, which uses grep, cut, sort, and uniq with a process substitution in the middle.

grep -f <(cut -d ';' -f3 file | sort | uniq -d) file | sort -t ';' -k3

It is not very elegant (I actually prefer the awk answer given above), but I think it is worth sharing because it does what you want.
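
To see what the inner pipeline produces, the steps can be separated as below. This is a sketch rather than the answer's exact command: it stores the duplicated IPs in a temporary file instead of using process substitution (so it also runs in plain sh), and it adds grep -F so the dots are matched literally. The function names and temporary files are assumptions.

```shell
#!/bin/sh
# Hypothetical sample file holding the data from the question.
sample=$(mktemp)
patterns=$(mktemp)
trap 'rm -f "$sample" "$patterns"' EXIT
cat > "$sample" <<'EOF'
test;group;10.10.10.10;action2
test2;group;10.10.13.11;action1
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test5;group2;10.10.10.12;action5
test6;group4;10.10.13.11;action8
EOF

# Step 1: list each IP (field 3) that occurs more than once.
dup_list() {
    cut -d';' -f3 "$1" | LC_ALL=C sort | uniq -d
}

# Step 2: keep the lines containing one of those IPs, then order by field 3.
# -F treats each pattern as a fixed string rather than a regular expression.
dup_lines() {
    dup_list "$1" > "$patterns"
    grep -F -f "$patterns" "$1" | LC_ALL=C sort -t';' -k3
}

dup_list "$sample"
dup_lines "$sample"
```

Note that grep matches the pattern anywhere in the line, so an IP such as 10.10.10.1 would also select lines containing 10.10.10.10; the awk answers compare the third field exactly and do not have this limitation.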

Answer 3

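This is very similar to Kent's answer, but it needs only a single pass through the file. The tradeoff is memory: you have to store the lines you want to keep. This uses GNU awk for the PROCINFO variable.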

awk -F';' '
    {count[$3]++; lines[$3] = lines[$3] $0 ORS} 
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (key in count) 
            if (count[key] > 1) 
                printf "%s", lines[key]
    }
' file

The equivalent Perl:

perl -F';' -lane '
    $count{$F[2]}++; push @{$lines{$F[2]}}, $_
  } END {
    print join $/, @{$lines{$_}}
        for sort grep {$count{$_} > 1} keys %count
' file

Answer 4

Here is another awk-assisted pipeline:

$ awk -F';' '{print $0 "\t" $3}' file | sort -sk2 | uniq -Df1 | cut -f1

test;group;10.10.10.10;action2
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test2;group;10.10.13.11;action1
test6;group4;10.10.13.11;action8

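It reads the file in a single pass, so no special caching is needed; the stable sort also keeps the original order of the lines within each IP. It assumes that no tab characters appear in the fields.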
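
The pipeline follows a decorate, sort, undecorate pattern, which is easier to follow when the first stage is inspected on its own. In this sketch the function names and the temporary sample file are assumptions; sort -s and uniq -D are extensions available in GNU coreutils.

```shell
#!/bin/sh
# Hypothetical sample file holding the data from the question.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
test;group;10.10.10.10;action2
test2;group;10.10.13.11;action1
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test5;group2;10.10.10.12;action5
test6;group4;10.10.13.11;action8
EOF

# Decorate: append the IP as a tab-separated second column.
decorate() {
    awk -F';' '{ print $0 "\t" $3 }' "$1"
}

# Stable-sort on the appended key, keep every line whose key repeats
# (uniq skips the first field when comparing), then drop the key again.
dup_lines() {
    decorate "$1" | LC_ALL=C sort -s -k2 | uniq -D -f1 | cut -f1
}

decorate "$sample" | head -n 1
dup_lines "$sample"
```

Since the original lines contain no blanks, sort and uniq see exactly two whitespace-separated fields: the untouched line and the appended IP.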

Answer 5

awk + sort + uniq + cut:

$ awk -F ';' '{print $0,$3}' file | sort -k2 | uniq -D -f1 | cut -d' ' -f1

sort + awk

$ sort -t';' -k3,3 file | awk -F ';' '($3==k){c++;b=b"\n"$0}($3!=k){if (c>1) print b;c=1;k=$3;b=$0}END{if(c>1)print b}'

awk

$ awk -F ';' '{b[$3 "_" (++k[$3])]=$0; }
      END{for (i in k) if(k[i]>1) for(j=1;j<=k[i];j++) print b[i"_"j] }' file

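This buffers the whole file (just as sort does) and uses k to track how many times each key occurs. At the end, if a key occurs more than once, the full set of its lines is printed.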

test2;group;10.10.13.11;action1
test6;group4;10.10.13.11;action8
test;group;10.10.10.10;action2
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
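
The buffering approach can also be written with descriptive names, which makes the bookkeeping easier to follow. This is a sketch under assumptions: the function name dup_buffered, the array names count and line, and the temporary sample file are illustrative only.

```shell
#!/bin/sh
# Hypothetical sample file holding the data from the question.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
test;group;10.10.10.10;action2
test2;group;10.10.13.11;action1
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test5;group2;10.10.10.12;action5
test6;group4;10.10.13.11;action8
EOF

# dup_buffered FILE: single pass; every line is kept in memory until END.
dup_buffered() {
    awk -F';' '
        {
            n = ++count[$3]        # how many times this IP has been seen so far
            line[$3, n] = $0       # buffer the line under (IP, occurrence number)
        }
        END {
            for (ip in count)      # iteration order is unspecified in awk
                if (count[ip] > 1)
                    for (j = 1; j <= count[ip]; j++)
                        print line[ip, j]
        }
    ' "$1"
}

dup_buffered "$sample"
```

Lines within one IP group come out in input order, but the order of the groups themselves depends on the awk implementation, which is why the groups may appear in a different order than in the sorted answers.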

If you want it sorted (asorti requires GNU awk):

$ awk -F ';' '{b[$3 "_" (++k[$3])]=$0; }
      END{ n=asorti(k,l);
      for (i=1;i<=n;i++) if(k[l[i]]>1) for(j=1;j<=k[l[i]];j++) print b[l[i]"_"j] }' file
