How do I remove duplicate items from an array in Perl?

Time: 2008-08-11 10:04:32

Tags: perl arrays unique duplicates

I have an array in Perl:

my @my_array = ("one","two","three","two","three");

How do I remove the duplicates from the array?

11 answers:

Answer 0 (score: 155)

You can do something like this, as demonstrated in perlfaq4:

sub uniq {
    my %seen;
    grep !$seen{$_}++, @_;
}

my @array = qw(one two three two three);
my @filtered = uniq(@array);

print "@filtered\n";

Output:

one two three

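If you want to use a module, try the uniq function from List::MoreUtils.

Answer 1 (score: 118)

The Perl documentation ships with a good collection of FAQs. Your question is a frequently asked one: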

% perldoc -q duplicate

The answer, copied and pasted from the output of the command above, appears below:

Found in /usr/local/lib/perl5/5.10.0/pods/perlfaq4.pod
 How can I remove duplicate elements from a list or array?
   (contributed by brian d foy)

   Use a hash. When you think the words "unique" or "duplicated", think
   "hash keys".

   If you don't care about the order of the elements, you could just
   create the hash then extract the keys. It's not important how you
   create that hash: just that you use "keys" to get the unique elements.

       my %hash   = map { $_, 1 } @array;
       # or a hash slice: @hash{ @array } = ();
       # or a foreach: $hash{$_} = 1 foreach ( @array );

       my @unique = keys %hash;

   If you want to use a module, try the "uniq" function from
   "List::MoreUtils". In list context it returns the unique elements,
   preserving their order in the list. In scalar context, it returns the
   number of unique elements.

       use List::MoreUtils qw(uniq);

       my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
       my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7

   You can also go through each element and skip the ones you've seen
   before. Use a hash to keep track. The first time the loop sees an
   element, that element has no key in %Seen. The "next" statement creates
   the key and immediately uses its value, which is "undef", so the loop
   continues to the "push" and increments the value for that key. The next
   time the loop sees that same element, its key exists in the hash and
   the value for that key is true (since it's not 0 or "undef"), so the
   next skips that iteration and the loop goes to the next element.

       my @unique = ();
       my %seen   = ();

       foreach my $elem ( @array )
       {
         next if $seen{ $elem }++;
         push @unique, $elem;
       }

   You can write this more briefly using a grep, which does the same
   thing.

       my %seen = ();
       my @unique = grep { ! $seen{ $_ }++ } @array;
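For readers who care about element order, here is a small sketch of the difference between two of the approaches quoted above. The sample data and variable names are my own, not from the FAQ.

```perl
use strict;
use warnings;

my @array = qw(one two three two three one);

# Approach 1: hash keys. Duplicates collapse, but key order is unspecified.
my %hash      = map { $_ => 1 } @array;
my @unordered = keys %hash;

# Approach 2: grep with a %seen hash. Keeps the first occurrence of each
# element, in the original order.
my %seen;
my @ordered = grep { !$seen{$_}++ } @array;

print "keys: @unordered\n";   # some ordering of one, two, three
print "grep: @ordered\n";     # one two three
```

If order matters, the grep form (or a uniq function built on it) is the safer choice.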

Answer 2 (score: 66)

Install List::MoreUtils from CPAN.

Then in your code:

use strict;
use warnings;
use List::MoreUtils qw(uniq);

my @dup_list = qw(1 1 1 2 3 4 4);

my @uniq_list = uniq(@dup_list);
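If installing a CPAN module is not an option, newer versions of the core List::Util module (1.45 and later, which as far as I know is what Perl 5.26 and up bundle) export a uniq function as well. A minimal sketch, assuming such a version is available on your system:

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);   # assumption: List::Util 1.45+ is installed

my @dup_list  = qw(1 1 1 2 3 4 4);
my @uniq_list = uniq(@dup_list);   # keeps first occurrences, in order

print "@uniq_list\n";   # 1 2 3 4
```

The version number on the use line makes the script fail early with a clear message if the installed List::Util is too old.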

Answer 3 (score: 22)

My usual way of doing this is:

my %unique = ();
foreach my $item (@myarray)
{
    $unique{$item} ++;
}
my @myuniquearray = keys %unique;

If you use a hash and add the items to it, you also get the bonus of knowing how many times each item appears in the list.
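As a rough illustration of that bonus, the counts can be read straight out of the hash afterwards. The sample data below is made up for the sketch:

```perl
use strict;
use warnings;

my @myarray = qw(one two three two three three);   # hypothetical sample data

my %unique;
$unique{$_}++ for @myarray;   # value = number of occurrences of each item

# Sort the keys so the report comes out in a stable order.
for my $item (sort keys %unique) {
    print "$item appears $unique{$item} time(s)\n";
}
```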

Answer 4 (score: 7)

The variable @array is the list with duplicate elements.

my %seen = ();
my @unique = grep { ! $seen{$_}++ } @array;

Answer 5 (score: 7)

This can be done with a simple Perl one-liner.

my @in=qw(1 3 4  6 2 4  3 2 6  3 2 3 4 4 3 2 5 5 32 3); #Sample data 
my @out=keys %{{ map{$_=>1}@in}}; # Perform PFM
print join ' ', sort{$a<=>$b} @out;# Print data back out sorted and in order.

The PFM block does this:

The data in @in is fed into map. map builds an anonymous hash. The keys are extracted from the hash and fed into @out.
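The same steps can be spelled out one at a time. This is only an expanded sketch of the one-liner; the intermediate names @pairs and $anon are mine, and the sample data is shortened:

```perl
use strict;
use warnings;

my @in = qw(1 3 4 6 2 4 3 2 6 32 5 5);

# Step 1: map turns each element into a key/value pair (element => 1).
my @pairs = map { $_ => 1 } @in;

# Step 2: the pairs initialize an anonymous hash; repeated keys overwrite
# each other, which is what removes the duplicates.
my $anon = { @pairs };

# Step 3: dereference the hash and take its keys (in no particular order).
my @out = keys %$anon;

print join(' ', sort { $a <=> $b } @out), "\n";   # 1 2 3 4 5 6 32
```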

Answer 6 (score: 4)

That last one was pretty good. I'd just tweak it a bit:

my @arr;
my @uniqarr;

foreach my $var ( @arr ){
  # Compare with eq; a /$var/ match would also hit substrings and regex metacharacters.
  if ( ! grep { $_ eq $var } @uniqarr ){
     push( @uniqarr, $var );
  }
}

I think this is probably the most readable way to do it.
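One thing to watch with a loop of this shape: testing membership with a pattern match (/$var/) is not the same as testing with eq. A pattern also matches substrings and interprets regex metacharacters, so distinct elements can be treated as duplicates. A small sketch with made-up data shows the difference:

```perl
use strict;
use warnings;

my @arr = qw(one on two);   # hypothetical sample data

# Pattern match: "on" is dropped, because /on/ matches the stored "one".
my @by_regex;
for my $var (@arr) {
    push @by_regex, $var unless grep { /$var/ } @by_regex;
}

# Exact comparison: only true duplicates are skipped.
my @by_eq;
for my $var (@arr) {
    push @by_eq, $var unless grep { $_ eq $var } @by_eq;
}

print "regex: @by_regex\n";   # one two
print "eq:    @by_eq\n";      # one on two
```

Either way, the loop rescans the result list for every element, so the hash-based versions in the other answers scale better on long lists.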

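Answer 7 (score: 4)

Method 1: Use a hash

Logic: A hash can only have unique keys, so iterate over the array and assign any value to each element, using the element as a key of the hash. Then return the keys of the hash; that is your unique array.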

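# keys needs a real hash, so dereference the anonymous one
# (calling keys directly on a reference no longer works on Perl 5.24+).
my @unique = keys %{ +{ map { $_ => 1 } @array } };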

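Method 2: Extension of method 1 for reusability

If we are going to use this functionality several times in our code, it is better to make it a subroutine.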

sub get_unique {
    my %seen;
    grep !$seen{$_}++, @_;
}
my @unique = get_unique(@array);

Method 3: Use the module List::MoreUtils

use List::MoreUtils qw(uniq);
my @unique = uniq(@array);

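Answer 8 (score: 1)

The previous answers pretty much summarize the possible ways of accomplishing this task.

However, I suggest a modification for those who don't care about counting the duplicates, but do care about order.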

my @record = qw( yeah I mean uh right right uh yeah so well right I maybe );
my %record;
print grep !$record{$_} && ++$record{$_}, @record;

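Note that the previously suggested grep !$seen{$_}++ ... increments $seen{$_} before negating, so the increment happens regardless of whether the element has already been %seen or not. The version above, however, short-circuits when $record{$_} is true, leaving what has been heard once "off the %record".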
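To make the difference concrete, here is a side-by-side sketch of the two variants on the same data. Both return the same list; only the values left behind in the hash differ. The names @first and @second are mine:

```perl
use strict;
use warnings;

my @record = qw( yeah I mean uh right right uh yeah so well right I maybe );

# Variant 1: the post-increment runs for every element, so the hash values
# end up as occurrence counts.
my %seen;
my @first = grep { !$seen{$_}++ } @record;

# Variant 2: && short-circuits once the flag is set, so each value is
# incremented exactly once and stays at 1.
my %record;
my @second = grep { !$record{$_} && ++$record{$_} } @record;

print "same result: ", ( "@first" eq "@second" ? "yes" : "no" ), "\n";
print "seen{right} = $seen{right}, record{right} = $record{right}\n";
```

With this input, "right" occurs three times, so $seen{right} ends at 3 while $record{right} stays at 1.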

You could also go with this ridiculousness, which takes advantage of autovivification and the existence of hash keys:

...
grep !(exists $record{$_} || undef $record{$_}), @record;

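That, however, might lead to some confusion.

And if you care about neither order nor duplicate count, you could do another hack using hash slices and the trick I just mentioned: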

...
undef @record{@record};
keys %record; # your record, now probably scrambled but at least deduped

Answer 9 (score: 0)

Try this; it seems the uniq function needs a sorted list in order to work properly.


Answer 10 (score: 0)

Using the concept of unique hash keys:

my @array  = ("a","b","c","b","a","d","c","a","d");
my %hash   = map { $_ => 1 } @array;
my @unique = keys %hash;
print "@unique","\n";

Output (hash key order is not guaranteed, so yours may differ): a c b d
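If a predictable order is wanted from this approach, one option is to sort the keys before using them. A minimal sketch of that variation:

```perl
use strict;
use warnings;

my @array  = ("a","b","c","b","a","d","c","a","d");
my %hash   = map { $_ => 1 } @array;

# sort gives a stable, alphabetical order regardless of hash internals.
my @unique = sort keys %hash;

print "@unique\n";   # a b c d
```

Note that this yields sorted order, not the original order of first appearance; for the latter, the grep-based approach from the earlier answers is the one to use.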