Question

In Perl, I have an array of hashes like this:

0  HASH(0x98335e0)
   'title' => 1177
   'author' => 'ABC'
   'quantity' => '-100'


1  HASH(0x832a9f0)
   'title' => 1177
   'author' => 'ABC'
   'quantity' => '100'

2  HASH(0x98335e0)
   'title' => 1127
   'author' => 'DEF'
   'quantity' => '5100'


3  HASH(0x832a9f0)
   'title' => 1277
   'author' => 'XYZ'
   'quantity' => '1030'

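Now I need to accumulate the quantity wherever the title and author are the same. In the structure above, the quantities for title = 1177 and author = 'ABC' can be combined into a single entry, so the whole structure should look like this: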

0  HASH(0x98335e0)
   'title' => 1177
   'author' => 'ABC'
   'quantity' => 0

1  HASH(0x98335e0)
   'title' => 1127
   'author' => 'DEF'
   'quantity' => '5100'

2  HASH(0x832a9f0)
   'title' => 1277
   'author' => 'XYZ'
   'quantity' => '1030'

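What is the best way to do this accumulation so that it is optimized? The number of array elements can be very large. I don't mind adding extra keys to the hashes to help with this, but I don't want N lookups. Please advise.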

Answer 1

my %sum;
for (@a) {
  $sum{ $_->{author} }{ $_->{title} } += $_->{quantity};
}

my @accumulated;
foreach my $author (keys %sum) {
  foreach my $title (keys %{ $sum{$author} }) {
    push @accumulated => { title    => $title,
                           author   => $author,
                           quantity => $sum{$author}{$title},
                         };
  }
}

Not sure whether map makes it look any nicer:

my @accumulated =
  map {
    my $author = $_;
    map +{ author   => $author,
           title    => $_,
           quantity => $sum{$author}{$_},
         },
      keys %{ $sum{$author} };
  }
  keys %sum;

Answer 2

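If you don't want N lookups, then you need a hash function, but you need to store the items using that hash function. By the time you have them in a list (or array), it's too late. You will either get lucky every time, or you will have N lookups.

Or consolidate them as you insert them. A hybrid solution is to store a locator as item 0 of the list/array.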

my $lot = get_lot_from_whatever();
my $tot = $list[0]{ $lot->{author} }{ $lot->{title} };
if ( $tot ) { 
    $tot->{quantity} += $lot->{quantity};
}
else { 
    push @list, $list[0]{ $lot->{author} }{ $lot->{title} } = $lot;
}
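
For a self-contained illustration of the hybrid idea, here is a minimal sketch. It assumes the lots arrive one at a time: the hard-coded @incoming list stands in for get_lot_from_whatever(), and add_lot is a hypothetical helper name, not something from the answer. Element 0 of @list holds the locator and the consolidated lots follow it.

```perl
use strict;
use warnings;

# Element 0 is the locator: author => title => reference to the stored lot.
my @list = ( {} );

# Hypothetical helper: merge one incoming lot into @list.
sub add_lot {
    my ($lot) = @_;
    my $tot = $list[0]{ $lot->{author} }{ $lot->{title} };
    if ($tot) {
        # Seen before: one hash lookup, then accumulate in place.
        $tot->{quantity} += $lot->{quantity};
    }
    else {
        # First time: store a copy and record it in the locator.
        my $copy = {%$lot};
        $list[0]{ $lot->{author} }{ $lot->{title} } = $copy;
        push @list, $copy;
    }
    return;
}

# Stand-in for get_lot_from_whatever(): the sample data from the question.
my @incoming = (
    { title => 1177, author => 'ABC', quantity => '-100' },
    { title => 1177, author => 'ABC', quantity => '100'  },
    { title => 1127, author => 'DEF', quantity => '5100' },
    { title => 1277, author => 'XYZ', quantity => '1030' },
);
add_lot($_) for @incoming;

# Skip the locator when reading the consolidated lots back.
my @lots = @list[ 1 .. $#list ];
print join( ' ', @{$_}{qw(author title quantity)} ), "\n" for @lots;
```

Each lot costs one two-level hash lookup at insertion time, and the lots keep the order in which each author/title pair was first seen. Anything that iterates over @list has to remember that element 0 is not a lot.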


First, we'll reformat the data to make it readable.

[ { title => 1177, author => 'ABC', quantity => '-100' }
, { title => 1177, author => 'ABC', quantity => '100'  }
, { title => 1127, author => 'DEF', quantity => '5100' }
, { title => 1277, author => 'XYZ', quantity => '1030' }
]

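Next, you need to break the problem down. You want quantities of things grouped by author and title, so you need to identify those lots uniquely. To repeat: you want a combination of names to identify entities. Thus, you will need a hash that identifies things by name.

Since we have two things, a double hash is a good way to do it.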

my %hash;
foreach my $lot ( @list ) {
    $hash{ $lot->{author} }{ $lot->{title} } += $lot->{quantity};
}
# consolidated by hash

To turn it back into a list, we need to unwind the levels.

my @consol
    = sort { $a->{author} cmp $b->{author} || $a->{title} cmp $b->{title} }
      map  {
          my ( $author, $titles ) = @$_; # $_ is [ $author, {...} ]
          map { +{ title => $_, author => $author, quantity => $titles->{$_} } }
          keys %$titles;
      }
      map  { [ $_ => $hash{$_} ] } # group and freeze a pair
      keys %hash
    ;

# consolidated in a list.

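There you have it back, and I even sorted it for you. Of course, publishers being what they are, you could also sort it by descending quantity.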

sort {  $b->{quantity} <=> $a->{quantity} 
     || $a->{author}   cmp $b->{author} 
     || $a->{title}    cmp $b->{title} 
     }
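
As a runnable illustration of that comparison block, here is a small sketch applied to the consolidated sample data. The @consol values are typed in by hand here as an assumption, rather than computed, and @by_quantity is just an illustrative name.

```perl
use strict;
use warnings;

# Consolidated lots for the question's sample data (hand-entered for the demo).
my @consol = (
    { title => 1177, author => 'ABC', quantity => 0    },
    { title => 1127, author => 'DEF', quantity => 5100 },
    { title => 1277, author => 'XYZ', quantity => 1030 },
);

# Largest quantity first; ties fall back to author, then title.
my @by_quantity
    = sort {  $b->{quantity} <=> $a->{quantity}
           || $a->{author}   cmp $b->{author}
           || $a->{title}    cmp $b->{title}
           }
      @consol;

print join( ' ', @{$_}{qw(author title quantity)} ), "\n" for @by_quantity;
# DEF 1127 5100
# XYZ 1277 1030
# ABC 1177 0
```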

Answer 3

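I think it is important to step back and consider where the data comes from. If the data comes from a database, you should write the SQL query so that it gives you one row per author/title combination, with the total quantity in the quantity field. If you are reading the data from a file, you should read it directly into a hash, or use Tie::IxHash if order matters.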
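
To make the file case concrete, here is a minimal sketch of summing while reading. The input format (one whitespace-separated "title author quantity" record per line) is an assumption, since the question does not show the file, and an in-memory filehandle stands in for the real one. Instead of Tie::IxHash, this sketch keeps first-seen order in a plain array, which avoids a non-core module.

```perl
use strict;
use warnings;

# Assumed input format: one "title author quantity" record per line.
my $data = <<'END';
1177 ABC -100
1177 ABC 100
1127 DEF 5100
1277 XYZ 1030
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my ( %sum, @order );
while ( my $line = <$fh> ) {
    my ( $title, $author, $quantity ) = split ' ', $line;
    next unless defined $quantity;    # skip blank or short lines

    # Remember each author/title pair the first time it appears.
    push @order, [ $author, $title ] unless exists $sum{$author}{$title};
    $sum{$author}{$title} += $quantity;
}
close $fh;

# Rebuild the array of hashes, one entry per author/title pair.
my @accumulated = map {
    my ( $author, $title ) = @$_;
    +{ title => $title, author => $author, quantity => $sum{$author}{$title} };
} @order;

print join( ' ', @{$_}{qw(author title quantity)} ), "\n" for @accumulated;
```

Because the totals are built during the read, the unconsolidated array of hashrefs never exists in memory.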

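Once you have the data in an array of hashrefs, as you do, you have to create an auxiliary data structure and do a whole bunch of lookups. The cost of those lookups may well dominate the run time of your program (not in a way that matters if it runs for 15 minutes once a day), and you may run into memory problems.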
