Counting multiple fields in a hash

Time: 2014-05-13 22:28:57

Tags: ruby enumerable

Question: I need to extract certain keys from a list of hashes and count them. As an example, consider:

data = [{"name"=>"name1", "priority"=>"1", "owner"=>"test3"}, 
        {"name"=>"name1", "priority"=>"1", "owner"=>"test4"},
        {"name"=>"name2", "priority"=>"1", "owner"=>"test5"},
        {"name"=>"name2", "priority"=>"2", "owner"=>"test5"},
        {"name"=>"nae954me2", "priority"=>"2", "owner"=>"test5"}]

I want to count the number of records for each [id (extracted from the name), priority] pair, so that at the end I get something like:

#{{"priority"=>"1", "id"=>"name1"}=>2, {"priority"=>"1", "id"=>"name2"}=>1, {"priority"=>"2", "id"=>"name2"}=>1}

I'm doing the following, but I feel like I'm overcomplicating it:

#!/usr/bin/env ruby

data = [{"name"=>"name1", "priority"=>"1", "owner"=>"test3"}, 
       {"name"=>"name1", "priority"=>"1", "owner"=>"test4"},
       {"name"=>"name2", "priority"=>"1", "owner"=>"test5"},
       {"name"=>"name2", "priority"=>"2", "owner"=>"test5"},
       {"name"=>"nae954me2", "priority"=>"2", "owner"=>"test5"}]

# (1) trash some keys, just because I don't need them  
data.each do |d|
  d.delete 'owner'
  # in the real data I have about 4 or 5 that I'm trashing
  d['id'] = d['name'].scan(/[a-z][a-z][a-z][a-z][0-9]/)[0] # only valid ids
  d.delete 'name'
end

puts data
#output: 
#{"priority"=>"1", "id"=>"name1"}
#{"priority"=>"1", "id"=>"name1"}
#{"priority"=>"1", "id"=>"name2"}
#{"priority"=>"2", "id"=>"name2"}
#{"priority"=>"2", "id"=>nil}

# (2) reject invalid keys
data = data.reject { |d| d['id'].nil? }

puts data
#output: 
#{"priority"=>"1", "id"=>"name1"}
#{"priority"=>"1", "id"=>"name1"}
#{"priority"=>"1", "id"=>"name2"}
#{"priority"=>"2", "id"=>"name2"}

# (3) count
counts = Hash.new(0)
data.each do |d|
  counts[d] += 1
end

puts counts
#{{"priority"=>"1", "id"=>"name1"}=>2, {"priority"=>"1", "id"=>"name2"}=>1, {"priority"=>"2", "id"=>"name2"}=>1}

Any suggestions for improving my counting approach?

2 answers:

Answer 0: (score: 1)

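There are many ways to do this. (You may have noticed that I've edited my answer heavily: I explained in detail how one approach works, only to realize there was a better way, so out came the machete.) Here are two solutions. The first is inspired by the approach you took, but I've tried to package it in a more Ruby-like way. I wasn't sure what counts as a valid "name", so I put that decision in a separate method that can easily be changed.

Code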

def name_valid?(name)
  name[0..3] == "name"
end

data.each_with_object(Hash.new(0)) {|h,g|
  (g[{"id"=>h["name"],"priority"=>h["priority"]}]+=1) if name_valid?(h["name"])}
  #=> {{"id"=>"name1", "priority"=>"1"}=>2,
  #    {"id"=>"name2", "priority"=>"1"}=>1,
  #    {"id"=>"name2", "priority"=>"2"}=>1}

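Explanation

Enumerable#each_with_object is passed a hash that is initially empty and has a default value of zero (Hash.new(0)); the block variable g refers to it. g is built up by adding entries derived from the elements of data:

g[{"id"=>h["name"],"priority"=>h["priority"]}]+=1

If the hash g already has the key

{"id"=>h["name"],"priority"=>h["priority"]}

the value associated with that key is increased by 1. If g does not have this key,

g[{"id"=>h["name"],"priority"=>h["priority"]}]

evaluates to the default value, zero, when it is read as part of

g[{"id"=>h["name"],"priority"=>h["priority"]}]+=1

so the value stored under the new key becomes 1.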
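
The default-value behaviour described above can be tried in isolation. The following is only a minimal sketch: the helper name tally_with_default and the sample pairs are made up for illustration and are not part of the answer.

```ruby
# Hypothetical helper (not from the answer): count occurrences of each item
# using a hash whose default value is 0.
def tally_with_default(items)
  counts = Hash.new(0)                      # reading a missing key returns 0
  items.each { |item| counts[item] += 1 }   # same as counts[item] = counts[item] + 1
  counts
end

# Two hashes with equal contents are treated as the same key.
pairs = [
  { "id" => "name1", "priority" => "1" },
  { "id" => "name1", "priority" => "1" },
  { "id" => "name2", "priority" => "2" }
]

p tally_with_default(pairs)
```

Note that merely reading a missing key does not store it; the key only appears in the hash once += assigns to it.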

Alternative approach

Code

data.each_with_object({}) do |h,g|
  hash = { { "id"=>h["name"], "priority"=>h["priority"] } => 1 } 
  g.update(hash) { |k, vg, _| vg + 1 } if name_valid?(h["name"])
end
  #=> {{"id"=>"name1", "priority"=>"1"}=>2,
  #    {"id"=>"name2", "priority"=>"1"}=>1,
  #    {"id"=>"name2", "priority"=>"2"}=>1}

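Explanation

Here I use Hash#update (a.k.a. Hash#merge!) to merge a one-entry hash built from each element of data into the initially empty hash g (provided the value of "name" is valid). update's block

{ |k, vg, _| vg + 1 }

is invoked if and only if the hash being merged into (g) and the hash being merged (hash) both have the key k, in which case the block returns the new value for that key. Note that the third block variable is the value of the key k in hash, which here is always 1. Since we don't use that value, I've replaced it with the placeholder _.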
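
To see when update's block runs, here is a small sketch outside the original problem. The helper name bump and the string keys are invented for this illustration.

```ruby
# Hypothetical helper: merge the one-entry hash {key => 1} into acc.
# The block is only called when acc already contains key; its return
# value becomes the new value stored under that key.
def bump(acc, key)
  acc.update({ key => 1 }) { |_key, old_value, _new_value| old_value + 1 }
end

acc = {}
bump(acc, "a")   # "a" is absent: the block is not called, 1 is stored
bump(acc, "a")   # "a" is present: the block returns 1 + 1
bump(acc, "b")   # "b" is absent: 1 is stored
p acc
```

After these three calls, acc maps "a" to 2 and "b" to 1.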

Answer 1: (score: 1)

Depending on what you mean by "something like", this might do the trick:

data.group_by { |h| [h["name"], h["priority"]] }.map { |k, v| { k => v.size } }

=> [{["name1", "1"]=>2}, {["name2", "1"]=>1}, {["name2", "2"]=>1}, {["nae954me2", "2"]=>1}]
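
The result above is an array of one-entry hashes and still includes the record whose name is not a valid id. If a single hash keyed the way the question asks for is wanted, one possible variation is sketched below. It assumes, as the question's regex suggests, that a valid id is four lowercase letters followed by a digit; ID_PATTERN and count_by_id_and_priority are names made up for this sketch.

```ruby
# Assumption taken from the question's regex: four lowercase letters, then a digit.
ID_PATTERN = /[a-z]{4}[0-9]/

# Hypothetical helper: build {"priority"=>..., "id"=>...} keys, drop records
# without a valid id, then count how many records share each key.
def count_by_id_and_priority(records)
  records
    .map { |h| { "priority" => h["priority"], "id" => h["name"][ID_PATTERN] } }
    .reject { |h| h["id"].nil? }             # String#[] with a regex returns nil on no match
    .group_by { |h| h }                      # equal hashes land in the same group
    .map { |key, group| [key, group.size] }
    .to_h
end

data = [{"name"=>"name1", "priority"=>"1", "owner"=>"test3"},
        {"name"=>"name1", "priority"=>"1", "owner"=>"test4"},
        {"name"=>"name2", "priority"=>"1", "owner"=>"test5"},
        {"name"=>"name2", "priority"=>"2", "owner"=>"test5"},
        {"name"=>"nae954me2", "priority"=>"2", "owner"=>"test5"}]

p count_by_id_and_priority(data)
```

Unlike the code in the question, this does not modify the original data hashes. On Ruby 2.7 or later, Enumerable#tally could replace the group_by, map and to_h steps.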