Question

Suppose I have a function that returns 1000 records from a Postgres database as a list of dicts that looks like this (but much larger):

[ {"thing_id" : 245, "thing_title" : "Thing title", "thing_url": "thing-url"},
  {"thing_id" : 459, "thing_title" : "Thing title II", "thing_url": "thing-url/2"}]

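I have a process that needs to do about 600 separate searches of this list, each by a unique thing_id, to get the right dict. Rather than iterating over the whole list every time, wouldn't it be more efficient to create a dict of dicts, with each dict's thing_id as the key, like this: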

{245 : {"thing_id" : 245, "thing_title" : "Thing title", "thing_url": "thing-url"},
 459 : {"thing_id" : 459, "thing_title" : "Thing title II", "thing_url": "thing-url/2"}}

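If so, is there a preferred way to do it? Obviously I could build the dict by iterating through the list, but I am wondering whether there is a built-in method for this. If not, what is the best way to go about it? Also, if there is a better way to repeatedly retrieve data from the same large set of records than what I am proposing here, please let me know.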

Update: I ended up going with a dict comprehension:

data = {row["thing_id"]: row for row in rows}

where rows is the result of my db query using psycopg2.extras.DictCursor. Building the dict is fast enough, and lookups are very fast.
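For anyone comparing the two approaches, here is a minimal sketch, assuming thing_id is unique across the records. The sample rows and the helper names find_by_scan and find_by_index are made up for illustration; in the real process rows would come from the database query.

```python
# Sample rows standing in for the query result (hypothetical data).
rows = [
    {"thing_id": 245, "thing_title": "Thing title", "thing_url": "thing-url"},
    {"thing_id": 459, "thing_title": "Thing title II", "thing_url": "thing-url/2"},
]

# Build the index once, with a single pass over the list.
data = {row["thing_id"]: row for row in rows}


def find_by_scan(rows, thing_id):
    """Linear search: walks the list until a match is found, else None."""
    return next((row for row in rows if row["thing_id"] == thing_id), None)


def find_by_index(data, thing_id):
    """Hash lookup: average constant time, None if the id is absent."""
    return data.get(thing_id)


found = find_by_index(data, 459)
missing = find_by_index(data, 999)
```

Using data.get() instead of data[...] avoids a KeyError when an id is not present. The values in data are the same dict objects as in rows, so building the index does not copy the records.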

Answer 1

You could use a pandas DataFrame, which also supports indexing on multiple columns:

>>> import pandas as pd
>>> result = [
...     {"thing_id" : 245, "thing_title" : "Thing title", "thing_url": "thing-url"},
...     {"thing_id" : 459, "thing_title" : "Thing title II", "thing_url": "thing-url/2"}
... ]
>>> df = pd.DataFrame(result)
>>> df.set_index('thing_id', inplace=True)
>>> df.sort_index(inplace=True)
>>> df
             thing_title    thing_url
thing_id                             
245          Thing title    thing-url
459       Thing title II  thing-url/2
>>> df.loc[459, 'thing_title']
'Thing title II'

Answer 2

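a = [ {"thing_id" : 245, "thing_title" : "Thing title", "thing_url": "thing-url"}, {"thing_id" : 459, "thing_title" : "Thing title II", "thing_url": "thing-url/2"}]
# In Python 3, dict.values() returns a view that cannot be indexed, so convert it to a list first.
# With insertion-ordered dicts (Python 3.7+), position 1 holds the "thing_title" value.
c = [list(b.values())[1] for b in a]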
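Picking a value by position depends on the key order of each dict. As a small sketch, assuming the aim is to collect one field from every record, reading by key name is less fragile. The names records, titles and ids are illustrative only.

```python
records = [
    {"thing_id": 245, "thing_title": "Thing title", "thing_url": "thing-url"},
    {"thing_id": 459, "thing_title": "Thing title II", "thing_url": "thing-url/2"},
]

# Look each field up by its key, so the result does not depend on key order.
titles = [record["thing_title"] for record in records]
ids = [record["thing_id"] for record in records]
```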

Best way to repeatedly search over a large number of dicts

2 answers: