Question

I have a list of nearly 2k dictionaries, and I use that list many times. For example:

c = myClass()
c.create(source) # where source is a text of approximately 50k chars
                 # this method creates the list that has approximately 2k dictionaries
item = c.get(15012) # now, this one loops through the list to find an item
                    # whenever the condition is matched, the for loop is broken and the value is returned
item2 = c.prevItem(item) # this one also loops through the reversed list and returns the next item in that order, i.e. the previous one

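Now imagine this scenario: I use the same list over and over again. Because the list is large, I'd like to use a generator, but as far as I know a generator has to be re-created once it raises StopIteration. So basically, is a generator a good fit in this case? Or is there a more efficient approach in terms of speed?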

Answer 1

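It sounds to me like you have to decide which of these to do:

1) Save the values so you don't have to recompute them, at the cost of using more space.

2) Recompute them every time, but save space because you don't have to store them.

If you think about it, whatever kind of generator/list/whatever you use, one of these two things has to happen. And I don't think there's a simple hard rule that says which one is better. (Personally, I'd pick one and not look back. You have your whole life ahead of you.)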
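Here is a minimal sketch of the two options. It assumes a hypothetical `parse_items(source)` generator function that stands in for whatever `create()` actually does. A generator object is exhausted after one pass, so option 2 means calling the generator function again for every pass. Option 1 means materializing the results once with `list()`.

```python
def parse_items(source):
    # Hypothetical stand-in for create(): yields one dict per line of source.
    for number, line in enumerate(source.splitlines()):
        yield {"id": number, "text": line}

source = "alpha\nbeta\ngamma"

# Option 1: compute once and store the list (more memory, no recomputation).
items = list(parse_items(source))
first_pass = [d["id"] for d in items]
second_pass = [d["id"] for d in items]  # a list can be iterated again and again

# Option 2: store nothing and recompute on every pass (less memory, more CPU).
gen = parse_items(source)
consumed = [d["id"] for d in gen]
leftover = [d["id"] for d in gen]                  # already exhausted: empty
fresh = [d["id"] for d in parse_items(source)]     # a new generator starts over
```

Iterating `gen` a second time yields nothing. That is the `StopIteration` behaviour described in the question, and the remedy is simply to call `parse_items()` again.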

Answer 2

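If you often fetch an item at a known offset from a previously retrieved item, change .get to return not only the item but also its position in the list. Then you can implement prevItem as: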

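def prevItem(self, pos):
    # pos is the index returned by get(); the previous item is one step back
    if pos == 0:
        raise IndexError("the first item has no previous item")  # avoid wrapping to itemlist[-1]
    return self.itemlist[pos - 1]

# usage: get() now returns (item, position)
item, pos = c.get(itemnum)
item2 = c.prevItem(pos)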
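The following is a self-contained version of this idea. It assumes the items are dicts with a hypothetical `"id"` key and that `get()` matches on that key; the question does not show the real condition. `get()` still scans linearly, but `prevItem()` becomes a single index operation with no second scan.

```python
class MyClass:
    def __init__(self, itemlist):
        self.itemlist = itemlist

    def get(self, itemnum):
        # Linear search that also reports where the match was found.
        for pos, item in enumerate(self.itemlist):
            if item["id"] == itemnum:
                return item, pos
        raise KeyError(itemnum)

    def prevItem(self, pos):
        # O(1): step back one index instead of searching again.
        if pos == 0:
            raise IndexError("the first item has no previous item")
        return self.itemlist[pos - 1]

c = MyClass([{"id": n * 3} for n in range(2000)])
item, pos = c.get(15)
item2 = c.prevItem(pos)
```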

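If, instead, you perform some operation on item to get a new itemnum, you should store the items in a dict rather than a list. That way get is just a dictionary lookup, which is much faster than searching a list: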

def get(self, itemnum):
    return self.premade_dict[itemnum]

So you should be able to replace some of your searches with cheaper operations.
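Both cheap operations can be combined. This sketch assumes a hypothetical `"id"` key whose values are unique. It keeps the list for ordering and builds a dict from each key to its list position once. `get()` then becomes a dict lookup, and each neighbour is one index step away. The extra dict costs memory in exchange for speed.

```python
class IndexedItems:
    def __init__(self, itemlist, key="id"):
        self.itemlist = itemlist
        # Built once in O(n); assumes every key value is unique.
        self.position = {item[key]: pos for pos, item in enumerate(itemlist)}

    def get(self, itemnum):
        # Average O(1): a dict lookup instead of a list scan.
        return self.itemlist[self.position[itemnum]]

    def prev_item(self, itemnum):
        pos = self.position[itemnum]
        return self.itemlist[pos - 1] if pos > 0 else None

    def next_item(self, itemnum):
        pos = self.position[itemnum]
        return self.itemlist[pos + 1] if pos + 1 < len(self.itemlist) else None

items = IndexedItems([{"id": n * 3, "name": "item%d" % n} for n in range(2000)])
```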

Answer 3

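It depends on how you want to use the generator. Generators are good at executing code only when it's actually needed. But your for loop already breaks as soon as it finds a match, so you already get that benefit.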

You could change your class's interface instead:

def getItems(self, cond):
    # find item, remember index
    yield item
    # find previous item, possibly much more efficient with the index
    yield previtem

Now, when you call getItems(), you can advance the returned generator by 1 or 2 items, and only the code needed for those items runs.
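A runnable version of this interface is sketched below. It assumes a hypothetical `Items` class that wraps a plain list and takes `cond` as a callable search condition. Only the `next()` calls drive the work: requesting one value runs the search, and requesting a second value runs the step back to the previous item.

```python
class Items:
    def __init__(self, itemlist):
        self.itemlist = itemlist

    def getItems(self, cond):
        # Search forward once and remember where the match was found.
        for idx, item in enumerate(self.itemlist):
            if cond(item):
                break
        else:
            return  # no match: the generator ends without yielding
        yield item
        # Runs only if the caller asks for a second value.
        if idx > 0:
            yield self.itemlist[idx - 1]

c = Items([{"id": n} for n in range(2000)])
gen = c.getItems(lambda d: d["id"] == 1500)
item = next(gen)    # runs the search only
item2 = next(gen)   # runs the previous-item step, reusing the remembered index
```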

Answer 4

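A list of two thousand dictionaries is perfectly normal; I'd guess a typical webmaster has plenty of lists like that. If you rarely deal with problems like this, an ad hoc solution may be good enough. A dictionary of dictionaries might be worth considering, so that you don't have to iterate over every key each time. From what I gather about your data structure, though, the more conventional approach is a database. Each of your dictionaries can have some key, ideally the condition you check in your loop. The database can be told to index the data by that key. If you look at the work it then does to retrieve the dictionary you want, you may be surprised to find it's almost nothing: it more or less just flips to the card you asked for, so to speak. (It does have to do some work up front to build the index, which is similar to a sort operation.)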

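Python offers many good ways to map your code onto various databases. Take a look at the powerful but complex sqlalchemy, the built-in standard-library sqlite3 module, or join me in experimenting with mongoengine and NoSQL databases. (There are of course many more, and you can easily find another post here that gives an overview.) Good luck.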
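Here is a minimal sketch of the database route using the standard-library sqlite3 module with an in-memory database. The table and column names (`items`, `item_key`, `name`) are hypothetical. The `CREATE INDEX` statement is the one-time setup cost mentioned above. After it runs, SQLite can use the index for lookups by `item_key` rather than scanning every row. The "previous" row is the one with the largest key below the current key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_key INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(n * 3, "item%d" % n) for n in range(2000)],
)
# One-time setup cost: build an index on the column used for lookups.
conn.execute("CREATE INDEX idx_items_key ON items (item_key)")

def get(item_key):
    # Equality lookup that SQLite can answer from the index.
    return conn.execute(
        "SELECT item_key, name FROM items WHERE item_key = ?", (item_key,)
    ).fetchone()

def prev_item(item_key):
    # Previous row in key order: the largest key below this one.
    return conn.execute(
        "SELECT item_key, name FROM items WHERE item_key < ? "
        "ORDER BY item_key DESC LIMIT 1",
        (item_key,),
    ).fetchone()

row = get(15)
prev_row = prev_item(15)
```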

Answer 5

You could try this subclass of OrderedDict. What I posted earlier was incorrect (it is kept at the bottom):

from collections import OrderedDict

class MyOrderedDict(OrderedDict):
    def index(self, key):
        # Position of key in insertion order (O(n): copies the keys into a list)
        if key not in self:
            raise KeyError(key)
        return list(self.keys()).index(key)
    def prev(self, key):
        # Key inserted immediately before `key`
        idx = self.index(key) - 1
        if idx < 0:
            raise IndexError("no key before %r" % (key,))
        return list(self.keys())[idx]
    def next(self, key):
        # Key inserted immediately after `key`
        keys = list(self.keys())
        idx = self.index(key) + 1
        if idx >= len(keys):
            raise IndexError("no key after %r" % (key,))
        return keys[idx]

# >>> d = MyOrderedDict(((3, 'Three'), (2, 'Two'), (4, 'Four'), (1, 'One')))
# >>> d.index(3)
# 0
# >>> d.index(2)
# 1
# >>> d.prev(2)
# 3
# >>> d.prev(3)
# Traceback (most recent call last):
#   ...
# IndexError: no key before 3
# >>> d.next(4)
# 1
# >>> d.next(1)
# Traceback (most recent call last):
#   ...
# IndexError: no key after 1

EDIT - As @agf points out below, my original answer (kept here) is incorrect:

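You're looking for a fast way to retrieve items from myClass, so you should use a dictionary. At the same time, you want the data to have some kind of order so that you can call prevItem on it. Why not store the data in a collections.OrderedDict, which was added in Python 2.7 and 3.1? ref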

Answer 6

You should use a list, because it allows one simple optimization: sort it by the attribute you're looking up (in .get) and use binary search.

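In a list of 2000 items, this cuts the number of comparisons from about 1000 on average (linear search) to about 11 (log2 2000 ≈ 11)! Getting the previous (and next) item also becomes trivial.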

For the bisection algorithm, see the bisect module.
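Here is a minimal sketch of this approach. It assumes the items are dicts with a hypothetical numeric `"id"` key. The list is sorted once, and a parallel list of keys is kept because `bisect` compares list elements directly; its `key=` argument only exists from Python 3.10 on. The neighbours are then simply at `pos - 1` and `pos + 1`.

```python
import bisect

items = [{"id": n * 3, "name": "item%d" % n} for n in range(2000)]
items.sort(key=lambda d: d["id"])    # sort once by the lookup attribute
keys = [d["id"] for d in items]      # parallel sorted list for bisect to search

def find(item_id):
    # Binary search: about log2(2000), i.e. 11, comparisons instead of a linear scan.
    pos = bisect.bisect_left(keys, item_id)
    if pos < len(keys) and keys[pos] == item_id:
        return pos
    raise KeyError(item_id)

pos = find(15)
item = items[pos]
prev_item = items[pos - 1] if pos > 0 else None
next_item = items[pos + 1] if pos + 1 < len(items) else None
```

The sort costs O(n log n) once. Each later lookup is O(log n), which pays off when the same list is queried many times, as in the question.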

Should I use a generator in this case?
