Question

I am parsing a large amount of binary data that has been placed in a list of lists:

row = [1,2,3...]                # list of many numbers
data = [row1,row2,row3...]      # a list of many rows

list_of_indices = [1,5,13,7...] # random list of indices. Always shorter than row
                                #This list won't change after creation

I want to return a row that contains only the elements listed in list_of_indices:

subset_row = [row(index) for index in list_of_indices]

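My questions:

Does subset_row contain a copy of each returned element (that is, will subset_row be a completely new list in memory), or does it contain references to the original data? Note that the data will not be modified, so I think it may not matter.

Also, is there a more efficient way to do this? I will have to iterate over thousands of rows.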

This is touched on in the following question, but it is not specific enough about what gets returned: What is the simplest and most efficient function to return a sublist based on an index list?

Answer 1

First, it should be

[row[index] for index in list_of_indexes]

(or just map(row.__getitem__, list_of_indexes), which in Python 3 returns a lazy iterator rather than a list).
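To make the two spellings concrete, here is a minimal sketch showing that the list comprehension and the map-based form produce the same result. The sample values in row and list_of_indexes are made up for illustration.

```python
# Hypothetical sample data, for illustration only.
row = [10, 11, 12, 13, 14, 15, 16, 17]
list_of_indexes = [1, 5, 3, 7]

# List comprehension: index into row once per entry in list_of_indexes.
subset_a = [row[index] for index in list_of_indexes]

# map-based equivalent: the bound method row.__getitem__ is applied to each index.
# In Python 3, map returns a lazy iterator, so wrap it in list() to materialize it.
subset_b = list(map(row.__getitem__, list_of_indexes))

print(subset_a)  # [11, 15, 13, 17]
print(subset_a == subset_b)  # True
```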

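Second, Python has no way to explicitly obtain a reference or pointer to an object; put differently, everything is already a reference. In practice this means that for ints it makes essentially no difference, and for "heavier" objects you automatically get references, because Python never copies anything implicitly. So subset_row is a new list, but its elements are the same objects that row holds, not copies.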
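The reference semantics can be checked directly with the is operator. The sketch below uses small lists as the row elements (an assumption made only so that object identity is unambiguous; the original data holds numbers).

```python
# Hypothetical sample data: each element of row is its own list object.
row = [[n, n + 1] for n in range(8)]
list_of_indexes = [1, 5, 3]

subset_row = [row[index] for index in list_of_indexes]

# The comprehension builds a new outer list...
is_new_list = subset_row is not row

# ...but every element is the very same object that row refers to.
same_objects = all(
    subset_row[k] is row[i] for k, i in enumerate(list_of_indexes)
)

print(is_new_list, same_objects)  # True True
```

Only the outer list costs extra memory here; the element objects are shared, which is harmless when the data is never modified.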

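Note: if your row contains a lot of data and list_of_indexes is also a long list, you may want to consider lazy evaluation (that is, generators and generator expressions in Python):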

subset_row = (row[index] for index in list_of_indexes)

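Now you can iterate over subset_row without evaluating or reading all the values of the sequence into memory, or you can consume the sequence one item at a time with: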

first = next(subset_row)
second = next(subset_row)
# etc
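A small runnable sketch of the lazy variant follows, with made-up sample values. It also shows one property worth keeping in mind: a generator can be consumed only once.

```python
# Hypothetical sample data, for illustration only.
row = [10, 11, 12, 13, 14, 15]
list_of_indexes = [4, 0, 2]

# Generator expression: nothing is looked up until a value is requested.
subset_row = (row[index] for index in list_of_indexes)

first = next(subset_row)       # looks up row[4]
second = next(subset_row)      # looks up row[0]
remaining = list(subset_row)   # consumes whatever is left

# The generator is now exhausted; iterating again yields nothing.
leftover = list(subset_row)

print(first, second, remaining, leftover)  # 14 10 [12] []
```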

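Furthermore, since you also mention a "list of lists" and have data = [row1, row2, ...] in your code example, I suspect you may want to apply the operation to several lists at once: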

indices = [3, 7, 123, ...]
data = [<row1>, <row2>, ...]
rows = [[row[i] for i in indices] for row in data]

Or lazily for the outer list:

rows = ([row[i] for i in indices] for row in data)

Or both:

rows = ((row[i] for i in indices) for row in data)
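As an alternative that the answer above does not mention, the standard library's operator.itemgetter can build the per-row selector once and reuse it for every row. This is only a sketch with made-up data; whether it is actually faster than the comprehension for your rows is something to measure, not assume.

```python
from operator import itemgetter

# Hypothetical sample data, for illustration only.
data = [
    [10, 11, 12, 13, 14],
    [20, 21, 22, 23, 24],
    [30, 31, 32, 33, 34],
]
indices = [3, 0, 4]

# Build the selector once; calling it on a row returns a tuple of the
# elements at the given positions (for two or more indices).
pick = itemgetter(*indices)

rows = [pick(row) for row in data]

print(rows)  # [(13, 10, 14), (23, 20, 24), (33, 30, 34)]
```

Note that itemgetter returns tuples rather than lists, and with a single index it returns the bare element instead of a one-element tuple, so the comprehension form is the safer choice when indices may have length one.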
