可订阅的延迟映射

时间:2014-04-03 17:09:05

标签: python python-3.x mapping lazy-evaluation

在python中,我知道两个懒惰的“容器”:生成器和<class 'map'>

两者都不可订阅。因此map(f, data)[1](f(x) for x in data)[1]会失败。

python中是否有支持下标的延迟映射类?

如果没有最接近的匹配?

我一直在搜索functools无效(或者我没有发现它)。

基本上我正在寻找这样的东西(但重新发明轮子应该是最后一个选择):

class lazilymappedlist:
    def __init__ (self, f, lst):
        self.f = f
        self.data = [ (False, e) for e in lst]

    def __getitem__ (self, idx):
        status, elem = self.data [idx]
        if status: return elem
        elem = self.f (elem)
        self.data [idx] = (True, elem)
        return elem

2 个答案:

答案 0 :(得分:1)

由于各种原因,这是一个棘手的事情。它可以很少的努力实现,假设输入数据,(时间)映射和访问模式的复杂性没有任何内容,但是随后首先使用生成器的一个或多个优点消失了。在问题中给出的示例代码中,不必跟踪所有值的优点就会丢失。

如果我们允许纯随机访问模式,那么至少必须缓存所有映射值,再次失去生成器的内存优势。

根据两个假设

  • 映射函数很昂贵,只是稀疏地称为
  • 访问模式是随机的,因此必须缓存所有值

问题中的示例代码应该没问题。这里有一些稍微不同的代码,它们也可以将生成器作为传入数据处理。它的优点是不会在对象构造上详尽地建立传入数据的副本。

因此,如果传入数据具有__getitem__方法(索引),则使用self.elements dict实现精简缓存包装器。如果访问稀疏,则字典比列表更有效。如果传入的数据没有索引,我们只需使用和存储待处理的传入数据以供以后映射。

代码:

class LazilyMapped:
    def __init__(self, fn, iterable):
        """LazilyMapped lazily evaluates a mapping/transformation function on incoming values

        Assumes mapping is expensive, and [idx] access is random and possibly sparse.
        Still, this may defeat having a memory efficient generator on the incoming data,
        because we must remember all read data for potential future access.

        Lots of different optimizations could be done if there's more information on the
        access pattern. For example, memory could be saved if we knew that the access idx
        is monotonic increasing (then a list storage would be more efficient, because
        forgetting data is then trivial), or if we'd knew that any index is only accessed
        once (we could get rid of the infite cache), or if the access is not sparse, but
        random we should be using a list instead of a dict for self.elements.

        fn is a filter function, getting one element of iterable and returning a bool,
        iterable may be a generator
        """
        self.fn = fn
        self.sparse_in = hasattr(iterable, '__getitem__')
        if self.sparse_in:
            self.original_values = iterable
        else:
            self.iter = iter(iterable)
            self.original_idx = 0      # keep track of which index to do next in incoming data
            self.original_values = []  # keep track of all incoming values
        self.elements = {}      # forever remember mapped data
    def proceed_to(self, idx):
        """Consume incoming values and store for later mapping"""
        if idx >= self.original_idx:
            for _ in range(self.original_idx, idx + 1):
                self.original_values.append(next(self.iter))
            self.original_idx = idx + 1
    def __getitem__(self, idx):
        if idx not in self.elements:
            if not self.sparse_in:
                self.proceed_to(idx)
            self.elements[idx] = mapped = self.fn(self.original_values[idx])
        else:
            mapped = self.elements[idx]
        return mapped

if __name__ == '__main__':
    test_list = [1,2,3,4,5]
    dut = LazilyMapped(lambda v: v**2, test_list)
    assert dut[0] == 1 
    assert dut[2] == 9
    assert dut[1] == 4

    dut = LazilyMapped(lambda v: v**2, (num for num in range(1, 7)))
    assert dut[0] == 1 
    assert dut[2] == 9
    assert dut[1] == 4

答案 1 :(得分:0)

我不知道标准python库中的任何这样的容器,因此我自己在library中实现它,模仿itertools的可索引对象:

import seqtools

def do(x):
    print("-> computing now")
    return x + 2

a = [1, 2, 3, 4]
m = seqtools.smap(do, a)
m = seqtools.add_cache(m, len(m))
# nothing printed because evaluation is delayed
m[0]
  

- &GT;现在计算

     

3