Question

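I've studied generators and I think I understand them, but I'd like to know where I can apply them in my own code.

I have in mind the following example, which I read in the book "Python Essential Reference":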

# tail -f
import time

def tail(f):
    f.seek(0, 2)             # jump to the end of the file
    while True:
        line = f.readline()
        if not line:
            time.sleep(0.1)  # nothing new yet; wait briefly and retry
            continue
        yield line

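Do you have other good examples, like tail -f, where a generator is the best tool for the job?

How often do you use generators, and in what kind of functionality or part of a program do you usually apply them?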

Answer 1

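I use them a lot when I implement scanners (tokenizers) or when I iterate over data containers.

Edit: here is a demo tokenizer I used for a C++ syntax highlighting program: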

whitespace = ' \t\r\n'
operators = '~!%^&*()-+=[]{};:\'"/?.,<>\\|'

def scan(s):
    "returns a token and a state/token id"
    words = {0:'', 1:'', 2:''} # normal, operator, whitespace
    state = 2 # I pick ws as first state
    for c in s:
        if c in operators:
            if state != 1:
                yield (words[state], state)
                words[state] = ''
            state = 1
            words[state] += c
        elif c in whitespace:
            if state != 2:
                yield (words[state], state)
                words[state] = ''
            state = 2
            words[state] += c
        else:
            if state != 0:
                yield (words[state], state)
                words[state] = ''
            state = 0
            words[state] += c
    yield (words[state], state)

Usage example:

>>> it = scan('foo(); i++')
>>> next(it)
('', 2)
>>> next(it)
('foo', 0)
>>> next(it)
('();', 1)
>>> next(it)
(' ', 2)
>>> next(it)
('i', 0)
>>> next(it)
('++', 1)
>>>

Answer 2

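Whenever your code would generate an unlimited number of values or, more commonly, whenever building the whole list up front would consume too much memory.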

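Or when you are unlikely to iterate over the whole generated list (and the list is very large). There is no point in generating every value first (and waiting for that to finish) if most of them are never used.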
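
A small sketch of that second point (the names `squares` and `computed` are made up for this illustration): the consumer stops at the first match, so only the values it actually looked at are ever computed.

```python
import itertools

computed = []  # records which inputs were actually processed (illustration only)

def squares():
    """Yield 0, 1, 4, 9, ... forever, one value at a time."""
    for n in itertools.count():
        computed.append(n)
        yield n * n

# Stop at the first square above 50; nothing beyond it is computed.
first_big = next(sq for sq in squares() if sq > 50)
```

Here `first_big` is 64 and `computed` holds only 0 through 8, even though the sequence itself never ends.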

My most recent use of generators was when I implemented a linear recurrence sequence (LRS) such as the Fibonacci sequence.
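
Something along these lines would do it. This is a minimal sketch rather than the code I actually wrote; the function name `linear_recurrence` and its parameters are chosen here purely for illustration.

```python
from itertools import islice

def linear_recurrence(coeffs, initial):
    """Yield terms of a(n) = coeffs[0]*a(n-1) + ... + coeffs[k-1]*a(n-k).

    `initial` holds the first k terms, oldest first.
    """
    window = list(initial)
    for term in window:
        yield term
    while True:
        # The newest term is window[-1], so pair coefficients with the reversed window.
        nxt = sum(c * a for c, a in zip(coeffs, reversed(window)))
        yield nxt
        window = window[1:] + [nxt]

# Fibonacci: a(n) = a(n-1) + a(n-2), starting from 0, 1
fib = linear_recurrence([1, 1], [0, 1])
first_ten = list(islice(fib, 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The generator never terminates on its own; `islice` is what decides how many terms get computed.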

Answer 3

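Wherever I have an algorithm that reads anything, I use generators exclusively.

Why?

Layering filtering, mapping, and reduction rules is much easier when the work is split across multiple generators.

Example: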

def discard_blank( source ):
    for line in source:
        if len(line) == 0:
            continue
        yield line

def clean_end( source ):
    for line in source:
        yield line.rstrip()

def split_fields( source ):
    for line in source:
        yield line.split()

def convert_pos( tuple_source, position ):
    for line in tuple_source:
        # wrap the converted value in a list so it can be concatenated with the slices
        yield line[:position] + [int(line[position])] + line[position+1:]

with open('somefile','r') as source:
    data= convert_pos( split_fields( discard_blank( clean_end( source ) ) ), 0 )
    total= 0
    for l in data:
        print(l)
        total += l[0]
    print(total)

My preference is to use many small generators so that a small change does not disrupt the whole processing chain.
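
As a sketch of what such a small change looks like, here the chain gains one extra stage without any other stage being touched. The `discard_comments` stage and the in-memory `lines` list are assumptions made up for this illustration; the other stage names follow the example above.

```python
def clean_end(source):
    for line in source:
        yield line.rstrip()

def discard_blank(source):
    for line in source:
        if line:
            yield line

def discard_comments(source):
    """New stage (hypothetical): drop lines that start with '#'."""
    for line in source:
        if not line.startswith('#'):
            yield line

def split_fields(source):
    for line in source:
        yield line.split()

def convert_pos(tuple_source, position):
    for fields in tuple_source:
        yield fields[:position] + [int(fields[position])] + fields[position + 1:]

# In-memory stand-in for a file; any iterable of lines works.
lines = ["10 apples\n", "\n", "# a comment\n", "32 pears  \n"]

data = convert_pos(split_fields(discard_comments(discard_blank(clean_end(lines)))), 0)
rows = list(data)
total = sum(row[0] for row in rows)
```

Only the line that wires the stages together changed; every existing stage is reused as-is.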

Answer 4

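Generally, to separate data acquisition (which might be complicated) from consumption. In particular:

Concatenating the results of several b-tree queries: the database part generates and executes the queries, yielding records from each one, and the consumer only sees single data items arriving.
Buffering (read-ahead): the generator fetches data in blocks and yields single elements from each block. Again, the consumer is shielded from the gory details.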
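
Both cases can be sketched roughly as follows. Everything here is a stand-in for illustration: `fetch_block` is a hypothetical callable that returns the next list of items (empty when exhausted), and `make_fetcher` just simulates a block-oriented data source over a plain list.

```python
def read_ahead(fetch_block):
    """Yield single items while fetching them a block at a time."""
    while True:
        block = fetch_block()
        if not block:
            return  # source exhausted
        for item in block:
            yield item

def concatenated(*sources):
    """Yield every record from each source in turn (like itertools.chain)."""
    for source in sources:
        for record in source:
            yield record

def make_fetcher(items, block_size=3):
    """Simulated data source: hands out `items` block_size at a time."""
    position = 0
    def fetch_block():
        nonlocal position
        block = items[position:position + block_size]
        position += len(block)
        return block
    return fetch_block

first = read_ahead(make_fetcher(list(range(7))))
second = read_ahead(make_fetcher(["a", "b"]))
records = list(concatenated(first, second))
```

The consumer iterating over `concatenated(...)` never learns that the data arrived in blocks or came from two separate sources.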

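Generators can also work as coroutines. You can pass data into them using nextval = g.send(data) on the "consumer" side and data = yield nextval on the generator side. In this case the generator and its consumer "swap" values. You can even make yield raise an exception inside the generator: g.throw(exc) does that.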
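
A minimal sketch of that exchange, using a running average as the example. The `running_average` name and the choice of `ValueError` as a "reset" signal are assumptions for illustration only.

```python
def running_average():
    """Coroutine-style generator: receives numbers, yields the average so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        try:
            value = yield average  # hands back `average`, then waits for send()
        except ValueError:
            # Raised at the yield by g.throw(ValueError): reset the state.
            total, count, average = 0.0, 0, None
            continue
        total += value
        count += 1
        average = total / count

g = running_average()
next(g)                      # prime the generator: run it to the first yield
a1 = g.send(10)              # 10.0
a2 = g.send(20)              # 15.0
reset = g.throw(ValueError)  # handled inside; the generator yields None again
a3 = g.send(4)               # 4.0
```

Note that the generator has to be advanced to its first yield (the `next(g)` call) before `send()` can deliver a value.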
