What is the most Pythonic way to interleave text file contents?

Time: 2012-03-28 23:55:29

Tags: python

Python question:

If I have a list of files, how do I print line #1 from each file, then line #2 from each file, and so on? (I'm a Python newbie, obviously...)

Example:

file1:
foo1
bar1

file2:
foo2
bar2

file3:
foo3
bar3

Function call:

names = ["file1", "file2", "file3"]
myfct(names)

Desired output:

foo1
foo2
foo3

bar1
bar2
bar3

This is how I did it, but I'm sure there is a more elegant, Pythonic way:

def myfct(files):
    file_handlers = []
    for myfile in files:
        file_handlers.append(open(myfile))
    while True:
        done = False
        for handler in file_handlers:
            line = handler.readline()
            eof = len(line) == 0 # wrong
            if (eof):
                done = True
                break
            print(line, end = "")
        print()
        if done == True:
            break

P.S.: I'm using Python 2.6 with from __future__ import print_function

3 answers:

Answer 0: (score: 8)

import itertools
import sys
for lines in itertools.izip(*file_handlers):
    sys.stdout.write(''.join(lines))
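
For readers on Python 3, where itertools.izip no longer exists and the built-in zip is already lazy, the same idea might look roughly like the sketch below. The function name interleave_stop_at_shortest, the out parameter, and the use of contextlib.ExitStack (Python 3.3+) to close the files are assumptions introduced here for illustration; they are not part of the answer above.

```python
import sys
from contextlib import ExitStack


def interleave_stop_at_shortest(paths, out=sys.stdout):
    """Write line 1 of every file, then line 2 of every file, and so on.

    Stops as soon as the shortest file runs out, like izip in Python 2.
    """
    with ExitStack() as stack:
        # Open every file and register it so that it is closed on exit.
        handles = [stack.enter_context(open(p)) for p in paths]
        # zip pulls one line from each handle per round, lazily.
        for lines in zip(*handles):
            out.write(''.join(lines))
```

Calling interleave_stop_at_shortest(["file1", "file2", "file3"]) with the files from the question should print the foo lines followed by the bar lines, without the blank separator line that the question's own loop emits.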

Answer 1: (score: 4)

> cat foo
foo 1
foo 2
foo 3
foo 4
> cat bar
bar 1
bar 2
> cat interleave.py 
from itertools import izip_longest
from contextlib import nested

with nested(open('foo'), open('bar')) as (foo, bar):
    for line in (line for pair in izip_longest(foo, bar)
                      for line in pair if line):
        print line.strip()
> python interleave.py 
foo 1
bar 1
foo 2
bar 2
foo 3
foo 4

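Compared with the other answers:

  • The files are closed on exit
  • izip_longest does not stop when one file runs out
  • Memory is used efficiently

Or, for multiple files (filenames is a list of file names):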

with nested(*(open(file) for file in filenames)) as handles:
    for line in (line for tuple in izip_longest(*handles)
                      for line in tuple if line):
        print line.strip()
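
contextlib.nested was removed in Python 3 and izip_longest was renamed zip_longest, so the code above does not run there unchanged. A possible Python 3 counterpart is sketched below; the generator name interleave_all and the choice of contextlib.ExitStack (Python 3.3+) as a stand-in for nested are assumptions made for this example. It tests for None explicitly instead of relying on truthiness, and strips only the trailing newline.

```python
from contextlib import ExitStack
from itertools import zip_longest


def interleave_all(filenames):
    """Yield lines round by round; shorter files simply drop out."""
    with ExitStack() as stack:
        # Every opened file is registered and closed when the block exits.
        handles = [stack.enter_context(open(name)) for name in filenames]
        for group in zip_longest(*handles):
            for line in group:
                # zip_longest pads exhausted files with None; skip the padding.
                if line is not None:
                    yield line.rstrip('\n')
```

A caller would then write something like: for line in interleave_all(['foo', 'bar']): print(line)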

Answer 2: (score: 1)

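If all of your files have the same number of lines, or if you want to stop as soon as any file is exhausted, Ignacio's answer is perfect. If you want to support files of different lengths, though, you should use the "round robin" recipe from the itertools documentation: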

import sys
from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).next for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))

sys.stdout.writelines(roundrobin(*file_handlers))
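
The recipe as written targets Python 2, where iterators have a next method. In Python 3 that method is called __next__, so a port could look like the sketch below. Renaming the loop variable to get_next (to avoid shadowing the built-in next) is a cosmetic change made here, not something from the original recipe.

```python
from itertools import cycle, islice


def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    pending = len(iterables)
    # Python 3 iterators expose __next__ rather than the Python 2 next method.
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for get_next in nexts:
                yield get_next()
        except StopIteration:
            # One iterable ran dry: drop it and keep cycling over the rest.
            pending -= 1
            nexts = cycle(islice(nexts, pending))
```

With open file objects in file_handlers, sys.stdout.writelines(roundrobin(*file_handlers)) works the same way as in the Python 2 version.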