Question

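I have a large file like the input below, in which every 4 lines belong to the same ID, i.e. the line starting with @. The second line (the one right after the @ line) is a string of characters, and for some IDs this line is missing. When that is the case, I want to delete all 4 lines belonging to that ID. I tried the Python code below, but it raised an error.

Input: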

@M00872:361:000000000-D2GK2:1:1101:16003:1351 1:N:0:1
ATCCGGCTCGGAGGA
+
1AA?ADDDADDAGGG
@M00872:361:000000000-D2GK2:1:1101:15326:1352 1:N:0:1
GCGCAGCGGAAGCGTGCTGGG
+
CCCCBCDCCCCCGGEGGGGGG
@M00872:361:000000000-D2GK2:1:1101:16217:1352 1:N:0:1

+

Output:

@M00872:361:000000000-D2GK2:1:1101:16003:1351 1:N:0:1
ATCCGGCTCGGAGGA
+
1AA?ADDDADDAGGG
@M00872:361:000000000-D2GK2:1:1101:15326:1352 1:N:0:1
GCGCAGCGGAAGCGTGCTGGG
+
CCCCBCDCCCCCGGEGGGGGG


import fileinput

with fileinput.input(files="4415_pool.fastq", inplace=True, backup="file.bak") as f:
    for l in f:
        if l.strip().startswith("@"):
            c = 2
            next_line = f.readline().strip()  
            if not next_line:   
                while c:        
                    c -= 1
                    try:
                        next(f)
                    except StopIteration:
                        break
            else:
                print(l.strip())
                print(next_line.strip())
                while c:
                    c -= 1
                    try:
                        print(next(f).strip())
                    except StopIteration:
                        break

But it did not work and raised this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: FileInput instance has no attribute '__exit__'

Do you know how to fix this?

Answer 1

It looks like the fileinput.FileInput class does not implement the __exit__() method, which it would need if you want to use it in a with fileinput.input(...) statement.

Answer 2

I think the problem is that your Python version (2.7) does not support using fileinput in a with statement.

Use

f = fileinput.input(files="4415_pool.fastq", inplace=True, backup="file.bak")

instead of

with fileinput.input(files="4415_pool.fastq", inplace=True, backup="file.bak") as f:
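
If you still want with-style cleanup on 2.7, wrapping the object in contextlib.closing() should work, since FileInput has a close() method. A minimal sketch, where count_lines is just a hypothetical helper for illustration and not something from your code:

```python
import contextlib
import fileinput


def count_lines(path):
    """Count the lines in a file read through fileinput (hypothetical helper)."""
    # contextlib.closing() calls f.close() when the block exits, so it does
    # not rely on FileInput having __enter__/__exit__ (missing in Python 2.7).
    with contextlib.closing(fileinput.input(files=[path])) as f:
        return sum(1 for _ in f)
```

The same wrapper should work with inplace=True and backup=..., because all it does is guarantee the close() call.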

Answer 3

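Although the with statement was added in Python 2.5, I don't think fileinput was ever ported to support it there (contextlib?).

Your code will run in Python 3 but not in 2.7. To fix this, either use Python 3 or port the code to iterate over the lines like this:

with open(filename, "r") as f:
    lines = f.readlines()

for line in lines:
    pass  # do whatever you need to do for each line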

Answer 4

As a solution to your problem (in 2.7), I would do something like this:

# Read all the lines in a buffer
with open('input.fastq', 'r') as source:
  source_buff = iter(source.readlines())

with open('output.fastq', 'w') as out_file:
  for line in source_buff:
    if line.strip().startswith('@'):
      prev_line = line
      line = next(source_buff)

      if line.strip():
        # if the 2nd line is not empty write the whole block in the output file
        out_file.write(prev_line)
        out_file.write(line)
        out_file.write(next(source_buff))
        out_file.write(next(source_buff))
      else:
        pass

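I know .fastq files can sometimes be very large, so rather than reading the whole file into a buffer, I would suggest putting this code in a loop that reads 4 lines (or however many lines your block has) at a time.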
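
A rough sketch of that block-at-a-time approach, assuming every record is exactly 4 lines (no wrapped sequences). The name filter_fastq and its parameters are made up for illustration:

```python
from itertools import islice


def filter_fastq(src_path, dst_path, block_size=4):
    """Copy only the records whose 2nd (sequence) line is non-empty.

    Returns the number of records written. Assumes fixed-size records.
    """
    kept = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        while True:
            # Pull the next record; only block_size lines are held in memory.
            record = list(islice(src, block_size))
            if not record:
                break  # end of file
            # Keep complete records that have a non-blank sequence line.
            if len(record) == block_size and record[1].strip():
                dst.writelines(record)
                kept += 1
    return kept
```

Something like filter_fastq('input.fastq', 'output.fastq') would then write the filtered copy without loading the whole file. A truncated record at the end of the file is dropped rather than written.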
