Question

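I am trying to learn Python and want to write a text parser. I am trying to parse a large FASTA file full of DNA strings (136,275 lines, 9.8 MB). My problem is that the program always stops working at exactly the same place (line 16076), and no error is raised.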

def file_parser(filepath):
  data = []
  file_content = open(filepath, 'r')
  line = file_content.readline()
  i=0
  while line:
    if line == 0:
      break
    elif line[0] == ">":
      key, name = line.split('|')[-2:]
      dna = ''
      line = file_content.readline()
      i = i+1
      while not line.startswith('>'): #line[0] != ">": #
        dna = dna + line
        line = file_content.readline()
      dna = dna.rstrip('\n')
      name = name.rstrip('\n')
      row = {
        key, 
        name, 
        dna
      }
      data.append(row)
      print(i)
    else:
      print("Your file is corrupted")
  return data
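
A small experiment that may explain the hang, offered as a sketch rather than a confirmed diagnosis: at the end of a file, `readline()` returns an empty string, and an empty string does not start with '>', so the inner `while not line.startswith('>')` loop above would keep running without ever raising an error. The in-memory file contents below are made up for illustration; only `io.StringIO`, `readline()` and `str.startswith()` from the standard library are used.

```python
import io

# A tiny in-memory stand-in for the FASTA file (contents are made up).
handle = io.StringIO(">gi|1|ref|XP_1|protein A\nMKV\n")

handle.readline()             # consumes the header line
handle.readline()             # consumes the sequence line
at_eof = handle.readline()    # nothing is left to read

# At end of file readline() yields '' and ''.startswith('>') is False,
# so "while not line.startswith('>')" would never become false here.
eof_is_empty = at_eof == ""
eof_starts_with_gt = at_eof.startswith(">")
print(repr(at_eof), eof_starts_with_gt)
```

If this is what happens, the last number printed (16076) would be the count of records read so far, not a line number in the file, since `i` is incremented once per header line.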

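So my question, as a beginner at writing Python, is: what is wrong with my code that makes it stop running? I think it might be line.startswith('>'), which I switched to because I was previously getting "string index out of range" errors, but honestly I am not sure.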
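
One possible rewrite, assuming the file is plain (already decompressed) text and every header contains at least two '|'-separated fields, is to iterate over the file object directly so that the loop ends by itself at end of file. The function name `parse_fasta`, the dict keys and the sample records are hypothetical. Unlike the original, this sketch stores each record as a dict and drops the newlines inside the sequence.

```python
import io

def parse_fasta(handle):
    """Collect FASTA records from an open text handle into a list of dicts."""
    records = []
    current = None
    for line in handle:                  # iteration stops cleanly at end of file
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Same header split as the original: last two '|'-separated fields.
            key, name = line.split("|")[-2:]
            current = {"key": key, "name": name.strip(), "dna": ""}
            records.append(current)
        elif current is not None:
            # Sequence line: append it to the record opened by the last header.
            current["dna"] += line
        elif line:
            # Non-empty data before any header (assumed to mean a bad file).
            raise ValueError("sequence data before the first header line")
    return records

# Made-up sample data standing in for the real file.
sample = io.StringIO(
    ">gi|1|ref|XP_1|protein A\nMKV\nLLA\n"
    ">gi|2|ref|XP_2|protein B\nMTT\n"
)
records = parse_fasta(sample)
print(len(records), records[-1])
```

With a real file it could be called as `with open(filepath) as handle: records = parse_fasta(handle)`, which also closes the file afterwards.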

My test file comes from the following source: ftp://ftp.ncbi.nih.gov/genomes/Acanthisitta_chloris/protein/ (its .fa.gz file). I am using a slightly customized Ubuntu 18.10 and python3.

Thank you for your time.

Problem parsing a large text file in Python with startswith()

0 answers: