Question

I have a BIG data text file, for example:

#01textline1
1 2 3 4 5 6
2 3 5 6 7 3
3 5 6 7 6 4
4 6 7 8 9 9

1 2 3 6 4 7
3 5 7 7 8 4
4 6 6 7 8 5

3 4 5 6 7 8
4 6 7 8 8 9
..
..

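I want to extract the data between the blank lines and write each block to a new file. It is hard to know how many blank lines the file contains, which means I also don't know how many new files I will have to write, so writing them seems difficult. Can anyone guide me? Thanks. I hope my question is clear.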

Answer 1

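Unless your file is very large, use re to split the whole file into separate sections, splitting on runs of two or more whitespace characters: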

import re

with open("in.txt") as f:
    # Split wherever two or more whitespace characters occur in a row (e.g. a blank line).
    lines = re.split(r"\s{2,}", f.read())
    print(lines)
['#01textline1\n1 2 3 4 5 6\n2 3 5 6 7 3\n3 5 6 7 6 4\n4 6 7 8 9 9', '1 2 3 6 4 7\n3 5 7 7 8 4\n4 6 6 7 8 5', '3 4 5 6 7 8\n4 6 7 8 8 9']

Then iterate over lines and write a new file on each iteration.
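The writing step is not shown above, so here is a minimal sketch of one way it could look. The helper names `split_on_blank_lines` and `write_blocks`, the `part-` prefix, and the `.txt` suffix are all hypothetical choices for illustration. Note that it uses the pattern `\n\s*\n` rather than `\s{2,}`: this is a substitution on my part, chosen so that two consecutive spaces inside a data line do not trigger a split.

```python
import re
from pathlib import Path


def split_on_blank_lines(text):
    """Return the non-empty blocks of text that are separated by blank lines."""
    # \n\s*\n matches a line break, optional whitespace, and another line break,
    # so a run of several blank lines counts as a single separator.
    return [block for block in re.split(r"\n\s*\n", text.strip()) if block]


def write_blocks(blocks, out_dir, prefix="part-"):
    """Write each block to out_dir/<prefix><index>.txt and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, block in enumerate(blocks):
        path = out_dir / f"{prefix}{index}.txt"
        path.write_text(block + "\n")  # restore the trailing newline removed by the split
        paths.append(path)
    return paths
```

Something like `write_blocks(split_on_blank_lines(open("in.txt").read()), "out")` would then produce one file per block. As with the answer itself, this reads the whole input into memory, so it only suits files that fit comfortably in RAM.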

Answer 2

Reading a file is not data-mining. Please choose a more appropriate tag...

Splitting a file on blank lines is simple:

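num = 0
out = open("file-0", "w")

for line in open("file"):
    if line == "\n":
        # A blank line ends the current block: switch to the next output file.
        num = num + 1
        out.close()
        out = open("file-" + str(num), "w")  # num is an int, so convert it before concatenating
        continue
    out.write(line)

out.close()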

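Because this approach reads only one line at a time, the file size does not matter. It should process the data as fast as your disk can deliver it, using nearly constant memory.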

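Perl has a neat trick for this: you can set the input record separator to two newlines with $/="\n\n"; and then process the data one record at a time as usual... I couldn't find anything similar in Python; but "split on blank lines" isn't bad either.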
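For what it's worth, a record-at-a-time loop can be approximated in Python with `itertools.groupby` from the standard library. The sketch below is only one possible way to do it, not a built-in equivalent of Perl's `$/`; the names `iter_records` and `split_file` and the `file-{}` naming pattern are assumptions made for the example.

```python
from itertools import groupby


def iter_records(lines):
    """Yield each run of consecutive non-blank lines as a list of lines."""
    # groupby clusters consecutive lines that share the same key value;
    # the key is True for blank (whitespace-only) lines and False otherwise.
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            yield list(group)


def split_file(in_path, out_pattern="file-{}"):
    """Write every record of in_path to its own file; return how many were written."""
    count = 0
    with open(in_path) as src:
        for count, record in enumerate(iter_records(src), start=1):
            # Files are numbered from 0, so the first record goes to "file-0".
            with open(out_pattern.format(count - 1), "w") as out:
                out.writelines(record)
    return count
```

Only one record is held in memory at a time, so this keeps the streaming property of the line-by-line loop. Consecutive blank lines are treated as a single separator, which means no empty output files are created.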

Answer 3

Here's a start:

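with open('in_file') as input_file:
    processing = False  # True while an output file is open
    i = 0
    for line in input_file:
        if line.strip() and not processing:
            # First non-blank line of a block: open the next output file.
            out_file = open('output - {}'.format(i), 'w')
            out_file.write(line)
            processing = True
            i += 1
        elif line.strip():
            out_file.write(line)
        elif processing:
            # Blank line: close the current file and reset the flag.
            processing = False
            out_file.close()
    # Close the last file if the input does not end with a blank line.
    if processing:
        out_file.close()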

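This code uses the processing flag to track whether it is currently writing to a file. When it sees a blank line, it resets the flag and closes the current file; the next non-blank line then opens a new file.

Hope it helps.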

Read the lines between blank lines in a data file and write them to new files
