Join two lines if they are not empty

Time: 2016-03-09 22:23:25

Tags: python string python-3.x dictionary

I want to merge two lines of text into one, but only when neither of them is an empty line. For example:

1:1 Bob drives his car.
1:2 Bob and his wife are going on a trip. 
They will have an awesome time on the beach.

I want to put them into a dictionary like this:

dict[1:1] gives me "Bob drives his car."
and dict[1:2] must give me "Bob and his wife are going on a trip.They will have an awesome time on the beach."

I know how to solve the first case (dict[1:1]), but I don't know how to put the two sentences together.

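Or is there an option so that, when one sentence is directly followed by another, they are put on a single line? This is just an example; the actual file contains 100000 lines.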

3 Answers:

Answer 0 (score: 1)

You can do it like this: read the file one line at a time, and let a blank line trigger the start of a new section.

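start_new_section = True
key = None
output = {}
with open('file.txt', 'r') as f:
    for line in f:
        # Lines read from a file keep their '\n', so strip before testing for a blank line
        line = line.strip()
        if line == '':
            start_new_section = True
        elif start_new_section:
            # First line of a section: "<key> <text>"
            words = line.split(' ')
            key = words[0]
            output[key] = ' '.join(words[1:])
            start_new_section = False
        else:
            # Continuation line: append it to the current entry
            output[key] += line

print(output)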

Or a tidier version of the same idea:

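key = None
output = {}
with open('file.txt', 'r') as f:
    for line in f:
        line = line.strip()  # a blank line is '\n' until stripped
        if not line:
            key = None  # blank line: the next non-blank line starts a new entry
        elif key:
            output[key] += line
        else:
            # Targets are assigned left to right, so output[key] uses the new key
            key, _, output[key] = line.partition(' ')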
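
To try the loop without touching a file, here is a minimal sketch that wraps the same idea in a function and feeds it an in-memory sample. The function name `parse_sections` and the blank line placed between the two entries of the sample are my own assumptions, not something given in the question.

```python
import io


def parse_sections(lines):
    """Group lines into {key: text}; a blank line ends the current entry."""
    output = {}
    key = None
    for raw in lines:
        line = raw.strip()  # drop the trailing newline and surrounding spaces
        if not line:
            key = None  # blank line: the next non-blank line starts a new entry
        elif key is None:
            key, _, text = line.partition(" ")
            output[key] = text
        else:
            output[key] += line  # continuation line, joined without a separator
    return output


# Hypothetical sample: entries are assumed to be separated by a blank line.
sample = io.StringIO(
    "1:1 Bob drives his car.\n"
    "\n"
    "1:2 Bob and his wife are going on a trip.\n"
    "They will have an awesome time on the beach.\n"
)
result = parse_sections(sample)
print(result["1:2"])
```

Because the function only iterates over its argument, an open file object can be passed in the same way, which keeps memory use low for a 100000-line file.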

Answer 1 (score: 0)

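One possible way to solve this is to go through the file once and build a list of the indices of the lines that start with a numeric value. You can then use that index to build your dictionary, because each pair of consecutive numbers in the index delimits one item that should be inserted into the dictionary.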
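
A rough sketch of that two-pass idea might look like the following. The pattern `\d+:\d+ ` for "starts with a numeric value", the helper name `build_dict`, and the choice to join continuation lines without a separator are assumptions made for illustration.

```python
import re

# Assumption: a key looks like "1:2" followed by a space at the start of a line.
KEY = re.compile(r"\d+:\d+ ")


def build_dict(lines):
    # Pass 1: remember the index of every line that starts with a key.
    starts = [i for i, line in enumerate(lines) if KEY.match(line)]
    # Pass 2: each entry runs from one start index up to the next one
    # (or to the end of the list for the last entry).
    result = {}
    for begin, end in zip(starts, starts[1:] + [len(lines)]):
        key, _, first = lines[begin].strip().partition(" ")
        rest = [l.strip() for l in lines[begin + 1:end] if l.strip()]
        result[key] = "".join([first] + rest)
    return result


lines = [
    "1:1 Bob drives his car.\n",
    "1:2 Bob and his wife are going on a trip.\n",
    "They will have an awesome time on the beach.\n",
]
index_result = build_dict(lines)
print(index_result)
```

This variant does not rely on blank lines between entries, but it needs all lines in a list (for example from `f.readlines()`), so it holds the whole file in memory.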

Answer 2 (score: 0)

Assuming the file is small enough to read entirely into memory, you can use a regular expression to parse the blocks. Here is an example:

import re

with open('file.txt', 'r') as f:
    txt = f.read()

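# \Z (end of text) is used because \z is not accepted by every Python version
matches = re.findall(r'^(\d+:\d+) (.+?)$(?=(?:\s^\d+:\d+)|\Z)', txt, flags=re.M | re.S)
# '\n' rather than r'\n', so that real newline characters are removed
d = {m[0]: m[1].replace('\n', '') for m in matches}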
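
Since the pattern is fairly dense, here is a sketch of the same expression written in verbose mode and run against the sample from the question held in a string. The names `ENTRY` and `sample` are made up for this illustration, and `\Z` is used as the end-of-text anchor.

```python
import re

ENTRY = re.compile(
    r"""
    ^(\d+:\d+)          # key such as 1:2 at the start of a line
    [ ]                 # the single space between key and text
    (.+?)               # the text, lazily; may span several lines because of re.S
    $                   # up to the end of a line ...
    (?=\s^\d+:\d+|\Z)   # ... that is followed by the next key or the end of the text
    """,
    flags=re.M | re.S | re.X,
)

# In-memory stand-in for the contents of file.txt
sample = (
    "1:1 Bob drives his car.\n"
    "1:2 Bob and his wife are going on a trip.\n"
    "They will have an awesome time on the beach.\n"
)

# Remove the real newline characters that the lazy group captured.
d = {key: text.replace("\n", "") for key, text in ENTRY.findall(sample)}
print(d["1:2"])
```

In verbose mode whitespace inside the pattern is ignored, which is why the separating space is written as `[ ]`.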