How to build a dictionary from a text file in Python

Asked: 2013-02-07 22:29:15

Tags: python text dictionary fileparsing

I have a text file whose entries look like this:

JohnDoe

Assignment 9  
Reading: NO  
header: NO  
HW: NO  
Solutions: 0 
show: NO  
Journals: NO  
free: NO  
Finished: NO  
Quiz: 0  
Done     
Assignment 3  
E-book: NO  
HW: NO  
Readings: NO  
Show: 0  
Journal: NO 
Study: NO  
Test: NO  
Finished: NO  
Quiz: 0  
Done

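This is a small sample. The file contains several students. Each student has two assignments under their name, and they pass only if the line starting with "Finished" in each assignment reads "Finished: YES". The data under each assignment is in no particular order, but somewhere in each assignment there is a line that says "Finished: YES" (or NO). I need a way to read the file and tell whether any of the students passed. So far I have: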

def get_entries(path):
    with open(path, 'rt') as file:
        for line in file:
            if "Finished" in line:
                finished, answer = line.split(':')
                yield finished, answer

# dict takes a sequence of `(key, value)` pairs and turns it into a dict
print(dict(get_entries("dicrete.txt.rtf")))

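I can only get this code to return one entry (the first "Finished" it reads as the key, with "YES" or "NO" as the value). That is the shape I want, but I want it to return every line in the file that starts with "Finished". So for the sample data I provided, I would like it to return a dictionary with 2 entries: {Finished: "NO", Finished: "NO"}

2 answers:

Answer 0 (score: 2)

A dictionary can store only one mapping per key. So you will never have a dictionary with two different entries for the same key.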

Consider using a list of two-tuples instead, such as [("Finished", "NO"), ("Finished", "NO")]
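As a minimal sketch of that suggestion, the pairs can be collected into a list, which keeps duplicates where a dict would collapse them. The helper name `finished_pairs` and the inline sample are illustrative assumptions, not part of the question:

```python
def finished_pairs(lines):
    """Return a list of (key, value) tuples for every line starting with 'Finished'."""
    pairs = []
    for line in lines:
        if line.strip().startswith("Finished"):
            # partition splits on the first colon only
            key, _, value = line.partition(":")
            pairs.append((key.strip(), value.strip()))
    return pairs


# Hypothetical sample loosely mirroring the question's data
sample = ["Assignment 9", "HW: NO", "Finished: NO", "Done",
          "Assignment 3", "Test: NO", "Finished: YES", "Done"]

pairs = finished_pairs(sample)
print(pairs)        # both entries survive in the list
print(dict(pairs))  # the dict keeps only the last value stored under "Finished"
```

The same function accepts an open file object in place of the list, since it only iterates over lines.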

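Answer 1 (score: 0)

Sounds like you need a better data model! Let's look at that, shall we?

We can define an Assignment class that is built from all the lines of text between Assignment: # and Finished: YES/NO.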

class Assignment(object):
    def __init__(self, id, *args, **kwargs):
        self.id = id
        for key,val in kwargs.items():
            setattr(self, key.lower(), val)
        finished = getattr(self, 'finished', None)
        if finished is None:
            raise AttributeError("All assignments must have a 'finished' value")
        else:
            self.finished = True if finished.lower() == "yes" else False

    @classmethod
    def from_string(cls, s):
        """Builds an Assignment object from a string

        >>> a = Assignment.from_string('Assignment: 1\\nAttributes: Go Here\\nFinished: yes')
        >>> a.id
        1
        >>> a.finished
        True
        """
        d = dict()
        id = None
        for line in s.splitlines():
            if not line.strip():
                continue  # skip blank lines
            key, *val = map(str.strip, line.split(":"))
            val = ' '.join(val) or None
            if key.lower().startswith('assignment'):
                # the id follows the keyword, with or without a colon
                # ("Assignment 9" or "Assignment: 9")
                id = int(val or key.split()[-1])
                continue
            d[key.lower()] = val
        if id is not None:
            return cls(id, **d)
        else:
            raise ValueError("No 'Assignment' field in string {}".format(s))

Once you have the model, you need to parse the input. Fortunately, this is actually quite simple.

def splitlineson(s, sentinel):
    """Groups an iterable of strings into newline-separated strings, each beginning at a sentinel line.

    Lines that appear before the first sentinel are discarded.

    >>> s = ["Garbage", "lines", "SENT$", "first", "group", "SENT$", "second", "group"]
    >>> list(splitlineson(s, "SENT$"))
    ['SENT$\\nfirst\\ngroup', 'SENT$\\nsecond\\ngroup']
    """
    sentinel = sentinel.lower()
    lines = None  # stays None until the first sentinel line is seen
    for line in s:
        line = line.strip()
        if line.lower().startswith(sentinel):
            if lines is not None:
                yield "\n".join(lines)
            lines = [line]
        elif line and lines is not None:
            lines.append(line)
    if lines is not None:
        yield "\n".join(lines)

with open('path/to/textfile.txt') as inf:
    # splitlineson is a lazy generator, so consume it while the file is still open
    assignments = splitlineson(inf, "assignment ")
    assignment_list = [Assignment.from_string(a) for a in assignments]
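The question also asks which students passed, which means grouping the assignments by student. The following is a rough, self-contained sketch of one way to do that without the class above. The function name `students_passed`, the second student in the sample, and the rule that any colon-free line other than "Done" or "Assignment N" is a student name are all assumptions inferred from the sample, not something the question confirms:

```python
def students_passed(lines):
    """Map each student name to True if every one of their assignments is 'Finished: YES'."""
    results = {}
    student = None
    for raw in lines:
        line = raw.strip()
        if not line or line.lower() == "done":
            continue
        key, sep, value = line.partition(":")
        if sep:
            # A "key: value" attribute line; only 'Finished' matters here
            if key.strip().lower() == "finished" and student is not None:
                results[student].append(value.strip().lower() == "yes")
        elif not line.lower().startswith("assignment"):
            # Assumption: any other colon-free line is a student name
            student = line
            results[student] = []
    # A student passes only if they have assignments and all of them are finished
    return {name: bool(flags) and all(flags) for name, flags in results.items()}


# Hypothetical sample with two students
sample = """JohnDoe
Assignment 9
HW: NO
Finished: NO
Done
Assignment 3
Finished: YES
Done
JaneDoe
Assignment 9
Finished: YES
Done
Assignment 3
Quiz: 0
Finished: YES
Done
""".splitlines()

print(students_passed(sample))
```

Keying the result by student name sidesteps the duplicate-key problem from the question, because each name appears once and its "Finished" values are accumulated in a list before being reduced to a single pass/fail flag.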