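Reading a text file into a list of dictionaries in Python

Time: 2013-09-23 23:52:05

Tags: python file dictionary

I've been searching for a while but haven't found a simple answer yet.

I have a very structured txt file that contains many elements like this: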

product/productId: B000GKXY4S
review/userId: A1QA985ULVCQOB
review/profileName: Carleen M. Amadio "Lady Dragonfly"
review/helpfulness: 2/2
review/score: 5.0
review/time: 1314057600
review/summary: Fun for adults too!
review/text: I really enjoy these scissors for my inspiration books that I am making (like collage, but in books) and using these different textures these give is just wonderful, makes a great statement with the pictures and sayings. Want more, perfect for any need you have even for gifts as well. Pretty cool!

product/productId: B000GKXY4S
review/userId: ALCX2ELNHLQA7
review/profileName: Barbara
review/helpfulness: 0/0
review/score: 5.0
review/time: 1328659200
review/summary: Making the cut!
review/text: Looked all over in art supply and other stores for "crazy cutting" scissors for my 4-year old grandson.  These are exactly what I was looking for - fun, very well made, metal rather than plastic blades (so they actually do a good job of cutting paper), safe ("blunt") ends, etc.  (These really are for age 4 and up, not younger.)  Very high quality.  Very pleased with the product.

product/productId: B000140KIW
review/userId: A2M2M4R1KG5WOL
review/profileName: L. Heminway
review/helpfulness: 1/1
review/score: 5.0
review/time: 1156636800
review/summary: Fiskars Softouch Multi-Purpose Scissors, 10"
review/text: These are the BEST scissors I have ever owned.  I am left-handed and take note that either a left or right-handed person can use these equally well.               If you have arthritis, as I do, these scissors are amazing as well.  Well worth the price.  I now own three pairs of these and have convinced many other people in my quilting group that they NEED a pair as well!             They cut through muli layers and difficult to cut items really well.            Do buy them, you won't regret it!

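Each of these blocks should become one dictionary, and I want a list of such dictionaries. What's the simplest way to do this? I tried csv, but it doesn't seem to be right: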

import csv

field = ("product/productId", "review/userId", "review/profileName", "review/helpfulness",
         "review/score", "review/time", "review/summary", "review/text")

reader = csv.DictReader(open('../Arts.txt'), fieldnames=field)

Can someone help me with this newbie question? Thanks!

2 answers:

Answer 0 (score: 3)

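In this case you just want to read each line, split it on ': ' to get the key and the value, and add that pair to the current dictionary. Since your file is well structured, you can detect the start of a new block simply by its field name: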

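data = []
current = {}
with open('../Arts.txt') as f:
    for line in f:
        # drop the newline, then split only on the first ': '
        # so values that themselves contain ': ' stay intact
        pair = line.rstrip('\n').split(': ', 1)
        if len(pair) == 2:
            if pair[0] == 'product/productId' and current:
                # start of a new block: store the finished one
                data.append(current)
                current = {}
            current[pair[0]] = pair[1]
    # the last block is not followed by another productId line
    if current:
        data.append(current)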
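Wrapping the loop above in a function that accepts any iterable of lines makes it easy to try on an in-memory sample before pointing it at the real file. This is a minimal sketch: the name parse_reviews is hypothetical, and the sample records are trimmed to three fields for brevity.

```python
import io


def parse_reviews(lines):
    """Group 'key: value' lines into one dict per record.

    A new record starts whenever the 'product/productId' key appears.
    """
    data = []
    current = {}
    for line in lines:
        pair = line.rstrip('\n').split(': ', 1)
        if len(pair) == 2:  # blank separator lines have no ': ' and are skipped
            key, value = pair
            if key == 'product/productId' and current:
                data.append(current)  # the previous record is complete
                current = {}
            current[key] = value
    if current:
        data.append(current)  # don't lose the last record
    return data


# Usage with a small in-memory sample (an open file object works the same way)
sample = io.StringIO(
    "product/productId: B000GKXY4S\n"
    "review/userId: A1QA985ULVCQOB\n"
    "review/score: 5.0\n"
    "\n"
    "product/productId: B000140KIW\n"
    "review/userId: A2M2M4R1KG5WOL\n"
    "review/score: 5.0\n"
)
reviews = parse_reviews(sample)
print(len(reviews))                 # 2
print(reviews[1]['review/userId'])  # A2M2M4R1KG5WOL
```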

csv is the right tool when your file is laid out in columns; for example, a csv file holding the same data might look like this:

product/productId,review/userId,review/profileName,...
B000GKXY4S,A1QA985ULVCQOB,Carleen M. Amadio "Lady Dragonfly",...
B000GKXY4S,ALCX2ELNHLQA7,Barbara,...
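For comparison, here is a minimal sketch of reading such a column-oriented file with csv.DictReader. The csv text is a hypothetical, trimmed version of the layout above (three columns only), held in a string so the example is self-contained; with a header row, DictReader takes the field names from the first line, so no fieldnames argument is needed.

```python
import csv
import io

# Hypothetical column-oriented version of the data, trimmed to three columns
csv_text = (
    "product/productId,review/userId,review/score\n"
    "B000GKXY4S,A1QA985ULVCQOB,5.0\n"
    "B000GKXY4S,ALCX2ELNHLQA7,5.0\n"
)

# The header row supplies the keys; each following row becomes one dict
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

print(rows[1]['review/userId'])  # ALCX2ELNHLQA7
```

This is also why the attempt in the question doesn't fit: DictReader treats every physical line as one record, whereas the original file spreads each record over several lines.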

Answer 1 (score: 1)

I'm surprised the csv reader didn't work; maybe you did something unexpected with the reader.

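Storing lots of dictionaries isn't a great use of memory. Instead, the collections module has a built-in "immutable dict" called namedtuple, which is cheaper and easy to use.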
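A short sketch of what working with a namedtuple looks like. The type name Review and its three fields are hypothetical, trimmed from the question's field list with '/' replaced by '_'. Each instance behaves like a tuple, supports attribute access, and can be turned back into a dict with _asdict() when a real dict is needed.

```python
from collections import namedtuple

# Hypothetical record type; '/' is replaced with '_' to get valid identifiers
Review = namedtuple('Review', ['product_productId', 'review_userId', 'review_score'])

r = Review('B000GKXY4S', 'A1QA985ULVCQOB', '5.0')
print(r.review_userId)  # attribute access: A1QA985ULVCQOB
print(r[2])             # still a tuple, so indexing works: 5.0
print(r._asdict())      # convert to a dict when one is needed

# Instances are immutable; _replace returns a modified copy instead
r2 = r._replace(review_score='4.0')
```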

This can actually be solved by simply reading a fixed number of lines at a time (in this case, 8 lines plus 1 blank line):

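from collections import namedtuple

# namedtuple needs valid identifiers, so '/' in the question's field names becomes '_'
field = ('product_productId', 'review_userId', 'review_profileName',
         'review_helpfulness', 'review_score', 'review_time',
         'review_summary', 'review_text')
data_point = namedtuple('data_point', field)

data_lst = list()
with open('some_path/somefile.txt') as f_in:
    while True:
        # read the 8 "key: value" lines of one record
        lines = [f_in.readline() for _ in range(8)]
        if not any(lines):  # readline() returns '' at end of file
            break
        # keep only the value after the first ':'
        data = [line.split(':', 1)[1].strip() for line in lines]
        data_lst.append(data_point(*data))
        f_in.readline()  # skip the blank separator line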
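The same fixed-chunk idea can also be expressed with itertools.islice, which pulls the next n lines from the file iterator in one call. This is an alternative sketch rather than the answer's own code: the function name read_fixed_records, the type DataPoint, and the three-field sample are hypothetical, and it assumes every record has exactly n_fields lines followed by one blank line (the final blank line may be missing).

```python
import io
from collections import namedtuple
from itertools import islice

FIELDS = ('product_productId', 'review_userId', 'review_score')
DataPoint = namedtuple('DataPoint', FIELDS)


def read_fixed_records(f, n_fields=len(FIELDS)):
    """Read records of exactly n_fields 'key: value' lines plus one blank line."""
    records = []
    while True:
        chunk = list(islice(f, n_fields + 1))  # n_fields lines plus the separator
        if not chunk:
            break  # end of file
        # keep only the value after the first ':' on each data line
        values = [line.split(':', 1)[1].strip() for line in chunk[:n_fields]]
        records.append(DataPoint(*values))
    return records


sample = io.StringIO(
    "product/productId: B000GKXY4S\n"
    "review/userId: A1QA985ULVCQOB\n"
    "review/score: 5.0\n"
    "\n"
    "product/productId: B000140KIW\n"
    "review/userId: A2M2M4R1KG5WOL\n"
    "review/score: 5.0\n"
)
recs = read_fixed_records(sample)
```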

People are so used to for loops in Python that they forget while loops exist.

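If the structure shown in the question doesn't hold throughout the whole file, the number 8 may vary. In that case you should replace the fixed-count for loop with a loop that reads lines and checks a condition. Here I'm taking advantage of a clean data set.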
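For records whose line count varies, one way to follow that advice is a while loop that keeps reading until a blank line ends the current record. This is a minimal sketch assuming records are separated by blank lines; the function name read_variable_records and the sample data are hypothetical. Because records may have different fields, it builds dicts rather than namedtuples.

```python
import io


def read_variable_records(f):
    """Collect 'key: value' lines into a dict until a blank line ends the record."""
    records = []
    current = {}
    while True:
        line = f.readline()
        if line == '':        # readline() returns '' only at end of file
            break
        line = line.strip()
        if not line:          # blank line: the current record is finished
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(':')
        current[key] = value.strip()
    if current:               # the last record may not end with a blank line
        records.append(current)
    return records


sample = io.StringIO(
    "product/productId: B000GKXY4S\n"
    "review/score: 5.0\n"
    "\n"
    "product/productId: B000140KIW\n"
    "review/userId: A2M2M4R1KG5WOL\n"
    "review/score: 5.0\n"
)
recs = read_variable_records(sample)
```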

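Also, change the field names so they don't contain "/" or other special characters; namedtuple only accepts valid Python identifiers and raises ValueError otherwise. As long as you keep their order, the names of the fields don't matter much.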
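Two ways to get usable field names from the question's original tuple, sketched below: derive identifiers by replacing '/' with '_', or pass rename=True so namedtuple substitutes positional names (_0, _1, ...) for invalid ones. The type names DataPoint and Positional are hypothetical.

```python
from collections import namedtuple

field = ("product/productId", "review/userId", "review/profileName", "review/helpfulness",
         "review/score", "review/time", "review/summary", "review/text")

# Option 1: replace '/' with '_' (order is preserved)
safe_field = tuple(name.replace('/', '_') for name in field)
DataPoint = namedtuple('DataPoint', safe_field)
print(DataPoint._fields[0])    # product_productId

# Option 2: let namedtuple swap invalid names for positional ones
Positional = namedtuple('Positional', field, rename=True)
print(Positional._fields[:2])  # ('_0', '_1')

# With neither fix, namedtuple rejects the names
try:
    namedtuple('Bad', field)
except ValueError as exc:
    print('ValueError:', exc)
```

Option 1 keeps readable attribute names, which is usually the better choice; option 2 only makes sense if you access fields by position anyway.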