Python: reading specific data from a text file

Time: 2016-11-16 01:32:09

Tags: python

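I'm struggling to get a grip on this. I need to create a pandas DataFrame object that contains the following entries for each review:

  • Product ID
  • Number of people who voted the review helpful
  • Total number of people who rated the review
  • Product rating
  • Review text

If anyone could even help me get started on how to print each product/productId series, that would be much appreciated.

Here is a sample of my text file: (sorry, I don't know how to format it properly when typing on this site)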

product/productId: B001E4KFG0
review/userId: A3SGXH7AUHU8GW
review/profileName: delmartian
review/helpfulness: 1/1
review/score: 5.0
review/time: 1303862400
review/summary: Good Quality Dog Food
review/text: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.

product/productId: B00813GRG4
review/userId: A1D87F6ZCVE5NK
review/profileName: dll pa
review/helpfulness: 0/0
review/score: 1.0
review/time: 1346976000
review/summary: Not as Advertised
review/text: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".

product/productId: B000LQOCH0
review/userId: ABXLMWJIXXAIN
review/profileName: Natalia Corres "Natalia Corres"
review/helpfulness: 1/1
review/score: 4.0
review/time: 1219017600
review/summary: "Delight" says it all
review/text: This is a confection that has been around a few centuries.  It is a light, pillowy citrus gelatin with nuts - in this case Filberts. And it is cut into tiny squares and then liberally coated with powdered sugar.  And it is a tiny mouthful of heaven.  Not too chewy, and very flavorful.  I highly recommend this yummy treat.  If you are familiar with the story of C.S. Lewis' "The Lion, The Witch, and The Wardrobe" - this is the treat that seduces Edmund into selling out his Brother and Sisters to the Witch.

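2 answers:

Answer 0: (score: 1)

If I understand your question correctly, you want to read from a file with the structure you posted. You can use the following code to build a list in which each review is a dictionary: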

# Open the file; the with-block closes it automatically
with open('file.txt') as your_file:
    # Read every line
    reviews = your_file.readlines()

reviews_array = []
dictionary = {}

# Go through every line; a blank line marks the end of a review
for review in reviews:
    # Split on the first ":" only, so colons inside the review text are kept
    this_line = review.split(":", 1)
    if len(this_line) > 1:
        # The part before the first ":" is the key, the rest is the value
        dictionary[this_line[0]] = this_line[1].strip()
    elif dictionary:
        # Blank line: save the current review and reset for the next one
        # (the check skips empty dictionaries from repeated blank lines)
        reviews_array.append(dictionary)
        dictionary = {}

# Append the last review, which is not followed by a blank line
if dictionary:
    reviews_array.append(dictionary)

print(reviews_array)

This code will print something like the following:

[
{'review/text': 'I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.', 'review/profileName': 'delmartian', 'review/summary': 'Good Quality Dog Food', 'product/productId': 'B001E4KFG0', 'review/score': '5.0', 'review/time': '1303862400', 'review/helpfulness': '1/1', 'review/userId': 'A3SGXH7AUHU8GW'},
{'review/text': 'Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".', 'review/profileName': 'dll pa', 'review/summary': 'Not as Advertised', 'product/productId': 'B00813GRG4', 'review/score': '1.0', 'review/time': '1346976000', 'review/helpfulness': '0/0', 'review/userId': 'A1D87F6ZCVE5NK'},
{'review/text': 'bla blas', 'review/profileName': 'Natalia Corres "Natalia Corres"', 'review/summary': '"Delight" says it all', 'product/productId': 'B000LQOCH0', 'review/score': '4.0', 'review/time': '1219017600', 'review/helpfulness': '1/1', 'review/userId': 'ABXLMWJIXXAIN'}
]

You can access each object like this:

for r in reviews_array:
    print(r['review/userId'])

You will then get this result:

A3SGXH7AUHU8GW
A1D87F6ZCVE5NK
ABXLMWJIXXAIN
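
To get from a list of dictionaries like this to the DataFrame the question asks for, one possible sketch follows. It assumes the dictionaries have the keys shown above and that `review/helpfulness` means "helpful votes/total votes". The helper `reviews_to_frame` and the column names (`product_id`, `helpful_votes`, `total_votes`, `score`, `text`) are made up for illustration, and the sample texts are shortened.

```python
import pandas as pd

# Sample input in the same shape as reviews_array above (texts shortened).
reviews_array = [
    {'product/productId': 'B001E4KFG0', 'review/helpfulness': '1/1',
     'review/score': '5.0', 'review/text': 'I have bought several ...'},
    {'product/productId': 'B00813GRG4', 'review/helpfulness': '0/0',
     'review/score': '1.0', 'review/text': 'Product arrived labeled ...'},
]

# Illustrative column names, not part of the original data.
COLUMNS = ['product_id', 'helpful_votes', 'total_votes', 'score', 'text']


def reviews_to_frame(reviews):
    """Hypothetical helper: build one DataFrame row per review dictionary."""
    rows = []
    for r in reviews:
        # Assumption: 'review/helpfulness' is "helpful/total", e.g. "1/1"
        helpful, total = r['review/helpfulness'].split('/')
        rows.append({
            'product_id': r['product/productId'],
            'helpful_votes': int(helpful),
            'total_votes': int(total),
            'score': float(r['review/score']),
            'text': r['review/text'],
        })
    return pd.DataFrame(rows, columns=COLUMNS)


df = reviews_to_frame(reviews_array)

# Print the product ID column as a pandas Series
print(df['product_id'])
```

Converting the votes and the score to numbers at this stage makes later filtering or aggregation (for example, `df[df['score'] >= 4]`) straightforward.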

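Answer 1: (score: 0)

This is a start. I couldn't decipher a few of your fields/columns, so it will probably need more logic and text massaging. Like the other answer, it parses the text into dictionary key:value pairs, using a regular expression to find the pairs.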
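
As a rough illustration of the regular-expression approach described here (a sketch under stated assumptions, not the answerer's own code), the snippet below finds `key: value` pairs with `re`. The pattern, the helper name `parse_reviews`, and the shortened sample text are assumptions based on the file excerpt in the question.

```python
import re

# Assumption: every key looks like "product/productId" or "review/score",
# i.e. word characters, a slash, word characters, then a colon.
PAIR_RE = re.compile(r'^(\w+/\w+):\s*(.*)$')


def parse_reviews(text):
    """Hypothetical helper: return one dict per blank-line-separated review."""
    reviews = []
    # Reviews are separated by one or more blank lines
    for block in re.split(r'\n\s*\n', text.strip()):
        record = {}
        for line in block.splitlines():
            match = PAIR_RE.match(line.strip())
            if match:
                # group(1) is the key, group(2) is everything after the first ":"
                record[match.group(1)] = match.group(2)
        if record:
            reviews.append(record)
    return reviews


# Shortened sample in the same layout as the file in the question
sample = """product/productId: B001E4KFG0
review/helpfulness: 1/1
review/score: 5.0
review/summary: Good Quality Dog Food

product/productId: B00813GRG4
review/helpfulness: 0/0
review/score: 1.0
review/summary: Not as Advertised
"""

for review in parse_reviews(sample):
    print(review['product/productId'])
```

Because the pattern anchors on the key and captures the rest of the line, colons that appear inside a review's text stay in the value.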
