Python parse function

Time: 2017-07-14 15:34:45

Tags: python json function parsing

I am new to Python and ran into an error when running some code.

I have this Amazon dataset, formatted as a JSON file (see the JSON format below).

{
  "reviewerID": "A2SUAM1J3GNN3B",
  "asin": "0000013714",
  "reviewerName": "J. McDonald",
  "helpful": [2, 3],
  "reviewText": "I bought this for my husband who plays the piano.  He is having a wonderful time playing these old hymns.  The music  is at times hard to read because we think the book was published for singing from more than playing from.  Great purchase though!",
  "overall": 5.0,
  "summary": "Heavenly Highway Hymns",
  "unixReviewTime": 1252800000,
  "reviewTime": "09 13, 2009"
}

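The command I am using was supplied by the data provider. It converts the JSON file above into a 'strict JSON' file (according to the data provider, the original file is not strict JSON).

The command they provided is as follows: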

import json
import gzip

def parse(path):
  g = gzip.open(path, 'r')
  for l in g:
    yield json.dumps(eval(l))

f = open("output.strict", 'w')
for l in parse("reviews_Video_Games.json.gz"):
  f.write(l + '\n')

I only changed the path, putting the location of the JSON file in quotation marks (for example, "C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz").

For example, the code I ran looks like this:

import json
import gzip

def parse(C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz):
  g = gzip.open(C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz, 'r')
  for l in g:
    yield json.dumps(eval(l))

f = open("output.strict", 'w')
for l in parse("reviews_Video_Games.json.gz"):
  f.write(l + '\n')

However, I get the following error:

C:\Users\daisy\AppData\Local\Programs\Python\Python36-32>python C:\Users\daisy\AppData\Local\Programs\Python\strict_json.py
  File "C:\Users\daisy\AppData\Local\Programs\Python\strict_json.py", line 4
def parse("C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz"):
                                                                                ^
SyntaxError: invalid syntax

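Do you know what is wrong with the syntax?

Again, the original code was supplied by the data provider, so I am fairly sure it is correct. I think I made a mistake when I changed 'path' to the location of my file.

Thanks.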

1 Answer:

Answer 0: (score: 0)

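You can't define a function like that. A parameter in a def statement must be a name, not a string literal or a path. Keep the parameter name and pass the path as an argument when you call the function: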

def parse(file_path):
  g = gzip.open(file_path, 'r')
  for l in g:
    yield json.dumps(eval(l))

parse(r"C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz")

You can, however, set a default value like this:

def parse(file_path=r"C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz"):
  g = gzip.open(file_path, 'r')
  for l in g:
    yield json.dumps(eval(l))

parse()
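
Note that parse is a generator, so calling it only creates a generator object, and nothing is read until it is iterated. As a minimal end-to-end sketch, the script below keeps the parameter as an ordinary name and takes the file location only as an argument. Two things in it are assumptions and not part of the provider's code: the helper name convert is hypothetical, and ast.literal_eval is swapped in for eval as a more restrictive way to read lines that are Python literals.

```python
import ast
import gzip
import json


def parse(path):
    """Yield each line of a gzipped file of Python-literal dicts as strict JSON."""
    # "rt" opens the archive in text mode, so each line is a str
    with gzip.open(path, "rt", encoding="utf-8") as g:
        for line in g:
            # literal_eval accepts only literals (swapped in for eval; an assumption)
            yield json.dumps(ast.literal_eval(line))


def convert(src, dst):
    """Hypothetical helper: write one strict-JSON record per line to dst."""
    with open(dst, "w", encoding="utf-8") as f:
        # Iterating over the generator is what actually reads the file
        for record in parse(src):
            f.write(record + "\n")


# Example call (the path is passed here, not in the def line):
# convert(r"C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz",
#         "output.strict")
```

The example call is left commented out so the sketch can be imported without the data file being present.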

Update on the encoding issue

>>> "C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz"
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
>>> "C:\\Users\\daisy\\Research\\study\\Amazon\\reviews_Video_Games.json.gz"
'C:\\Users\\daisy\\Research\\study\\Amazon\\reviews_Video_Games.json.gz'
>>> r"C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz"
'C:\\Users\\daisy\\Research\\study\\Amazon\\reviews_Video_Games.json.gz'
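
The first line of the session fails because \U inside an ordinary string literal starts a Unicode escape. As a small sketch, the following checks that the raw string and the doubled-backslash string are the same value, and shows a forward-slash spelling that pathlib treats as the same Windows path. The variable names are illustrative only.

```python
from pathlib import PureWindowsPath

raw = r"C:\Users\daisy\Research\study\Amazon\reviews_Video_Games.json.gz"
escaped = "C:\\Users\\daisy\\Research\\study\\Amazon\\reviews_Video_Games.json.gz"
forward = "C:/Users/daisy/Research/study/Amazon/reviews_Video_Games.json.gz"

# The raw and escaped literals produce the identical string
print(raw == escaped)

# PureWindowsPath normalizes forward slashes to backslashes on any OS
print(PureWindowsPath(forward) == PureWindowsPath(raw))

# The final path component, i.e. the file name
print(PureWindowsPath(raw).name)
```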