Formatting JSON data fetched from a URL by removing escape characters

Time: 2017-01-25 06:56:36

Tags: python json

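I fetched JSON data from a URL and wrote it to a file named urljson.json. I want to format the JSON data by removing the '\' characters and the result [] key. In my JSON file the data looks like this: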

{\"result\":[{\"BldgID\":\"1006AVE \",\"BldgName\":\"100-6th Avenue SW (Oddfellows)          \",\"BldgCity\":\"Calgary             \",\"BldgState\":\"AB \",\"BldgZip\":\"T2G 2C4  \",\"BldgAddress1\":\"100-6th Avenue Southwest                \",\"BldgAddress2\":\"ZZZ None\",\"BldgPhone\":\"4035439600     \",\"BldgLandlord\":\"1006AV\",\"BldgLandlordName\":\"100-6 TH Avenue SW Inc.                                     \",\"BldgManager\":\"AVANDE\",\"BldgManagerName\":\"Alyssa Van de Vorst           \",\"BldgManagerType\":\"Internal\",\"BldgGLA\":\"34242\",\"BldgEntityID\":\"1006AVE \",\"BldgInactive\":\"N\",\"BldgPropType\":\"ZZZ None\",\"BldgPropTypeDesc\":\"ZZZ None\",\"BldgPropSubType\":\"ZZZ None\",\"BldgPropSubTypeDesc\":\"ZZZ None\",\"BldgRetailFlag\":\"N\",\"BldgEntityType\":\"REIT                     \",\"BldgCityName\":\"Calgary             \",\"BldgDistrictName\":\"Downtown            \",\"BldgRegionName\":\"Western Canada                                    \",\"BldgAccountantID\":\"KKAUN     \",\"BldgAccountantName\":\"Kendra Kaun                   \",\"BldgAccountantMgrID\":\"LVALIANT  \",\"BldgAccountantMgrName\":\"Lorretta Valiant                        \",\"BldgFASBStartDate\":\"2012-10-24\",\"BldgFASBStartDateStr\":\"2012-10-24\"}]}

I want it in this format:

[
   {
      "BldgID":"1006AVE",
      "BldgName":"100-6th Avenue SW (Oddfellows)          ",
      "BldgCity":"Calgary             ",
      "BldgState":"AB ",
      "BldgZip":"T2G 2C4  ",
      "BldgAddress1":"100-6th Avenue Southwest                ",
      "BldgAddress2":"ZZZ None",
      "BldgPhone":"4035439600     ",
      "BldgLandlord":"1006AV",
      "BldgLandlordName":"100-6 TH Avenue SW Inc.                                    ",
      "BldgManager":"AVANDE",
      "BldgManagerName":"Alyssa Van de Vorst           ",
      "BldgManagerType":"Internal",
      "BldgGLA":"34242",
      "BldgEntityID":"1006AVE ",
      "BldgInactive":"N",
      "BldgPropType":"ZZZ None",
      "BldgPropTypeDesc":"ZZZ None",
      "BldgPropSubType":"ZZZ None",
      "BldgPropSubTypeDesc":"ZZZ None",
      "BldgRetailFlag":"N",
      "BldgEntityType":"REIT                     ",
      "BldgCityName":"Calgary             ",
      "BldgDistrictName":"Downtown            ",
      "BldgRegionName":"Western Canada                                    ",
      "BldgAccountantID":"KKAUN     ",
      "BldgAccountantName":"Kendra Kaun                   ",
      "BldgAccountantMgrID":"LVALIANT  ",
      "BldgAccountantMgrName":"Lorretta Valiant                        ",
      "BldgFASBStartDate":"2012-10-24",
      "BldgFASBStartDateStr":"2012-10-24"
   }
]

I tried replace("\", "") but nothing changed. Here is my code:

import json


import urllib2
urllink=urllib2.urlopen("url").read()

# print urllink



with open('urljson.json','w')as outfile:
    json.dump(urllink,outfile)


jsonfile='urljson.json'
jsondata=open(jsonfile)

data=json.load(jsondata)
# data.replace('\'," ")
print (data)

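But it says the file object has no attribute replace, and I have no idea how to remove 'result' and the outermost "{}". Please guide me. I think the file object is somehow not being parsed into a string. I am a beginner in Python. Thanks.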

3 Answers:

Answer 0: (score: 1)

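JSON is a serialized encoding of data. urllink=urllib2.urlopen("url").read() reads that serialized string. With json.dump(urllink,outfile) you serialized that already-serialized JSON string a second time. You double-encoded it, which is why you see the extra "\" escape characters. JSON has to escape the inner quotes so that they are not confused with the quotes that delimit the string.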
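
The double encoding can be reproduced without any network access. The following is a minimal sketch in Python 3 syntax (the code in this thread is Python 2); the `raw` string is a made-up stand-in for whatever the URL returns.

```python
import json

# Hypothetical stand-in for the text returned by urlopen(...).read()
raw = '{"result":[{"BldgID":"1006AVE "}]}'

# Encoding a string that is already JSON wraps it in quotes
# and escapes every inner quote with a backslash
double_encoded = json.dumps(raw)
print(double_encoded)

# Decoding once gives back the original JSON text, which is still a str
once = json.loads(double_encoded)

# Decoding a second time gives the actual dict
twice = json.loads(once)
print(type(once).__name__, type(twice).__name__)
```

This is also why json.load on the file in the question returns a plain string rather than a dict: the file holds a JSON string whose content happens to be JSON.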

If you want the file to hold the original JSON, there is no need to encode it again; just do

with open('urljson.json', 'w') as outfile:
    outfile.write(urllink)

But it looks like you want to grab the "result" list and save only that. So decode the JSON into Python, grab the piece you want, and re-encode it.

import json
import urllib2

# read a json string from url
urllink = urllib2.urlopen("url").read()

# decode and grab result list
result = json.loads(urllink)['result']

# write the json to a file
with open('urljson.json', 'w') as outfile:
    json.dump(result, outfile)
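
If the file should also be laid out like the indented example in the question, json.dump accepts an indent argument. A sketch under assumptions: Python 3 syntax, a hard-coded `raw` sample instead of the real URL, a temporary directory for the output, and a hypothetical helper name `save_result`.

```python
import json
import os
import tempfile


def save_result(raw, path, indent=3):
    """Decode raw JSON text, keep only the 'result' list, and write it indented."""
    result = json.loads(raw)["result"]
    with open(path, "w") as outfile:
        json.dump(result, outfile, indent=indent)
    return result


# Made-up sample; in the thread this text comes from the URL
raw = '{"result":[{"BldgID":"1006AVE ","BldgState":"AB "}]}'
path = os.path.join(tempfile.mkdtemp(), "urljson.json")
save_result(raw, path)

# Show what ended up in the file
with open(path) as infile:
    print(infile.read())
```

Because the data is decoded once and encoded once, the file contains no backslash escapes, and the outer {"result": ...} wrapper is gone.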

Answer 1: (score: 0)

Tidy up the JSON object before writing it to the file. It contains a lot of whitespace noise. Try this:

# strip whitespace from the keys and values of the first object in "result"
jsonobj = {a.strip(): b.strip() for a, b in json.loads(urllink)['result'][0].items()}

with open('urljson.json', 'w') as outfile:
    json.dump(jsonobj, outfile)

For all objects:

jsonlist = []

for dirtyobj in json.loads(urllink)['result']:
    # strip whitespace from every key and value
    jsonlist.append({a.strip(): b.strip() for a, b in dirtyobj.items()})

with open('urljson.json', 'w') as outfile:
    json.dump(jsonlist, outfile)
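
Note that b.strip() assumes every value is a string. That holds for the sample data in the question but may not hold in general. A slightly more defensive sketch in Python 3 syntax follows; the helper name `strip_strings` and the sample `raw` value (with a numeric BldgGLA) are illustrative only.

```python
import json


def strip_strings(record):
    """Return a copy of a dict with whitespace stripped from keys and string values."""
    return {
        key.strip(): value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


# Made-up sample with one non-string value to exercise the isinstance check
raw = '{"result":[{"BldgID":"1006AVE ","BldgCity":"Calgary             ","BldgGLA":34242}]}'

cleaned = [strip_strings(rec) for rec in json.loads(raw)["result"]]
print(json.dumps(cleaned))
```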

Don't want to tidy it up? Then just do this:

jsonobj = json.loads(urllink)

You can't write '\'; it is a syntax error. The second ' is escaped and is not treated as the closing quote.

data.replace('\'," ")

Why can't Python's raw string literals end with a single backslash?
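
For reference, a literal backslash has to be written as '\\', and str.replace returns a new string instead of modifying the original. A small sketch (Python 3 syntax; the `text` value is made up). Stripping backslashes by hand is shown only to illustrate the syntax; decoding with json.loads is the more reliable route, since it also handles the other JSON escapes.

```python
# The characters held in text are: {\"BldgID\":\"1006AVE \"}
text = '{\\"BldgID\\":\\"1006AVE \\"}'

# '\\' is a one-character string containing a single backslash
backslash = '\\'
print(len(backslash))

# replace() returns a new string; text itself is left unchanged
unescaped = text.replace(backslash, '')
print(unescaped)
```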

Answer 2: (score: 0)

\ is the escape character in JSON.

You can load the JSON string into a Python dict.
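
Both points can be sketched in a few lines (Python 3 syntax; the `json_text` value is invented for illustration):

```python
import json

# Inside a JSON string, \" stands for a literal double quote
json_text = '{"name": "the \\"Oddfellows\\" building"}'

# json.loads turns the JSON text into a Python dict and resolves the escapes
data = json.loads(json_text)
print(type(data).__name__)
print(data["name"])
```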