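In answers to similar questions, I've seen people use libraries such as ijson, but I can't figure out how to use it to solve my problem.
Here is my code so far: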
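import urllib.request
import ijson

f = urllib.request.urlopen("https://data.medicaid.gov/resource/4qik-skk9.json?$limit=646259")
# prefix "item" yields each element of the top-level array one at a time;
# prefix "" would build the entire array in memory as a single object
stuff = ijson.items(f, "item")
for items in stuff:
    print(items)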
This is what the JSON structure looks like:
[
  {
    "package_size_code": "60",
    "fda_ther_equiv_code": "NR",
    "fda_application_number": "204153",
    "clotting_factor_indicator": "N",
    "year": "2018",
    "fda_product_name": "LUZU Cream 1% 60gm",
    "labeler_name": "MEDICIS DERMATOLOGICS, INC.",
    "ndc": "99207085060",
    "product_code": "0850",
    "unit_type": "GM",
    "fda_approval_date": "2013-11-14T00:00:00",
    "market_date": "2014-03-14T00:00:00",
    "pediatric_indicator": "N",
    "package_size_intro_date": "2014-03-14T00:00:00",
    "units_per_pkg_size": "60000",
    "labeler_code": "99207",
    "desi_indicator": "1",
    "drug_category": "S",
    "quarter": "3",
    "cod_status": "3"
  },
  {
    "package_size_code": "60",
    "fda_ther_equiv_code": "AB",
    "fda_application_number": "21758",
    "clotting_factor_indicator": "N",
    "year": "2018",
    "fda_product_name": "VANOS CREAM .1%",
    "labeler_name": "MEDICIS DERMATOLOGICS, INC.",
    "ndc": "99207052560",
    "product_code": "0525",
    "unit_type": "GM",
    "fda_approval_date": "2005-02-11T00:00:00",
    "market_date": "2005-02-21T00:00:00",
    "pediatric_indicator": "N",
    "package_size_intro_date": "2005-02-21T00:00:00",
    "units_per_pkg_size": "60000",
    "labeler_code": "99207",
    "desi_indicator": "1",
    "drug_category": "I",
    "quarter": "3",
    "cod_status": "3"
  },
  .
  .
  .
  .
]
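What I want to do is fetch all the results and apply some filters to them to extract values. For example, I'm trying to get the maximum year across all entries. To do that, I think I need to read all of the data.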
This seems to solve the memory problem:
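import urllib.request
import ijson

parser = ijson.parse(urllib.request.urlopen('https://data.medicaid.gov/resource/4qik-skk9.json?$limit=646259'))
# ijson.parse yields a low-level (prefix, event, value) tuple for every JSON token
for prefix, event, value in parser:
    #print(prefix)
    print(event)
    #print(value)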
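The output isn't very readable, since it is a flat stream of parse events rather than whole records, but it's progress.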