Iterating over multiple JSON dictionaries

Asked: 2017-01-09 06:34:08

Tags: python json dictionary for-loop iteration

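I have a JSON file containing multiple dictionaries; each holds a lot of information about a particular website. I want to write a program that iterates over the dictionaries and outputs only the HTML code in each one, which is found (parsed) at data["p80"]["http"]["get"]["body"].

Here is a sample of two dictionaries from the JSON file.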

{"p80":{"http":{"get":{"body": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\">\n\t<head>\n\t\t<title>Motormax</title>\n                    <meta name=viewport content=\"width=device-width, initial-scale=1.0\" />\r\n<meta name=\"google-site-verification\" content=\"wqSGgrJPlLskInflNQPXn9oY25etuJYuRQonZ0k0I_o\" />\r\n<link href='https://fonts.googleapis.com/css?family=Lato:400,700,900' rel='stylesheet' type='text/css'>\r\n        \t\t<meta name=\"description\" content=\"\" /> \n\t\t<meta name=\"keywords\" content=\"Motormaax, Renault, Chevrolet, Nissan, Peugeot, Volkswagen, Ford, Planes de ahorro, financiaci\u00f3n, cuotas, autos en cuotas\" /> \n\t\t<meta http-equiv=\"Content-type\" content=\"text/html; charset=UTF-8\" />\n\t\t\n        <script src=\"/processedjs/kms427.js\" type=\"text/javascript\"></script>        <link rel=\"stylesheet\" type=\"text/css\" href=\"/processedcss/kms427.css\" />\n\t\t\n\t\t<script type=\"text/javascript\">\n\t\t\tvar dataLayer = [];\n\t\t</script>\n        <script type=\"text/javascript\">(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':\r\nnew Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],\r\nj=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=\r\n'//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);\r\n})(window,document,'script','dataLayer','GTM-582XL3');</script>\n\t\t\n\t\t\n\n\n\t</head>\n\t<body>\n\t<div style=\"visibility: hidden; display: none;\"></div>\r\n<div class=\"main\">\r\n\t\t\t<p><img src=\"/templatepagina/template_246/images/logo_motormax.png\" alt=\"Motormax\" /></p>\r\n\t\t\t<h1>TE ACOMPA\u00d1AMOS EN LA COMPRA DE TU <b>NUEVO AUTO</b></h1>\r\n\t\t\t<p id=\"line\"></p>\r\n\t\t\t\r\n\t\r\n<ul class=\"marcas\">\t\t\t\r\n<a href=\"/peugeot\"><li id=\"peugeot\"><p>Peugeot</p></li></a>\r\n\t\t\t\t<a href=\"/fiat\"><li 
id=\"fiat\"><p>fiat</p></li></a>\r\n\t\t\t\t<a href=\"/ford\"><li id=\"ford\"><p>ford</p></li></a>\r\n\t\t\t\t<a href=\"/renault\"><li id=\"renault\"><p>renault</p></li></a>\r\n                                <a href=\"/volkswagen\"><li id=\"vw\"><p>vw</p></li></a>\r\n\t\t\t\r\n\t\r\n\t\t\t\t<!-- <li id=\"nissan\"><p>nissan</p></li> -->\r\n\t\t\t</ul>\r\n\t\t</div>\t\r\n</body>\n</html>", "body_sha256": "fEHZCw9VEdmwVabOd0g8TntigYiA9AsL+sKicdipejU=", "headers": {"cache_control": "post-check=0, pre-check=0", "content_length": "2118", "content_type": "text/html; charset=UTF-8", "expires": "Thu, 19 Nov 1981 08:52:00 GMT", "pragma": "no-cache", "server": "Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.4.16", "unknown": [{"key": "date", "value": "Mon, 07 Nov 2016 16:36:25 GMT"}], "x_powered_by": "PHP/5.4.16"}, "metadata": {"description": "Apache httpd 2.4.6", "manufacturer": "Apache", "product": "httpd", "version": "2.4.6"}, "status_code": 200, "status_line": "200 OK", "title": "Motormax", "timestamp":"2016-11-09 12:28:36"}}}}
{"p80":{"http":{"get":{"body": " \n<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\"\n\"http://www.w3.org/TR/html4/loose.dtd\">\n<html>\n<head>\n<title>Kody pocztowe - wyszukiwarka</title>\n<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=iso-8859-2\">\n<META NAME=\"Keywords\" CONTENT=\"kody pocztowe, kod pocztowy, Poczta Polska, przesy\ufffdki, listy\">\n<META NAME=\"Description\" CONTENT=\"Na tej stronie mo\ufffdesz wyszuka\ufffd kody pocztowe dowolnych miejscowo\ufffdci w Polsce. Podaj miasto, ulic\ufffd i znajd\ufffd potrzebny Ci kod pocztowy. Jest on niezb\ufffddny, je\ufffdli list lub inna przesy\ufffdka ma dotrze\ufffd do adresata na terenie Polski.\">\n<META HTTP-EQUIV=\"Content-Language\" CONTENT=\"PL\">\n<META NAME=\"distribution\" CONTENT=\"Global\">\n<META NAME=\"revisit-after\" CONTENT=\"2 days\">\n<META NAME=\"robots\" CONTENT=\"INDEX,FOLLOW\">\n<style type=\"text/css\">body, td {\nfont-family:arial;\nfont-size:12px;\nmargin:10px 0 10px 0;\ncolor:#000000;\n}\n\n.row { padding: 4px 10px 4px 0; text-align:left}\ninput { }\nimg { border:0;}\n.thead {\ncolor:#FFFFFF; font-size:10px;\nbackground-image:url(http://00-000.pl/gfx/lay/box_top_bg.gif);\npadding:0;\n}\n.pltd{\npadding-right:40px;\ntext-align:right;\nbackground-image:url(http://00-000.pl/gfx/lay/box_bg.gif);\n\ncolor:#000000;\nfont-family:arial;\nfont-size:13px;\nfont-weight:bold;\n}\n.zera{\ncolor:#f26624;\nfont-family:arial;\nfont-size:30px;\n}\n.zeras{\ncolor:#f26624;\nfont-family:arial;\nfont-size:20px;\n}\n.top_right{\nbackground-image:url(http://00-000.pl/gfx/top_bg.gif);\ntext-align:right;\nwidth:auto;\ncolor:#FFFFFF; font-weight:bold; padding-right:20px;}\n.top_bar{\nbackground-color:#eeeeee;\npadding:0 0px 0 8px;\nfont-size:10px;\n\n}\n\na:link{\ntext-decoration:underline;\ncolor:#000000;\n}\na:visited{\ntext-decoration:underline;\ncolor:#000000;\n}\na:hover{ 
color:#FF0000;\ntext-decoration:none;\n}\na:link.white{\ncolor:#ffffff;\ntext-decoration:none;\n\n}\na:visited.white{\ncolor:#ffffff;\ntext-decoration:none;\n\n}\na:hover.white{ color:#FF3300;\ntext-decoration:underline;\n\n}\n\na:link.head{\ncolor:#ffffff;\ntext-decoration:none;\nfont-weight:bold;\n}\na:visited.head{\ncolor:#ffffff;\ntext-decoration:none;\nfont-weight:bold;\n}\na:hover.head{ color:#FFFF00;\ntext-decoration:underline;\nfont-weight:bold;\n}\n\nli {\nlist-style-type:square;\nlist-style-position:inside;\n}\nh1{\nfont-family:arial;\nfont-size:25px;\nmargin:0 0 5px 0;\n}\nh3{\nfont-size:15px;\ncolor:#993300;\nmargin:0 0 10px 0;\npadding:0;\n\n}\na:link.linkbox{\ncolor:#009900;\ntext-decoration:none;\n}\na:visited.linkbox{\ncolor:#009900;\ntext-decoration:none;\n}\na:hover.linkbox{\ncolor:#009900;\ntext-decoration:underline;\n}\n\n\n.top_box_orange {\nbackground-image:url(http://00-000.pl/gfx/lay/box_top_bg_orange.gif);\nborder-bottom:1px solid #ffffff; \nfont-weight:bold; padding-left:9px;\nheight:21px;\ncolor:#FFFFFF;\n}\n.top_box_grey {\nbackground-image:url(http://00-000.pl/gfx/lay/box_top_bg_grey.gif);\nborder-bottom:1px solid #ffffff; \nfont-weight:bold; height:21px; padding-left:9px;\ncolor:#FFFFFF;\n}\n.top_box_grey_k {background-color:#999999;\nborder-bottom:1px solid #ffffff; \nfont-weight:bold; height:21px; padding-left:9px;\ncolor:#FFFFFF;\n}\n\n.box{\nbackground-image:url(http://00-000.pl/gfx/lay/box_bg.gif);\npadding:15px 10px 20px 10px;\nline-height:15px\n}\n\n.form_ok {\nmargin:10px 0 10px 0;\nbackground-color:#FFFFCC;\ncolor:#99CC00;\nfont-size:14px;\nfont-weight:bold;\npadding:20px;\ntext-align:left;\nborder: 1px solid #009900;\n}\n.form_bad {\nmargin:10px 0 10px 0;\nbackground-color:#FFFFCC;\ncolor:#CC0000;\nfont-size:14px;\nfont-weight:bold;\npadding:20px;\ntext-align:left;\nborder: 1px solid #990000;\n}\n\na.button {\ndisplay:block;\nbackground-color:#f26623;\ncolor:#fff;\npadding:5px 10px;\n width:150px;\nmargin:0 10px 0 
10px;\nfloat:right;\ntext-align:center;\ntext-decoration:none;\n}\na:visited.button { color:#fff;}\na:hover.button {\ntext-decoration:underline;\ncolor:#000;\n\n}\n</style>\n</head>\n<body>\n\n<table cellpadding=\"0\" cellspacing=\"0\" width=\"80%\" align=\"center\" >\n<tr><td align=\"left\" width=\"190\" colspan=\"2\"><a href=\"http://00-000.pl\"><img src=\"http://00-000.pl/gfx/logo.gif\" border=\"0\" width=\"190\" height=\"70\"></a></td>\n<td width=\"100%\" class=\"top_right\" colspan=\"2\">wyszukiwarka kod\ufffdw pocztowych</Td>\n<td width=\"4\"><img src=\"http://00-000.pl/gfx/top_right.gif\" border=\"0\" width=\"4\" height=\"70\"></td>\n</tr>\n\n<tr>\n<td width=\"4\"><img src=\"http://00-000.pl/gfx/lay/top_bar_left.gif\" border=\"0\" width=\"4\" height=\"21\"></td>\n<Td width=\"186\" class=\"top_bar\">Ostatnia aktualizacja: ", "body_sha256": "/OYNeyTKqqDQNpmG1rmKfK8OYAKfUDP1l8jGUnVlyR8="}}}}

Here is my code so far.

import json
from pprint import pprint
import sys

if __name__ == "__main__":
    file = open('sample101.json', 'r')

    for dict in file:
        for key, value in file.items():
            pprint(file["p80"]["http"]["get"]["body"])

    file.close()

Any help would be greatly appreciated, as I'm new to Python. Thank you very much!

2 answers:

Answer 0 (score: 1)

json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)

  

Deserialize fp (a .read()-supporting file-like object containing a JSON document) to a Python object using this conversion table.

import json

with open('sample101.json', 'r') as file:  # the file is closed automatically
    py_dict = json.load(file)
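
As a small illustration of the quoted documentation, the sketch below feeds json.load a file-like object. The io.StringIO objects and the JSON text in them are made-up stand-ins for the real file, not data from the question. It also shows what to expect when the stream holds more than one top-level JSON document: json.load treats the whole input as a single document and raises json.JSONDecodeError.

```python
import io
import json

# io.StringIO stands in for an opened file; the JSON text is a made-up sample.
single = io.StringIO('{"p80": {"http": {"get": {"body": "<html></html>"}}}}')
py_dict = json.load(single)
body = py_dict["p80"]["http"]["get"]["body"]
print(body)

# Two top-level documents in one stream are not a single JSON document,
# so json.load rejects the input.
double = io.StringIO('{"a": 1}\n{"b": 2}\n')
try:
    json.load(double)
    load_failed = False
except json.JSONDecodeError as exc:
    load_failed = True
    print("json.load failed:", exc)
```

If the real file looks like the second case, it probably needs to be read one line at a time rather than with a single json.load call.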

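Answer 1 (score: 0)

If I have this right, you have a JSON file containing a list of dictionaries and you want to extract the HTML from each one. In that case, you need to parse the whole file as JSON, after which the extraction is simple. Don't name a variable dict, because that shadows the built-in dict class; otherwise this should do it.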

import json
from pprint import pprint
import sys

if __name__ == "__main__":
    for data_dict in json.load(open('sample101.json', encoding='utf-8')):
        pprint(data_dict["p80"]["http"]["get"]["body"])

If you are worried about bad data, you can wrap the lookup in a try/except block and fetch one item at a time.

for data_dict in json.load(open('sample101.json', encoding='utf-8')):
    for key in "p80", "http", "get", "body":
        try:
            data_dict = data_dict[key]
        except (TypeError, KeyError):
            print("Error at", key)
            print(repr(data_dict))
            raise  # or use break to skip to the next item
    else:
        # Runs only if every key was found: data_dict is now the HTML body.
        pprint(data_dict)
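
The key-by-key lookup above can also be packaged as a small helper. This is only a sketch: get_nested, BODY_PATH, and the two sample dictionaries are hypothetical names and data introduced for illustration, not part of the question's file.

```python
def get_nested(data, keys, default=None):
    """Follow keys into nested dicts; return default if any step fails."""
    for key in keys:
        try:
            data = data[key]
        except (TypeError, KeyError):
            # TypeError: data is not subscriptable by this key (e.g. None).
            # KeyError: data is a dict that lacks the key.
            return default
    return data


BODY_PATH = ("p80", "http", "get", "body")

# Made-up records: one complete, one with the chain cut short.
good = {"p80": {"http": {"get": {"body": "<html></html>"}}}}
bad = {"p80": {"http": None}}

print(get_nested(good, BODY_PATH))
print(get_nested(bad, BODY_PATH))
```

Returning a default instead of raising keeps the outer loop simple when some records are expected to be incomplete; re-raising, as in the answer's loop, is the better choice when a missing key should stop the run.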

Update:

Suppose it isn't a JSON file but a file with one JSON string per line. Then we rework the loop a bit (and stop calling it xxx.json!).

for line in open('sample101.json', encoding='utf-8'):
    data_dict = json.loads(line)  # parse each line as its own JSON document
    for key in "p80", "http", "get", "body":
        try:
            data_dict = data_dict[key]
        except (TypeError, KeyError):
            print("Error at", key)
            print(repr(data_dict))
            raise  # or use break to skip to the next item
    else:
        # Runs only if every key was found: data_dict is now the HTML body.
        pprint(data_dict)
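
For a self-contained version of the line-by-line approach, here is a minimal sketch that skips lines it cannot use instead of stopping. The name iter_bodies and the sample lines are assumptions made for illustration; io.StringIO stands in for the opened file.

```python
import io
import json


def iter_bodies(lines):
    """Yield the HTML body from each JSON line, skipping lines that fail."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
            body = record["p80"]["http"]["get"]["body"]
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            print("Skipping line", lineno, "-", repr(exc))
            continue
        yield body


# io.StringIO stands in for open('sample101.json', encoding='utf-8').
sample = io.StringIO(
    '{"p80": {"http": {"get": {"body": "<html>one</html>"}}}}\n'
    '{"p80": {"http": {}}}\n'
    'not json at all\n'
    '{"p80": {"http": {"get": {"body": "<html>two</html>"}}}}\n'
)
bodies = list(iter_bodies(sample))
print(bodies)
```

Because iter_bodies accepts any iterable of strings, the same function should work when passed a real file object opened in text mode.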