Question

I am currently using the following Python script:

import json
from collections import defaultdict
from pprint import pprint

with open('prettyPrint.txt') as data_file:
    data = json.load(data_file)

locations = defaultdict(list)


for item in data['data']:
    location = item['relationships']['location']['data']['id']
    locations[location].append(item['id'])

pprint(locations)

to parse some messy JSON data, which looks like this:

{
    "links": {
        "self": "http://localhost:2510/api/v2/jobs?skills=data%20science"
    },
    "data": [
        {
            "id": 121,
            "type": "job",
            "attributes": {
                "title": "Data Scientist",
                "date": "2014-01-22T15:25:00.000Z",
                "description": "Data scientists are in increasingly high demand amongst tech companies in London. Generally a combination of business acumen and technical skills are sought. Big data experience ..."
            },
            "relationships": {
                "location": {
                    "links": {
                        "self": "http://localhost:2510/api/v2/jobs/121/location"
                    },
                    "data": {
                        "type": "location",
                        "id": 3
                    }
                },
                "country": {
                    "links": {
                        "self": "http://localhost:2510/api/v2/jobs/121/country"
                    },
                    "data": {
                        "type": "country",
                        "id": 1
                    }
                },

At the moment the output looks like this:

         85: [36026,
              36028,
              36032,
              36027,
              217897,
              286398,
              315064,
              320879,
              322303,
              322608,
              322611,
              323199,
              325659,
              327652],
         88: [13690,
              13693,
              13689,
              13692,
              13691,
              16454,
              16453,
              28002,
              28003,
              28004,
              28001,
              114667,
              233319,
              233329,
              263814,
              271490,
              271571,
              271569,
              271570,
              291274,
              291275,
              300376,
              300373,
              301293,
              301295,
              304286,
              304285,
              320425,
              320426,
              320424,
              320431,
              320430,
              321284,
              321281,
              321283,
              321282,
              321280,
              324345,
              327926,
              347985,
              358537,
              358549,
              357807,
              364541,
              358431,
              334990,
              359241],

But I want to change it so that the output looks like this:

I know I need to put some kind of i=0, i++ somewhere, but I can't figure it out. How can I do this?

Answer 1

You only need the number of items for each location, not the actual items stored in the locations dict. Use defaultdict with int as its default factory:

locations = defaultdict(int)
# makes default value of each key as `0`

And change the for loop to:

for item in data['data']:
    location = item['relationships']['location']['data']['id']
    locations[location] += 1   # increase the count by `1`
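A self-contained sketch of this approach is shown here. The small `data` dict is a made-up sample that only mimics the structure of the JSON in the question (its ids are assumptions for illustration), and it stands in for the file loaded with `json.load`:

```python
from collections import defaultdict

# Hypothetical sample with the same nesting as the question's JSON;
# the job and location ids are invented for illustration.
data = {
    "data": [
        {"id": 121, "relationships": {"location": {"data": {"type": "location", "id": 3}}}},
        {"id": 122, "relationships": {"location": {"data": {"type": "location", "id": 3}}}},
        {"id": 123, "relationships": {"location": {"data": {"type": "location", "id": 7}}}},
    ]
}

locations = defaultdict(int)  # any missing key starts at 0

for item in data["data"]:
    location = item["relationships"]["location"]["data"]["id"]
    locations[location] += 1  # one more job seen at this location

print(dict(locations))  # {3: 2, 7: 1}
```

No explicit `i = 0` is needed, because `defaultdict(int)` supplies the initial `0` the first time a location id is seen.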

Alternatively, and better, use collections.Counter() with a generator expression, as mentioned by @TigerhawkT3:

from collections import Counter

Counter(item['relationships']['location']['data']['id'] for item in data['data'])
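As a rough, runnable illustration of the Counter variant, the sketch below again uses a made-up `data` sample (the ids are assumptions, not values from the question) and prints the locations ordered by how many jobs they have:

```python
from collections import Counter

# Hypothetical sample with the same nesting as the question's JSON;
# the job and location ids are invented for illustration.
data = {
    "data": [
        {"id": 121, "relationships": {"location": {"data": {"type": "location", "id": 3}}}},
        {"id": 122, "relationships": {"location": {"data": {"type": "location", "id": 3}}}},
        {"id": 123, "relationships": {"location": {"data": {"type": "location", "id": 7}}}},
    ]
}

# Count how many jobs reference each location id.
counts = Counter(
    item["relationships"]["location"]["data"]["id"] for item in data["data"]
)

# most_common() yields (location_id, count) pairs, highest count first.
for location, count in counts.most_common():
    print(location, count)
```

If the lists of job ids built by the original script are still needed, the counts can also be derived from them afterwards, for example with `len(ids)` for each list in `locations`.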

Adding an incrementing counter to a loop in a Python parsing script
