Iterating over a JSON object with multiple nested arrays in Python 3

Time: 2020-06-25 06:34:26

Tags: python arrays json ocr

I have OCR data stored in a JSON object that looks like this:

{
  "language": "ar",
  "textAngle": 0,
  "orientation": "Right",
  "regions": [
    {
      "boundingBox": "191,26,283,265",
      "lines": [
        {
          "boundingBox": "191,26,283,22",
          "words": [
            {
              "boundingBox": "191,28,95,20",
              "text": "KINGDOM",
              "confidence": 861
            },
            {
              "boundingBox": "292,27,26,18",
              "text": "OF",
              "confidence": 826
            },
            {
              "boundingBox": "323,26,64,19",
              "text": "SAUDI",
              "confidence": 840
            },
            {
              "boundingBox": "393,26,81,18",
              "text": "ARABIA",
              "confidence": 765
            }
          ]
        },
        {
          "boundingBox": "215,58,237,20",
          "words": [
            {
              "boundingBox": "215,59,98,19",
              "text": "MINISTRY",
              "confidence": 812
            },
            {
              "boundingBox": "318,58,28,18",
              "text": "OF",
              "confidence": 996
            },
            {
              "boundingBox": "353,58,99,18",
              "text": "INTERIOR",
              "confidence": 713
            }
          ]
        },
        {
          "boundingBox": "243,258,137,33",
          "words": [
            {
              "boundingBox": "243,258,137,33",
              "text": "االل",
              "confidence": 999
            }
          ]
        }
      ]
    },
    {
      "boundingBox": "523,29,230,57",
      "lines": [
        {
          "boundingBox": "545,29,186,30",
          "words": [
            {
              "boundingBox": "545,29,81,30",
              "text": "سياقة",
              "confidence": 999
            },
            {
              "boundingBox": "632,30,99,29",
              "text": "رخصة",
              "confidence": 999
            }
          ]
        },
        {
          "boundingBox": "523,70,230,16",
          "words": [
            {
              "boundingBox": "523,70,107,16",
              "text": "DRIVING",
              "confidence": 679
            },
            {
              "boundingBox": "642,70,111,16",
              "text": "LICENSE",
              "confidence": 781
            }
          ]
        }
      ]
.
.
.
.
}

I want to write out the value of every text key in the words arrays nested inside the lines arrays. My code looks like this:

with open('data3.txt', 'w', encoding='utf-8') as f:
    for each in azure_json['regions']:
        print(each['lines'][0]['words'][0]['text'])

But it only gives me the first text value from the FIRST words array of each lines array, i.e. the output of this code is:

KINGDOM
سياقة

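So it gives me only the first word of the first line in each of the two regions. I want to print all the text values inside every words array, for every line.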

Please help.

1 answer:

Answer 0 (score: 0)

Iterate over everything instead of indexing into the first item at each level.

for region in azure_json["regions"]:
    for line in region["lines"]:
        for word in line["words"]:
            print(word["text"])

The output will look like this:

KINGDOM
OF
SAUDI
ARABIA
MINISTRY
OF
INTERIOR
االل
سياقة
رخصة
DRIVING
LICENSE
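To try the nested loop in isolation, here is a minimal sketch that parses a trimmed copy of the data with json.loads and collects the texts into a list. The `sample` string and the `collect_texts` helper are hypothetical names introduced for illustration, and the use of `.get(..., [])` assumes you would rather skip a region or line that lacks a key than raise a KeyError.

```python
import json

# Trimmed, hypothetical sample with the same shape as the OCR response above.
sample = """
{
  "regions": [
    {"lines": [
      {"words": [{"text": "KINGDOM"}, {"text": "OF"}]},
      {"words": [{"text": "MINISTRY"}]}
    ]},
    {"lines": [
      {"words": [{"text": "DRIVING"}, {"text": "LICENSE"}]}
    ]}
  ]
}
"""


def collect_texts(ocr):
    """Return the text of every word, in the order the JSON lists them."""
    return [
        word["text"]
        for region in ocr.get("regions", [])  # missing key -> no regions
        for line in region.get("lines", [])   # missing key -> no lines
        for word in line.get("words", [])     # missing key -> no words
    ]


azure_json = json.loads(sample)
texts = collect_texts(azure_json)
print(texts)
```

The comprehension is the same three nested loops written as one expression, so the words come out in the same order as in the loop version.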

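If you want to write to a file instead of printing to standard output, use the file object's write method. Your example code opens the output file but never writes to it. Note that you must supply the newline ("\n") explicitly: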

with open('data3.txt', 'w', encoding='utf-8') as f:
    for region in azure_json["regions"]:
        for line in region["lines"]:
            for word in line["words"]:
                f.write(word["text"] + "\n")
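If you would rather have one row per OCR line than one row per word, a possible variation is to join the words of each line before writing. This is only a sketch: `write_lines` is a hypothetical helper, the sample dictionary is a trimmed stand-in for the real response, and an in-memory io.StringIO buffer takes the place of the file so the example does not touch the disk.

```python
import io


def write_lines(ocr, out):
    """Write one row per OCR line, with its words separated by single spaces."""
    for region in ocr["regions"]:
        for line in region["lines"]:
            row = " ".join(word["text"] for word in line["words"])
            out.write(row + "\n")  # write() adds no newline on its own


# Trimmed, hypothetical sample with the same shape as the OCR response.
azure_json = {
    "regions": [
        {"lines": [{"words": [{"text": "KINGDOM"}, {"text": "OF"}]}]},
        {"lines": [{"words": [{"text": "DRIVING"}, {"text": "LICENSE"}]}]},
    ]
}

buffer = io.StringIO()  # stands in for the open file object
write_lines(azure_json, buffer)
print(buffer.getvalue(), end="")
```

Because `write_lines` accepts any object with a write method, the same call works with the file from the answer: open 'data3.txt' with encoding='utf-8' as f and pass f instead of the buffer. The words are joined in the order the JSON lists them.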