Question

I have OCR data stored in a JSON object that looks like this:

{
  "language": "ar",
  "textAngle": 0,
  "orientation": "Right",
  "regions": [
    {
      "boundingBox": "191,26,283,265",
      "lines": [
        {
          "boundingBox": "191,26,283,22",
          "words": [
            {
              "boundingBox": "191,28,95,20",
              "text": "KINGDOM",
              "confidence": 861
            },
            {
              "boundingBox": "292,27,26,18",
              "text": "OF",
              "confidence": 826
            },
            {
              "boundingBox": "323,26,64,19",
              "text": "SAUDI",
              "confidence": 840
            },
            {
              "boundingBox": "393,26,81,18",
              "text": "ARABIA",
              "confidence": 765
            }
          ]
        },
        {
          "boundingBox": "215,58,237,20",
          "words": [
            {
              "boundingBox": "215,59,98,19",
              "text": "MINISTRY",
              "confidence": 812
            },
            {
              "boundingBox": "318,58,28,18",
              "text": "OF",
              "confidence": 996
            },
            {
              "boundingBox": "353,58,99,18",
              "text": "INTERIOR",
              "confidence": 713
            }
          ]
        },
        {
          "boundingBox": "243,258,137,33",
          "words": [
            {
              "boundingBox": "243,258,137,33",
              "text": "االل",
              "confidence": 999
            }
          ]
        }
      ]
    },
    {
      "boundingBox": "523,29,230,57",
      "lines": [
        {
          "boundingBox": "545,29,186,30",
          "words": [
            {
              "boundingBox": "545,29,81,30",
              "text": "سياقة",
              "confidence": 999
            },
            {
              "boundingBox": "632,30,99,29",
              "text": "رخصة",
              "confidence": 999
            }
          ]
        },
        {
          "boundingBox": "523,70,230,16",
          "words": [
            {
              "boundingBox": "523,70,107,16",
              "text": "DRIVING",
              "confidence": 679
            },
            {
              "boundingBox": "642,70,111,16",
              "text": "LICENSE",
              "confidence": 781
            }
          ]
        }
      ]
.
.
.
.
}

I want to write out every "text" value found in the words arrays inside the lines arrays. My code looks like this:

with open('data3.txt', 'w', encoding='utf-8') as f:
        for each in azure_json['regions']:
            print(each['lines'][0]['words'][0]['text'])

But it only gives me the first "text" value from the FIRST words array of each lines array; that is, the output of this code is:

KINGDOM
سياقة

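So it only gives me the first text from two of the lines. I want to print all the "text" values inside every words array, for every line.

Please help.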

Answer 1

Iterate over everything instead of indexing the first item at each level.

for region in azure_json["regions"]:
    for line in region["lines"]:
        for word in line["words"]:
            print(word["text"])

The output will look something like:

KINGDOM
OF
SAUDI
ARABIA
MINISTRY
OF
INTERIOR
االل
سياقة
رخصة
DRIVING
LICENSE
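As a variation on the loop above, the traversal can be wrapped in a generator so the words can be collected into a list or reused elsewhere. This is only a sketch: the `iter_words` name and the small `azure_json` sample are hypothetical stand-ins with the same shape as the data in the question, and the `.get(..., [])` calls assume that a missing "regions", "lines", or "words" key should be treated as empty.

```python
# Hypothetical sample with the same nesting as the OCR data in the question.
azure_json = {
    "regions": [
        {"lines": [
            {"words": [{"text": "KINGDOM"}, {"text": "OF"}]},
            {"words": [{"text": "MINISTRY"}]},
        ]},
        {"lines": [
            {"words": [{"text": "DRIVING"}, {"text": "LICENSE"}]},
        ]},
    ]
}


def iter_words(ocr):
    """Yield each word's text, region by region and line by line."""
    # .get(..., []) is an assumption: a missing key is treated as an empty list.
    for region in ocr.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                yield word["text"]


all_words = list(iter_words(azure_json))
print(all_words)
```
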

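If you want to write to a file instead of printing to standard output, just use the file's write method; for some reason your example code opens the output file but never writes to it. Note that you have to supply the newline ("\n") explicitly: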

with open('data3.txt', 'w', encoding='utf-8') as f:
    for region in azure_json["regions"]:
        for line in region["lines"]:
            for word in line["words"]:
                f.write(word["text"] + "\n")
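If the goal is one row of output per OCR line rather than one word per row, the words of each line can be joined before writing. The following is a minimal sketch under assumptions: `lines_of_text`, the `sample` data, and the `data3_lines.txt` file name are hypothetical, and the words are joined with a single space in whatever order the OCR data lists them.

```python
import os
import tempfile


def lines_of_text(ocr):
    """Return one string per OCR line, with that line's words joined by spaces."""
    return [
        " ".join(word["text"] for word in line["words"])
        for region in ocr["regions"]
        for line in region["lines"]
    ]


# Hypothetical sample with the same nesting as the OCR data in the question.
sample = {
    "regions": [
        {"lines": [
            {"words": [{"text": "KINGDOM"}, {"text": "OF"}]},
            {"words": [{"text": "MINISTRY"}]},
        ]},
        {"lines": [
            {"words": [{"text": "DRIVING"}, {"text": "LICENSE"}]},
        ]},
    ]
}

text_lines = lines_of_text(sample)

# Written to the temp directory here only so the sketch is safe to run anywhere.
out_path = os.path.join(tempfile.gettempdir(), "data3_lines.txt")
with open(out_path, "w", encoding="utf-8") as f:
    for text_line in text_lines:
        f.write(text_line + "\n")
```
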

Iterating through a JSON object with multiple arrays in Python 3
