I have OCR data stored in a JSON object that looks like this:
{
"language": "ar",
"textAngle": 0,
"orientation": "Right",
"regions": [
{
"boundingBox": "191,26,283,265",
"lines": [
{
"boundingBox": "191,26,283,22",
"words": [
{
"boundingBox": "191,28,95,20",
"text": "KINGDOM",
"confidence": 861
},
{
"boundingBox": "292,27,26,18",
"text": "OF",
"confidence": 826
},
{
"boundingBox": "323,26,64,19",
"text": "SAUDI",
"confidence": 840
},
{
"boundingBox": "393,26,81,18",
"text": "ARABIA",
"confidence": 765
}
]
},
{
"boundingBox": "215,58,237,20",
"words": [
{
"boundingBox": "215,59,98,19",
"text": "MINISTRY",
"confidence": 812
},
{
"boundingBox": "318,58,28,18",
"text": "OF",
"confidence": 996
},
{
"boundingBox": "353,58,99,18",
"text": "INTERIOR",
"confidence": 713
}
]
},
{
"boundingBox": "243,258,137,33",
"words": [
{
"boundingBox": "243,258,137,33",
"text": "االل",
"confidence": 999
}
]
}
]
},
{
"boundingBox": "523,29,230,57",
"lines": [
{
"boundingBox": "545,29,186,30",
"words": [
{
"boundingBox": "545,29,81,30",
"text": "سياقة",
"confidence": 999
},
{
"boundingBox": "632,30,99,29",
"text": "رخصة",
"confidence": 999
}
]
},
{
"boundingBox": "523,70,230,16",
"words": [
{
"boundingBox": "523,70,107,16",
"text": "DRIVING",
"confidence": 679
},
{
"boundingBox": "642,70,111,16",
"text": "LICENSE",
"confidence": 781
}
]
}
]
.
.
.
.
}
I want to write out every "text" value found in the "words" arrays inside the "lines" array. My code looks like this:
with open('data3.txt', 'w', encoding='utf-8') as f:
    for each in azure_json['regions']:
        print(each['lines'][0]['words'][0]['text'])
But it only gives me the first "text" value from the FIRST "words" array in each "lines" array, i.e. the output of this code is:
KINGDOM
سياقة
So it only gives me the first text from two of the lines. I want to print every "text" value inside the "words" array of every line.
Please help.
Answer 0 (score: 0)
Iterate over everything instead of indexing into the first item at each level.
for region in azure_json["regions"]:
    for line in region["lines"]:
        for word in line["words"]:
            print(word["text"])
The output will look something like this:
KINGDOM
OF
SAUDI
ARABIA
MINISTRY
OF
INTERIOR
االل
سياقة
رخصة
DRIVING
LICENSE
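If you want to try this without the full Azure response, here is a minimal self-contained sketch. The `sample` dict is a trimmed stand-in for `azure_json` that keeps only the keys the loops touch, and `collect_texts` is a helper name made up for illustration; neither comes from the question.

```python
# Trimmed, hypothetical stand-in for the OCR response in the question.
sample = {
    "regions": [
        {
            "lines": [
                {"words": [{"text": "KINGDOM"}, {"text": "OF"}]},
                {"words": [{"text": "MINISTRY"}]},
            ]
        },
        {
            "lines": [
                {"words": [{"text": "DRIVING"}, {"text": "LICENSE"}]},
            ]
        },
    ]
}


def collect_texts(ocr):
    """Return every word's text in region -> line -> word order."""
    return [
        word["text"]
        for region in ocr["regions"]
        for line in region["lines"]
        for word in line["words"]
    ]


print(collect_texts(sample))
# → ['KINGDOM', 'OF', 'MINISTRY', 'DRIVING', 'LICENSE']
```

The comprehension is just the three nested loops written in the same order, so the result matches what the loops print.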
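If you want to write to the file instead of printing to standard output, just use the file's write method. Your example code opens the output file but never writes to it. Note that you must supply the newline character ("\n") explicitly: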
with open('data3.txt', 'w', encoding='utf-8') as f:
    for region in azure_json["regions"]:
        for line in region["lines"]:
            for word in line["words"]:
                f.write(word["text"] + "\n")
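If you would rather keep the words of each OCR line together, a possible variation is to join them before writing. This is only a sketch: `lines_as_text` is a hypothetical helper, and `ocr` below is a trimmed stand-in for `azure_json`, not the real response.

```python
def lines_as_text(ocr):
    """Yield one string per OCR line, with that line's words joined by spaces."""
    for region in ocr["regions"]:
        for line in region["lines"]:
            yield " ".join(word["text"] for word in line["words"])


# Trimmed, hypothetical stand-in for the OCR response in the question.
ocr = {
    "regions": [
        {
            "lines": [
                {"words": [{"text": "KINGDOM"}, {"text": "OF"},
                           {"text": "SAUDI"}, {"text": "ARABIA"}]},
                {"words": [{"text": "MINISTRY"}, {"text": "OF"},
                           {"text": "INTERIOR"}]},
            ]
        }
    ]
}

for text in lines_as_text(ocr):
    print(text)
# → KINGDOM OF SAUDI ARABIA
# → MINISTRY OF INTERIOR
```

To send this to a file, replace the print call with f.write(text + "\n") inside the same with open(...) block. The words are joined in the order the OCR returned them, which for right-to-left text such as Arabic may not be the reading order.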