Question

I have the following list under word_list:

[
 [{'bottom': Decimal('58.650'),
   'text': 'Hi there!',
   'top': Decimal('40.359'),
   'x0': Decimal('21.600'),
   'x1': Decimal('65.644')}
 ],
 [{'bottom': Decimal('74.101'),
   'text': 'Your email',
   'top': Decimal('37.519'),
   'x0': Decimal('223.560'),
   'x1': Decimal('300')},
  {'bottom': Decimal('77.280'),
   'text': 'my@domain.com',
   'top': Decimal('62.506'),
   'x0': Decimal('21.600'),
   'x1': Decimal('140.775')}]
]

As you can see, the above is a list that contains lists, i.e. a nested list. The text values above can be represented as:

[0] = 'Hi there!'
[1] = 'Your email'
[1] = 'my@domain.com'

Here is my code, which generates row_list:

word_list = sorted(first_page.extract_words(),
                    key=lambda x: x['bottom'])
threshold = float('10')
current_row = [word_list[0], ]
row_list = [current_row, ]

for word in word_list[1:]:
    if abs(current_row[-1]['bottom'] - word['bottom']) <= threshold:
        # distance is small, use same row
        current_row.append(word)
    else:
        # distance is big, create new row
        current_row = [word, ]
        row_list.append(current_row)
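
To reproduce the grouping without a PDF, the same idea can be sketched on plain dictionaries. This is only an illustration: the words list is a hypothetical stand-in for first_page.extract_words() (trimmed to the keys used here), and group_rows is a made-up helper name, not part of any library.

```python
from decimal import Decimal

# Hypothetical stand-in for first_page.extract_words(); values taken from the question.
words = [
    {'text': 'my@domain.com', 'bottom': Decimal('77.280'), 'x0': Decimal('21.600')},
    {'text': 'Hi there!', 'bottom': Decimal('58.650'), 'x0': Decimal('21.600')},
    {'text': 'Your email', 'bottom': Decimal('74.101'), 'x0': Decimal('223.560')},
]


def group_rows(words, threshold=Decimal('10')):
    """Group words whose 'bottom' lies within threshold of the previous word's."""
    ordered = sorted(words, key=lambda w: w['bottom'])
    rows = []
    for word in ordered:
        if rows and abs(rows[-1][-1]['bottom'] - word['bottom']) <= threshold:
            rows[-1].append(word)   # distance is small, use same row
        else:
            rows.append([word])     # first word or distance is big, create new row
    return rows


row_list = group_rows(words)
print([[w['text'] for w in row] for row in row_list])
# [['Hi there!'], ['Your email', 'my@domain.com']]
```

Unlike the loop above, this variant also returns an empty list for empty input instead of raising an IndexError.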

What I want to do is map the above output to something like this:

new = {
       1: {
          1: {'text': 'Hi there!', 'x0': Decimal('21.600')}
       },
       2: {
          1: {'text':'Your email', 'x0': Decimal('223.560')},
          2: {'text': 'my@domain.com', 'x0': Decimal('21.600')}
       }
      }

I have tried various approaches but can't figure it out, because my original word_list is a list and I am trying to represent it as a dictionary...

Answer 1

For concise code with reliable input, you can use a short recursive function. It also works for deeper levels of nesting, if needed:

def nest(l):
    if not isinstance(l, list):
        return {'text': l['text'], 'x0': l['x0']}
    return {i+1:nest(v) for i,v in enumerate(l)}

With your input, it returns:

> pp.pprint(nest(l))

> { 1: {1: {'text': 'Hi there!', 'x0': Decimal('21.600')}},
    2: {1: {'text': 'Your email', 'x0': Decimal('223.560')},
        2: {'text': 'my@domain.com', 'x0': Decimal('21.600')}
    }
  }
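
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The function is the one from this answer with enumerate's start argument swapped in for i+1; the name rows and the inline sample data (copied from the question) are assumptions for illustration.

```python
from decimal import Decimal
from pprint import pprint


def nest(item):
    # Leaf: a word dict; keep only the two fields the question asks for.
    if not isinstance(item, list):
        return {'text': item['text'], 'x0': item['x0']}
    # Branch: number the children starting at 1 and recurse into each.
    return {i: nest(v) for i, v in enumerate(item, start=1)}


# Sample data from the question (the name `rows` is illustrative).
rows = [
    [{'bottom': Decimal('58.650'), 'text': 'Hi there!',
      'top': Decimal('40.359'), 'x0': Decimal('21.600'), 'x1': Decimal('65.644')}],
    [{'bottom': Decimal('74.101'), 'text': 'Your email',
      'top': Decimal('37.519'), 'x0': Decimal('223.560'), 'x1': Decimal('300')},
     {'bottom': Decimal('77.280'), 'text': 'my@domain.com',
      'top': Decimal('62.506'), 'x0': Decimal('21.600'), 'x1': Decimal('140.775')}],
]

new = nest(rows)
pprint(new)
```

Note that any non-list element is treated as a word dict, so input that lacks a 'text' or 'x0' key raises a KeyError. That is the "reliable input" assumption made explicit.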

Answer 2

It can be done in one line, but it's ugly. Here is the loop version first:

import json

result = {}
for index, row in enumerate(l, start=1):
    append = {}
    for index2, word in enumerate(row, start=1):
        # keep only the 'x0' and 'text' entries of each word
        append[index2] = {key: val for key, val in word.items() if key in ('x0', 'text')}
    result[index] = append

# default=str is needed so json.dumps can serialize the Decimal values
print(json.dumps(result, indent=2, default=str))

Output:

{
  "1": {
    "1": {
      "text": "Hi there!",
      "x0": "21.600"
    }
  },
  "2": {
    "1": {
      "text": "Your email",
      "x0": "223.560"
    },
    "2": {
      "text": "my@domain.com",
      "x0": "21.600"
    }
  }
}

Note that the keys are printed as strings, but they are actually ints. json.dumps(...), which I used for pretty-printing, converts them to strings.
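
The key conversion can be checked in isolation. A small sketch, where sample is a made-up dictionary standing in for the real result:

```python
import json

# Hypothetical stand-in for the nested result; the keys are ints.
sample = {1: {1: {'text': 'Hi there!'}}}

encoded = json.dumps(sample)
decoded = json.loads(encoded)

print(type(next(iter(sample))))    # key type in the original dict
print(encoded)                     # JSON object keys are always strings
print(type(next(iter(decoded))))   # key type after a JSON round trip
```

So the dictionary itself still has int keys; only the JSON text, and anything loaded back from it, uses string keys.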

One-liner:

result = {index + 1: {index2 + 1: {key: val for key, val in l[index][index2].items() if key in ('x0', 'text')} for index2 in range(len(l[index]))} for index in range(len(l))}
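
If the one-liner is hard to read, one possible alternative (a sketch, not part of the original answer) is to iterate with enumerate(..., start=1) instead of indexing through range(len(...)). The names rows and wanted are assumptions for illustration, and the sample data is trimmed to a few keys.

```python
from decimal import Decimal

# Trimmed sample rows; `rows` and `wanted` are illustrative names.
rows = [
    [{'text': 'Hi there!', 'x0': Decimal('21.600'), 'top': Decimal('40.359')}],
    [{'text': 'Your email', 'x0': Decimal('223.560'), 'top': Decimal('37.519')},
     {'text': 'my@domain.com', 'x0': Decimal('21.600'), 'top': Decimal('62.506')}],
]
wanted = ('text', 'x0')

# Outer comprehension numbers the rows, inner one numbers the words in each row.
result = {
    i: {j: {key: word[key] for key in wanted} for j, word in enumerate(row, start=1)}
    for i, row in enumerate(rows, start=1)
}
print(result)
```

Looking up word[key] raises a KeyError when a wanted key is missing, whereas the filter in the one-liner above silently skips it. Which behavior is preferable depends on how much the input can be trusted.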

Create a nested dictionary from a list using a custom mapping
