How can I parse multiple files in Python and extract the important information?

Time: 2019-03-22 14:52:33

Tags: python parsing yaml

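I am trying to write a program that reads data from several hundred YAML files and stores some of the information they contain in a table of some kind. The program should parse every YAML file in a given directory and extract the relevant information until all files have been parsed successfully.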

An example of the contents of one of the YAML files:

%YAML:1.0
camera_rotation_wrt_base: !!opencv-matrix
  cols: 3
  data: [-0.0159428846, 0.0246045925, 0.999570131, -0.999774337, -0.0144301597, -0.0155909406,
    0.0140403481, -0.999593139, 0.024829099]
  dt: f
  rows: 3
camera_translation_wrt_base: [0.4445618987083435, 0.11700689047574997, 1.5018157958984375]
object_rotation_wrt_base: !!opencv-matrix
  cols: 3
  data: [-0.74130547, -0.0615471229, 0.668339849, 0.669196069, -0.144052029, 0.728989482,
    0.0514085107, 0.987654269, 0.147973642]
  dt: f
  rows: 3
object_rotation_wrt_camera: !!opencv-matrix
  cols: 3
  data: [-0.6565323818253673, 0.1588616842697038, -0.737379262582055, -0.07928983462501557,
    -0.9866892288165471, -0.14197683014545748, -0.7501189077886223, -0.03474571781315458,
    0.6603895951165363]
  dt: f
  rows: 3
object_translation_wrt_base: [1.1534364223480225, 0.05951927974820137, 1.3502429723739624]
object_translation_wrt_camera: [0.04407151401699165, 0.16979082390232392, 0.705698973194305]
template_id: 1965

I would like to store the data key of the object_rotation_wrt_camera key, together with the object_translation_wrt_camera key, in a CSV file like this:

observation,rotation,translation
1,[-0.53434, 0.023343, .....],[0.54545,0.34344,....]                
2,[-0.52234, 0.3433, .....],[0.65645,0.8787344,....] 
3,[0.32234, 0.6453, .....],[0.622645,0.1787344,....]

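In the table above, the observation number refers to the YAML file, so each file contributes one observation to the CSV file, with one rotation value and one translation value. (Note: the dots in the table only indicate that the rotation and translation lists continue.)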

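Finally, I would like to create a last CSV file that is similar to the one above but with all rotation and translation values split out. Instead of one column for translation and one for rotation, it would have 3 columns for the 3 values in the translation list of the previous CSV file and 9 columns for the 9 values in its rotation list, for example: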

observation,tran1,tran2,tran3,rot1,rot2,rot3,rot4,rot5,rot6,rot7,rot8,rot9
1,-0.545434,4.54545,0.343434,.............................................
2,-0.4543,3.3434,0.3534,..................................................

1 answer:

Answer 0: (score: 0)

I stored your example in two different files ending in .yaml and then ran:

import sys
from pathlib import Path
import csv
import ruamel.yaml

# Header rows for the nested table and for the flattened table.
result = [['observation', 'rotation', 'translation']]
flatres = ["observation,tran1,tran2,tran3,rot1,rot2,rot3,rot4,rot5,rot6,rot7,rot8,rot9".split(',')]
yaml = ruamel.yaml.YAML()

# sorted() keeps the observation numbering reproducible between runs.
for idx, file_name in enumerate(sorted(Path('.').glob('*.yaml'))):
    txt = file_name.read_text()
    # Drop the OpenCV-style directive before parsing.
    if txt.startswith('%YAML:1.0'):
        txt = txt.replace('%YAML:1.0', "", 1).lstrip()
    data = yaml.load(txt)
    # One row with the rotation and translation kept as lists.
    result.append([
        idx+1,
        data['object_rotation_wrt_camera']['data'],
        data['object_translation_wrt_camera'],
    ])
    # One flat row: 3 translation values followed by 9 rotation values.
    row = [idx+1]
    row.extend(data['object_translation_wrt_camera'])
    row.extend(data['object_rotation_wrt_camera']['data'])
    flatres.append(row)

writer = csv.writer(sys.stdout)
writer.writerows(result)
print('---------')
writer.writerows(flatres)

which gives:

observation,rotation,translation
1,"[-0.6565323818253673, 0.1588616842697038, -0.737379262582055, -0.07928983462501557, -0.9866892288165471, -0.14197683014545748, -0.7501189077886223, -0.03474571781315458, 0.6603895951165363]","[0.04407151401699165, 0.16979082390232392, 0.705698973194305]"
2,"[-0.6565323818253673, 0.1588616842697038, -0.737379262582055, -0.07928983462501557, -0.9866892288165471, -0.14197683014545748, -0.7501189077886223, -0.03474571781315458, 0.6603895951165363]","[0.04407151401699165, 0.16979082390232392, 0.705698973194305]"
---------
observation,tran1,tran2,tran3,rot1,rot2,rot3,rot4,rot5,rot6,rot7,rot8,rot9
1,0.04407151401699165,0.16979082390232392,0.705698973194305,-0.6565323818253673,0.1588616842697038,-0.737379262582055,-0.07928983462501557,-0.9866892288165471,-0.14197683014545748,-0.7501189077886223,-0.03474571781315458,0.6603895951165363
2,0.04407151401699165,0.16979082390232392,0.705698973194305,-0.6565323818253673,0.1588616842697038,-0.737379262582055,-0.07928983462501557,-0.9866892288165471,-0.14197683014545748,-0.7501189077886223,-0.03474571781315458,0.6603895951165363

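Because the sequences you want to store contain commas, those entries need to be quoted. Python's CSV writer does this automatically (otherwise the second and following lines of the CSV would contain more than three elements).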
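To see the quoting in isolation, here is a minimal sketch that uses only the standard library. The row values are shortened from the example file, and io.StringIO stands in for a real output file; both are illustrative assumptions.

```python
import csv
import io

# Illustrative row: observation number, rotation list, translation list
# (values shortened from the example file).
row = [1, [-0.6565, 0.1589, -0.7374], [0.0441, 0.1698, 0.7057]]

buf = io.StringIO()  # stands in for a file opened with newline=''
csv.writer(buf).writerow(row)
line = buf.getvalue()
print(line, end='')

# Reading the line back yields exactly three fields, because each list
# was written as one quoted string.
fields = next(csv.reader(io.StringIO(line)))
print(len(fields))
```

To write to disk instead of a buffer, the same writer can be given a file opened with newline='', as the csv module documentation recommends.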

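YAML 1.0 was superseded by YAML 1.1 in 2005, and apart from the directive I have not paid particular attention to the differences between those versions. YAML 1.2 has been the YAML standard since 2009. ruamel.yaml only supports files with an explicit YAML 1.2 (or 1.1) directive, which is why the %YAML:1.0 directive has to be removed explicitly from these files.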
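If the directive has to be removed in more than one place, the step can be pulled out into a small helper. This is only a sketch: the function name strip_opencv_directive is made up for illustration, and it assumes the directive, when present, is the very first thing in the file.

```python
def strip_opencv_directive(text, directive='%YAML:1.0'):
    """Return text without a leading OpenCV-style directive line."""
    if text.startswith(directive):
        # Cut the directive and the newline/whitespace that follows it.
        text = text[len(directive):].lstrip()
    return text


sample = "%YAML:1.0\ntemplate_id: 1965\n"
cleaned = strip_opencv_directive(sample)
print(repr(cleaned))
```

Text without the directive is returned unchanged, so the helper can be applied to every file before it is handed to yaml.load().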

You might run into trouble if you have any "old-style" octal integers in your files, and in some other cases that do not show up in your input example.
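For context, YAML 1.1 reads an integer with a leading zero such as 0755 as octal, whereas YAML 1.2 writes octal as 0o755. A rough way to check whether any of the files are affected is to scan the raw text before parsing. The sketch below is a heuristic only: the helper name find_old_style_octals and the sample text are assumptions, and the pattern will also flag matching digits inside quoted strings.

```python
import re

# Matches tokens such as 0755 or 010: a leading zero followed only by
# octal digits. Tokens preceded by a word character, '.', '+' or '-' are
# skipped, so floats like 0.0159 and exponents like 1e-05 are ignored
# (signed octals are therefore not reported either).
OLD_OCTAL = re.compile(r'(?<![\w.+-])0[0-7]+(?![\w.])')


def find_old_style_octals(text):
    """Return (line_number, token) pairs for suspicious integers."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in OLD_OCTAL.finditer(line):
            hits.append((number, match.group()))
    return hits


sample = "template_id: 1965\nmode: 0755\nvalue: 0.0159\ncount: 10\n"
print(find_old_style_octals(sample))
```

Any hit is worth checking by hand, since the value you get after loading may not be the one the file's author intended.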