python - XML to CSV conversion

Time: 2017-08-17 13:27:41

Tags: python xml csv

ab= [['name Belgian Waffles', 'price $5.95', 'description Two of our famous Belgian Waffles ', 'calories 650'] ]

I want to use Python to parse this list into a tabular CSV file, for example:
name              price        description                           Calories
Belgian Waffles   $5.95        Two of our famous Belgian Waffles     650

Note: the list size may vary and the values may change, so nothing should be hard-coded.

The XML in question is:

<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>Two of our famous Belgian Waffles with plenty of real maple 
     syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>Berry-Berry Belgian Waffles</name>
    <price>$8.95</price>
    <description>Light Belgian waffles covered with an assortment of fresh 
      berries and whipped cream</description>
    <calories>900</calories>
  </food>
</breakfast_menu>

I tried the following Python script, which first extracts the data into a list, using food as the root:
import xml.dom.minidom

def innerHtml(root):
    # Collect the text of every text node found below root.
    text = ''
    nodes = [root]
    while nodes:
        node = nodes.pop()
        if node.nodeType == xml.dom.Node.TEXT_NODE:
            text += node.wholeText
        else:
            nodes.extend(node.childNodes)
    return text

innerlist = []
outerlist = []
string2 = []

# xmlFile is assumed to be a document parsed with xml.dom.minidom,
# and xmlNode the tag name of one record (here 'food').
# To get tag value
for statusNode in xmlFile.getElementsByTagName(xmlNode):
    for childNode in statusNode.childNodes:
        if childNode.nodeType == xml.dom.Node.ELEMENT_NODE:
            if innerHtml(childNode).strip() != '':
                string2.append(childNode.nodeName)
                innerlist.append(childNode.nodeName + " " + innerHtml(childNode).strip())
    outerlist.append(innerlist)
    innerlist = []

print(outerlist)

I get the following list:

outerlist = [['name Belgian Waffles', 'price $5.95', 'description Two of our famous Belgian Waffles ', 'calories 650'] , ['name Berry-Berry Belgian Waffles','price $8.95','description Light Belgian waffles covered with an assortment  ','calories 900']]

I would like to use Python to write it to a CSV file in this format:
name           price       description       calories
<name given>   <price>     <description>     <calories>

2 answers:

Answer 0 (score: 1)

import pandas as pd

ab = [['name Belgian Waffles', 'price $5.95', 'description Two of our famous Belgian Waffles ', 'calories 650']]
column_names = []
rows = []

for entry in ab:
    row = {}
    for item in entry:
        # The first word is the column name; the rest of the string is the value.
        column, _, value = item.partition(' ')
        if column not in column_names:
            column_names.append(column)
        row[column] = value.strip()
    rows.append(row)

# One DataFrame row per inner list, so this also works for several foods.
df = pd.DataFrame(rows, columns=column_names)
file_name = "yourfilenameandpath"
# index=False leaves out the row numbers, matching the requested layout.
df.to_csv(file_name, sep='\t', encoding='utf-8', index=False)
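
If pandas is not available, the same list could probably be written with the standard csv module as well. The following is only a sketch: the helper names pairs_to_rows and write_rows are made up for this example, and it assumes that the first word of every string is the column name. It writes to an in-memory buffer so it can be tried without touching the disk.

```python
import csv
import io

def pairs_to_rows(entries):
    """Turn lists of 'column value' strings into (column names, row dicts)."""
    columns, rows = [], []
    for entry in entries:
        row = {}
        for item in entry:
            # Assumption: the first word is the column name, the rest the value.
            column, _, value = item.partition(" ")
            if column not in columns:
                columns.append(column)
            row[column] = value.strip()
        rows.append(row)
    return columns, rows

def write_rows(columns, rows, handle, delimiter="\t"):
    """Write a header line, then one line per row; missing fields stay empty."""
    writer = csv.DictWriter(handle, fieldnames=columns,
                            delimiter=delimiter, restval="")
    writer.writeheader()
    writer.writerows(rows)

ab = [['name Belgian Waffles', 'price $5.95',
       'description Two of our famous Belgian Waffles ', 'calories 650']]
columns, rows = pairs_to_rows(ab)
buffer = io.StringIO()
write_rows(columns, rows, buffer)
print(buffer.getvalue())
```

To write a real file, pass a handle opened with open(file_name, "w", newline="") instead of the buffer. Because the column names are collected from the data, nothing is hard-coded.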

Edit:

import pandas as pd
from lxml import etree

# etree.parse accepts a file name or an already opened file object.
xmldoc = etree.parse("xmlfile_name.xml")
root = xmldoc.getroot()

# root is already <breakfast_menu>, so the <food> elements are its direct children.
foods = root.findall("food")

(name, price, description, calories) = (list() for i in range(4))

for food in foods:
    name.append(food.find("name").text)
    price.append(food.find("price").text)
    description.append(food.find("description").text)
    calories.append(food.find("calories").text)

df = pd.DataFrame({"name": name,
                   "price": price,
                   "description": description,
                   "calories": calories})
file_name = "yourfilenameandpath"
df.to_csv(file_name, sep='\t', encoding='utf-8')
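
The snippet above names the four tags explicitly, which goes against the "no hard-coding" requirement in the question. A possible variant that discovers the tags from the document is sketched below. It uses the standard library's xml.etree.ElementTree and csv instead of lxml and pandas, which is a substitution made for this example. The function name food_rows and the XML_TEXT sample are also made up for illustration; the sample is the XML from the question with the declaration line left out and the root element closed.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_TEXT = """<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>Two of our famous Belgian Waffles with plenty of real maple
     syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>Berry-Berry Belgian Waffles</name>
    <price>$8.95</price>
    <description>Light Belgian waffles covered with an assortment of fresh
      berries and whipped cream</description>
    <calories>900</calories>
  </food>
</breakfast_menu>
"""

def food_rows(root, record_tag="food"):
    """Return (column names, row dicts) for every record_tag element."""
    columns, rows = [], []
    for record in root.iter(record_tag):
        row = {}
        for child in record:
            # Collapse line breaks and indentation inside the element text.
            value = " ".join((child.text or "").split())
            if not value:
                continue  # skip empty elements, as the question's script does
            if child.tag not in columns:
                columns.append(child.tag)
            row[child.tag] = value
        rows.append(row)
    return columns, rows

root = ET.fromstring(XML_TEXT)
columns, rows = food_rows(root)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t", restval="")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring. A food that lacks one of the tags simply gets an empty cell, because restval fills in missing fields.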

Answer 1 (score: 0)

$processUser = posix_getpwuid(posix_geteuid());
echo $processUser['name'];