Question

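I managed to read the XML in Example 1 into a data frame object in R, but I ran into a problem with Example 2. Can anyone suggest R code to convert the data in mtcars.xml into a data frame?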

Example 1)

library(XML)
# Save the URL of the xml file in a variable

xml.url <- "http://www.w3schools.com/xml/plant_catalog.xml"

# Use the xmlTreeParse function to parse the XML file directly from the web

xmlfile <- xmlTreeParse(xml.url)

# Use the xmlRoot-function to access the top node
xmltop = xmlRoot(xmlfile)
# have a look at the XML-code of the first subnodes:
print(xmltop)[1:2]


# To extract the XML-values from the document, use xmlSApply:

plantcat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))

Example 2)

library(XML)
# Save the URL of the xml file in a variable

doc <- xmlTreeParse(system.file("exampleData", "mtcars.xml", package="XML"))


xmlfile <- xmlTreeParse(doc)

# Use the xmlRoot-function to access the top node
xmltop = xmlRoot(xmlfile)
# have a look at the XML-code of the first subnodes:
print(xmltop)[1:2]


# To extract the XML-values from the document, use xmlSApply:

mtcarscat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))

Answer 1

Try xpathSApply:

library(XML)

path <- system.file("exampleData", "mtcars.xml", package="XML")
doc <- xmlTreeParse(path, useInternal = TRUE)
root <- xmlRoot(doc)

read.table(text = xpathSApply(root, "//record", xmlValue), 
           col.names = xpathSApply(root, "//variable", xmlValue))
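
To see why this works, here is a minimal sketch on a small inline document. The XML string below is a hypothetical miniature that only assumes the same layout as mtcars.xml (column names in variable elements, whitespace-separated values in record elements); the id attribute on each record is likewise an assumption for illustration. Each record's text is one line of values, so read.table(text = ...) can parse the lines directly.

```r
library(XML)

# Hypothetical miniature of the assumed mtcars.xml layout (not the package file itself)
mini_xml <- '<dataset>
  <variables>
    <variable>mpg</variable>
    <variable>cyl</variable>
  </variables>
  <record id="Mazda RX4">21.0 6</record>
  <record id="Datsun 710">22.8 4</record>
</dataset>'

# Parse the string into an internal document so XPath queries can be used
doc <- xmlParse(mini_xml, asText = TRUE)

# One string of whitespace-separated values per record
records <- xpathSApply(doc, "//record", xmlValue)
# Column names come from the variable elements
col_names <- xpathSApply(doc, "//variable", xmlValue)
# Assumed id attribute, used here as row names
row_ids <- xpathSApply(doc, "//record", xmlGetAttr, "id")

# read.table treats each element of the character vector as one input line
cars <- read.table(text = records, col.names = col_names)
rownames(cars) <- row_ids
print(cars)
```

If the real file carries no id attribute, the rownames step can simply be dropped.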

Answer 2

Here is one way to do it with xml2:

library(xml2)
library(purrr)
library(dplyr)

catalog_url <- "http://www.w3schools.com/xml/plant_catalog.xml"
doc <- read_xml(catalog_url)

# get all the "records"
plants <- xml_find_all(doc, ".//PLANT")

# get all the field names
kids <- xml_name(xml_children(plants[1]))

# make a data frame
# - iterate over each record
# - in each record grab each field
# - turn each row into a data frame
# - bind all the data frames together

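map_df(plants, function(plant) {
  # xml_find_first() supersedes the deprecated xml_find_one();
  # as_tibble() stands in for the old rbind_list()
  as_tibble(as.list(setNames(map_chr(kids, function(kid) {
    xml_text(xml_find_first(plant, sprintf(".//%s", kid)))
  }), kids)))
})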

## Source: local data frame [36 x 6]
## 
##                 COMMON              BOTANICAL  ZONE        LIGHT PRICE AVAILABILITY
##                  (chr)                  (chr) (chr)        (chr) (chr)        (chr)
## 1            Bloodroot Sanguinaria canadensis     4 Mostly Shady $2.44       031599
## 2            Columbine   Aquilegia canadensis     3 Mostly Shady $9.37       030699
## 3       Marsh Marigold       Caltha palustris     4 Mostly Sunny $6.81       051799
## 4              Cowslip       Caltha palustris     4 Mostly Shady $9.90       030699
## 5  Dutchman's-Breeches    Dicentra cucullaria     3 Mostly Shady $6.44       012099
## 6         Ginger, Wild       Asarum canadense     3 Mostly Shady $9.03       041899
## 7             Hepatica     Hepatica americana     4 Mostly Shady $4.45       012699
## 8            Liverleaf     Hepatica americana     4 Mostly Shady $3.99       010299
## 9   Jack-In-The-Pulpit    Arisaema triphyllum     4 Mostly Shady $3.23       020199
## 10            Mayapple   Podophyllum peltatum     3 Mostly Shady $2.98       060599
## ..                 ...                    ...   ...          ...   ...          ...

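This could be made more robust by looking up all possible child names (some "records" may have more or fewer children), but it is sufficient for this example. Doing it this way (fetching each element's value by name) ensures the values come back in the correct order, since the order of the elements is not guaranteed.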
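
The more robust variant mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: the inline catalog is a made-up two-record sample (the second record has an extra LIGHT child and a different element order), and the names fields, columns and plant_df are hypothetical. It collects the union of child names across all records, then looks each one up by name, so a missing child turns into NA instead of shifting the columns.

```r
library(xml2)

# Made-up sample: records differ in which children they have and in what order
catalog <- read_xml('<CATALOG>
  <PLANT><COMMON>Bloodroot</COMMON><ZONE>4</ZONE></PLANT>
  <PLANT><ZONE>3</ZONE><COMMON>Columbine</COMMON><LIGHT>Mostly Shady</LIGHT></PLANT>
</CATALOG>')

plants <- xml_find_all(catalog, ".//PLANT")

# Union of every child name that appears in any record
fields <- unique(xml_name(xml_children(plants)))

# xml_find_first() on a nodeset returns one result per record;
# where the child is absent, xml_text() yields NA
columns <- lapply(fields, function(field) {
  xml_text(xml_find_first(plants, paste0("./", field)))
})

plant_df <- as.data.frame(setNames(columns, fields), stringsAsFactors = FALSE)
print(plant_df)
```

Because the lookup is vectorised over the whole nodeset, this also avoids building one small data frame per record.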

Parsing XML in R - returning a data frame object