Question

I'm trying to reuse the approach from an earlier answer, "Scraping html tables into R data frames using the XML package", in my own work, but I can't get the data to extract. The site I'm using is: http://www.footballfanalytics.com/articles/football/euro_super_league_table.html

I only want to extract the table of team names and their current ratings. My code is as follows:

library(XML)
theurl <-  "http://www.footballfanalytics.com/articles/football/euro_super_league_table.html"
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
tables[[which.max(n.rows)]]

This produces the error message:

Error in tables[[which.max(n.rows)]] : 
attempt to select less than one element

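Can anyone suggest a solution? Is there something about this particular site that stops this from working? Or is there a better alternative approach I could try? Thanks.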
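The error message itself can be reproduced without the website. If readHTMLTable() finds no tables in the page (which is what you would expect if the table is built by JavaScript), n.rows ends up NULL, which.max() returns a zero-length index, and [[ ]] with a zero-length index fails with "attempt to select less than one element". A minimal base-R sketch, assuming an empty list stands in for what readHTMLTable() returned here; the helper pick_largest_table is hypothetical, not part of the XML package:

```r
# Stand-in for readHTMLTable()'s result on a page whose table is filled in
# by JavaScript (assumption: the real call returned an empty list).
tables <- list()

# Same row-count step as in the question: with no tables, n.rows is NULL,
# so which.max(n.rows) is a zero-length index and tables[[...]] fails.
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))

# Hypothetical helper: return the table with the most rows, but stop with a
# clear message when no tables were found instead of indexing with integer(0).
pick_largest_table <- function(tables) {
  n.rows <- vapply(tables, NROW, integer(1))
  if (length(n.rows) == 0) {
    stop("no HTML tables found; the data may be loaded by JavaScript")
  }
  tables[[which.max(n.rows)]]
}

# Usage with two small data frames: the 5-row one is returned
largest <- pick_largest_table(list(a = data.frame(x = 1:2), b = data.frame(x = 1:5)))
print(largest)
```

The guard only makes the failure easier to understand; it cannot recover data that is not in the downloaded HTML.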

Answer 1

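It looks like the table's data is loaded by JavaScript, so it isn't in the HTML that readHTMLTable() downloads. Try parsing the XML file the data appears to come from instead: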

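library(XML)
# XML feed that the page's JavaScript appears to load the league table from
theurl <- "http://www.footballfanalytics.com/xml/esl/esl.xml"
doc <- xmlParse(theurl)
# Pull each team's name and points with XPath and bind them into a two-column matrix
cbind(team = xpathSApply(doc, "/StatsData/Teams/Team/Name", xmlValue),
      points = xpathSApply(doc, "/StatsData/Teams/Team/Points", xmlValue))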
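The cbind() call above returns a character matrix, so the points are stored as text. If you want a sortable data frame instead, a minimal base-R sketch follows; the helper league_to_df is hypothetical, and the three-team matrix is a made-up stand-in for the feed's contents (the feed may no longer be online).

```r
# Hypothetical helper: convert the character matrix built by cbind()
# into a data frame with a numeric points column, sorted by points.
league_to_df <- function(league) {
  df <- data.frame(
    team = league[, "team"],
    points = as.numeric(league[, "points"]),
    stringsAsFactors = FALSE
  )
  # Highest points first, then reset the row names left from the old order
  df <- df[order(df$points, decreasing = TRUE), ]
  rownames(df) <- NULL
  df
}

# Made-up stand-in for the matrix returned by the cbind() call above
league <- cbind(team = c("Team A", "Team B", "Team C"),
                points = c("12", "30", "21"))
league_df <- league_to_df(league)
print(league_df)
```

Converting with as.numeric() is what makes the sort numeric; sorting the raw character column would order "12" before "3".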

R - Extracting tables from a website using the XML package
