Question

I want to extract the "Match by match list" table from

http://stats.espncricinfo.com/ci/engine/player/50710.html?class=2;template=results;type=batting;view=match

I am new to R, so I don't know much about extracting data from web pages. I used this code to extract the table.

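library(XML)  # provides htmlTreeParse() and xpathSApply()

fileUrl <- "http://stats.espncricinfo.com/ci/engine/player/50710.html?class=2;template=results;type=batting;view=match"
# load and parse the page, then take the text of every row with class 'data1'
sanga <- htmlTreeParse(fileUrl, useInternalNodes = TRUE)
sanga.data <- xpathSApply(sanga, "//tr[@class='data1']", xmlValue)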

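But I end up with a column matrix in which each column of the original table is represented as a row. I read the information in this thread but still could not figure out how to get the data in tabular format: Scraping html tables into R data frames using the XML package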

Answer 1

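You'll need to do a bit of work on the column names (and remove the NA 'spacer' column), but with the right XPath you can get straight to the table you need: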

library(rvest)
library(magrittr)

# read_html() replaces the deprecated html() in current versions of rvest
pg <- read_html("http://stats.espncricinfo.com/ci/engine/player/50710.html?class=2;template=results;type=batting;view=match")

pg %>% 
  html_nodes(xpath="//tr[@class='data1']/../..") %>%  # get to a reasonable set of tables (there are many)
  extract2(2) %>%                                     # we want the second one
  html_table(header=TRUE, trim=TRUE) -> data          # there's a header and pls trim the blanks

str(data)
## 'data.frame':  397 obs. of  11 variables:
##  $ Bat1      : chr  "35" "85" "36*" "DNB" ...
##  $ Runs      : chr  "35" "85" "36" "-" ...
##  $ BF        : chr  "55" "116" "47" "-" ...
##  $ SR        : chr  "63.63" "73.27" "76.59" "-" ...
##  $ 4s        : chr  "4" "11" "3" "-" ...
##  $ 6s        : chr  "0" "0" "0" "-" ...
##  $           : logi  NA NA NA NA NA NA ...
##  $ Opposition: chr  "v Pakistan" "v South Africa" "v Pakistan" "v South Africa" ...
##  $ Ground    : chr  "Galle" "Galle" "Colombo (RPS)" "Colombo (SSC)" ...
##  $ Start Date: chr  "5 Jul 2000" "6 Jul 2000" "9 Jul 2000" "11 Jul 2000" ...
##  $           : chr  "ODI # 1603" "ODI # 1604" "ODI # 1608" "ODI # 1610" ...
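
The cleanup mentioned at the top of this answer (dropping the NA 'spacer' column and fixing the column names) could look roughly like the sketch below. It uses base R only and runs on a small hand-built stand-in for the scraped data frame, so the sample rows, the object name `clean`, and the column name `Match` are assumptions made for illustration, not part of the original output.

```r
# Small stand-in for the scraped result (assumption: the real `data` has the
# same kind of columns, including an all-NA spacer and an unnamed last column).
data <- data.frame(
  c("35", "85", "36*", "DNB"),
  c("35", "85", "36", "-"),
  c(NA, NA, NA, NA),
  c("v Pakistan", "v South Africa", "v Pakistan", "v South Africa"),
  c("5 Jul 2000", "6 Jul 2000", "9 Jul 2000", "11 Jul 2000"),
  c("ODI # 1603", "ODI # 1604", "ODI # 1608", "ODI # 1610"),
  stringsAsFactors = FALSE
)
names(data) <- c("Bat1", "Runs", "", "Opposition", "Start Date", "")

# Drop every column that is entirely NA (the 'spacer' column).
is_spacer <- vapply(data, function(col) all(is.na(col)), logical(1))
clean <- data[!is_spacer]

# Name the unnamed last column by position; "Match" is a hypothetical name.
names(clean)[ncol(clean)] <- "Match"

# Replace spaces in column names so they are easier to type.
names(clean) <- gsub(" ", "_", names(clean))

# Turn the "-" placeholders into NA before converting Runs to integers.
runs <- clean$Runs
runs[runs == "-"] <- NA
clean$Runs <- as.integer(runs)

str(clean)
```

The same "-" to NA conversion could be applied to the other numeric columns (BF, SR, 4s, 6s) if they are needed as numbers rather than strings.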

Extract a table from a webpage with R
