R: downloading a file through a redirect

Date: 2017-10-17 13:00:05

Tags: r curl https download rcurl

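I want to download some Excel files from a website. The site requires a username and password, but I can log in manually before running the code. Once logged in, if I paste the URL into my browser (Chrome), it downloads the file. When I do the same from R, the closest I get is a text file that looks like the HTML of the very site I am trying to download from. Also note the URL structure: there is an "occ4.xlsx" in the middle, which I believe is the file.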

I can also explain the other parameters in the URL:

  • country = JP (country - Japan)
  • jloc = state-1808 (state ID - it changes with the state)
  • ...
  • ...
  • timeframe, etc.
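
For reference, the bracketed parameter names are percent-encoded in the URL (`%5B` is `[` and `%5D` is `]`). A minimal sketch of decoding them with base R follows; the `query` string is only a shortened subset of the real query string, picked for illustration, and the variable names are my own.

```r
# Shortened subset of the query string from the URL in this question
# (host, path, and several parameters are omitted for brevity).
query <- "country=JP&jloc=state-1808&t%5Bsegment%5D%5Bqty%5D=1000&t%5Btimeframe%5D=f2013-10-17-2017-02-17"

# Split on "&" into key=value pairs, then split each pair on "=".
pairs <- strsplit(strsplit(query, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)

# Percent-decode keys and values; a parameter with an empty value yields "".
keys <- vapply(pairs, function(p) utils::URLdecode(p[1]), character(1))
values <- vapply(
  pairs,
  function(p) if (length(p) > 1) utils::URLdecode(p[2]) else "",
  character(1)
)

params <- setNames(values, keys)
print(params)
```

Decoded this way, `t%5Bsegment%5D%5Bqty%5D` reads as `t[segment][qty]`, which makes it easier to see which parts of the URL vary between downloads.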

Here is what I have tried:

Iteration 1 (built-in methods):

url <- "https://www.wantedanalytics.com/wa/counts/occ4.xlsx?country=JP&jloc=state-1808&mapview=msa&methodology=available&t%5Bsegment%5D%5Bperiod_prior%5D=count&t%5Bsegment%5D%5Bperiod_timeframe%5D=count&t%5Bsegment%5D%5Bperiod_type%5D=&t%5Bsegment%5D%5Bqty%5D=1000&t%5Btimeframe%5D=f2013-10-17-2017-02-17&timeframe=f2013-09-28-2017-02-17"
url_ns <- "http://www.wantedanalytics.com/wa/counts/occ4.xlsx?country=JP&jloc=state-1808&mapview=msa&methodology=available&t%5Bsegment%5D%5Bperiod_prior%5D=count&t%5Bsegment%5D%5Bperiod_timeframe%5D=count&t%5Bsegment%5D%5Bperiod_type%5D=&t%5Bsegment%5D%5Bqty%5D=1000&t%5Btimeframe%5D=f2013-10-17-2017-02-17&timeframe=f2013-09-28-2017-02-17"
destfile <- "test"
download.file(url, destfile,method="auto")
download.file(url, destfile,method="wininet")
download.file(url, destfile,method="auto", mode="wb")
download.file(url, destfile,method="wininet", mode="wb")
download.file(url_ns, destfile,method="auto")
download.file(url_ns, destfile,method="wininet")
download.file(url_ns, destfile,method="auto", mode="wb")
download.file(url_ns, destfile,method="wininet", mode="wb")
# All of the above download the web page, not the file

Iteration 2 (using RCurl):

# install.packages("RCurl")
library(RCurl)
library(readxl)
x <- getURL(url)
y <- getURL(url, ssl.verifypeer = FALSE)
z <- getURL(url, ssl.verifypeer = FALSE, ssl.verifyhost=FALSE)
identical(x,y) #TRUE
identical(y,z) #TRUE
x
# [1] "<html><body>You are being <a href=\"http://www....2-17\">redirected</a>.</body></html>"
# Note the redirect text in the response above
out <- readxl::read_xlsx(textConnection(x)) # I know it won't work
# Error in read_fun(path = path, sheet = sheet, limits = limits, shim = shim,  :
#   Expecting a single string value: [type=integer; extent=1].
w = substr(x,36,nchar(x)-31) #removing redirect text
identical(w,url) # FALSE
out <- readxl::read_xlsx(textConnection(w))
# Error in read_fun(path = path, sheet = sheet, limits = limits, shim = shim,  :
#   Expecting a single string value: [type=integer; extent=1].
download.file(w, destfile,method="auto")
#Downloads the webpage again
download.file(url_ns,destfile,method="libcurl")
#Downloads the webpage again
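
To tell quickly whether a given attempt saved a real workbook or just another HTML page, the first bytes of the file can be inspected. This is only a sketch: it relies on the fact that .xlsx files are ZIP archives, which normally begin with the bytes `50 4B 03 04` ("PK"). The helper name `looks_like_xlsx` and the temporary files are made up for illustration.

```r
# Hypothetical helper: TRUE if the file starts with the ZIP local-file-header
# signature (0x50 0x4B 0x03 0x04), which a non-empty .xlsx file should have.
looks_like_xlsx <- function(path) {
  header <- readBin(path, what = "raw", n = 4L)
  identical(header, as.raw(c(0x50, 0x4B, 0x03, 0x04)))
}

# Stand-in for what the attempts above save: an HTML redirect page.
html_file <- tempfile(fileext = ".xlsx")
writeLines("<html><body>You are being redirected.</body></html>", html_file)

# Stand-in for a real download: only the ZIP signature plus one filler byte.
zip_file <- tempfile(fileext = ".xlsx")
writeBin(as.raw(c(0x50, 0x4B, 0x03, 0x04, 0x00)), zip_file)

html_is_xlsx <- looks_like_xlsx(html_file)
zip_is_xlsx <- looks_like_xlsx(zip_file)
print(c(html = html_is_xlsx, zip = zip_is_xlsx))
```

Running `looks_like_xlsx(destfile)` after each `download.file()` call would show which attempts, if any, returned binary workbook data rather than the page markup.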

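I also tried the downloader package, with the same result! I can't share the username and password in this question, but if you are trying to solve this, let me know via comment/PM and I will share them with you!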

0 answers:

No answers yet