Question

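If you visit https://www.myfxbook.com/members/iseasa_public1/rush/2531687, click the "Export" dropdown and choose CSV, you are taken to https://www.myfxbook.com/statements/2531687/statement.csv and the download starts automatically in the browser. The catch is that you must be logged in to https://www.myfxbook.com to receive the data; otherwise the downloaded file contains only the text "Please login to Myfxbook.com to use this feature".

I tried fetching the CSV file in R with read.csv, but only got the "please login" message. I believe R has to simulate an HTML session for access to be granted (though I am not sure about this). I then tried some scraping tools to log in first, without success.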

library(rvest)
login <- "https://www.myfxbook.com"
pgsession <- html_session(login)
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform, loginEmail = "*****", loginPassword = "*****") # loginEmail and loginPassword are the names of the html elements
submit_form(pgsession, filled_form)

url <- "https://www.myfxbook.com/statements/2531687/statement.csv"
page <- jump_to(pgsession, url) # page will contain 48 bytes of data (in the 'content' element), which is the size of that warning message, though I could not access this content.

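From the attempt above, I found that page has an element named cookies, which in turn contains JSESSIONID. From what I have read, JSESSIONID appears to be what "proves" that I am logged in to the site. Even so, the CSV does not download.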
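The cookie-carrying flow described above can be sketched as follows: keep the session object returned by the form submission, navigate from it, and read the response body as text so the 48-byte content becomes visible. This is only a sketch under assumptions: it uses the rvest 1.0+ names (session, html_form_set, session_submit, session_jump_to, which replace html_session, set_values, submit_form and jump_to), it assumes the first form on the home page is the login form, and it assumes the notice contains the English phrase "Please login". The helper names fetch_statement and is_login_notice are hypothetical, and whether the site accepts a login made this way is untested.

```r
# Sketch only: needs the rvest (>= 1.0.0) and httr packages installed.
# fetch_statement() and is_login_notice() are hypothetical helper names.

# TRUE when the body is the "not logged in" notice rather than CSV data.
# Assumption: the notice contains the English phrase "Please login".
is_login_notice <- function(body) {
  grepl("Please login", body, fixed = TRUE)
}

fetch_statement <- function(email, password,
                            login_url = "https://www.myfxbook.com",
                            csv_url = "https://www.myfxbook.com/statements/2531687/statement.csv") {
  s <- rvest::session(login_url)

  # Assumption: the first form on the page is the login form.
  form <- rvest::html_form(s)[[1]]
  form <- rvest::html_form_set(form, loginEmail = email, loginPassword = password)

  # Keep the session returned by the submission and navigate from it.
  s <- rvest::session_submit(s, form)
  page <- rvest::session_jump_to(s, csv_url)

  # The httr response is stored in page$response; read its body as text.
  body <- httr::content(page$response, as = "text", encoding = "UTF-8")
  if (is_login_notice(body)) {
    stop("Still not logged in; the server returned: ", body)
  }
  body
}
```

If fetch_statement returns without error, its result is the raw CSV text, which read.csv(text = ...) can turn into a data frame.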

Then I tried:

library(RCurl)
h <- getCurlHandle(cookiefile = "")
ans <- getForm("https://www.myfxbook.com", loginEmail = "*****", loginPassword = "*****", curl = h) 
data <- getURL("https://www.myfxbook.com/statements/2531687/statement.csv", curl = h)
data <- getURLContent("https://www.myfxbook.com/statements/2531687/statement.csv", curl = h)

These libraries seem to be built for scraping HTML pages and do not appear to handle files in other formats.
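That impression may not hold for the body itself: getURL returns the response as a character string regardless of content type, and getURLContent may return a raw vector, and either can be handed to read.csv. A minimal base-R sketch, where parse_csv_body is a hypothetical helper and the sample data is made up for illustration:

```r
# Hypothetical helper: turn a response body (character or raw) into a data frame.
parse_csv_body <- function(body) {
  # getURLContent() may return raw bytes; convert them to a string first.
  if (is.raw(body)) body <- rawToChar(body)
  read.csv(text = body, stringsAsFactors = FALSE)
}

# Made-up stand-in for what a successful download might contain.
sample_body <- "ticket,profit\n1,10.5\n2,-3.25\n"
df <- parse_csv_body(sample_body)
print(df)
```

Separately, getForm sends a GET request; if the login form expects a POST, RCurl's postForm with the same curl handle would be the call to try.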

I would really appreciate any help, as I have been at this for a while.

Thanks.

Download a CSV from a password-protected website

0 answers: