Importing data from a complex website via Excel VBA

Time: 2018-05-17 19:42:01

Tags: html xml excel-vba vba excel

I'm still a beginner, but I can read simple HTML structures.

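However, on the website https://stockrow.com/AAPL/financials/income/annual, I tried to use XMLHttpRequest to pull the data into Excel, but the source I get back is missing the important table that contains all the key values. When I inspect the page in the browser, I can see the entire HTML structure.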
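A request along the following lines reproduces the situation described. It is only a minimal sketch, assuming late binding to MSXML2.XMLHTTP; the sub name CheckRawSource is made up for illustration. It fetches the raw HTML and reports whether a <table> tag is present. A likely explanation for the missing table is that the page builds it with JavaScript after loading, which a plain HTTP request never executes.

```vba
Option Explicit

' Fetches the raw HTML returned by the server, before any JavaScript runs,
' and reports whether it contains a <table> tag.
' CheckRawSource is a hypothetical name used only for this sketch.
Public Sub CheckRawSource()
    Dim http As Object
    Dim html As String

    ' Late binding, so no library reference has to be set.
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://stockrow.com/AAPL/financials/income/annual", False
    http.send

    html = http.responseText
    Debug.Print "HTTP status: " & http.Status

    ' InStr returns 0 when the substring is absent.
    If InStr(1, html, "<table", vbTextCompare) = 0 Then
        Debug.Print "No <table> tag in the raw response."
    Else
        Debug.Print "A <table> tag is present in the raw response."
    End If
End Sub
```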

This is the source I get back:

<!DOCTYPE html>
<html lang="en">
  <head>
<link rel="apple-touch-icon-precomposed" sizes="57x57" href="/favicons/apple-touch-icon-57x57.png" />
<link rel="apple-touch-icon-precomposed" sizes="114x114" href="/favicons/apple-touch-icon-114x114.png" />
<link rel="apple-touch-icon-precomposed" sizes="72x72" href="/favicons/apple-touch-icon-72x72.png" />
<link rel="apple-touch-icon-precomposed" sizes="144x144" href="/favicons/apple-touch-icon-144x144.png" />
<link rel="apple-touch-icon-precomposed" sizes="60x60" href="/favicons/apple-touch-icon-60x60.png" />
<link rel="apple-touch-icon-precomposed" sizes="120x120" href="/favicons/apple-touch-icon-120x120.png" />
<link rel="apple-touch-icon-precomposed" sizes="76x76" href="/favicons/apple-touch-icon-76x76.png" />
<link rel="apple-touch-icon-precomposed" sizes="152x152" href="/favicons/apple-touch-icon-152x152.png" />
<link rel="icon" type="image/png" href="/favicons/favicon-196x196.png" sizes="196x196" />
<link rel="icon" type="image/png" href="/favicons/favicon-96x96.png" sizes="96x96" />
<link rel="icon" type="image/png" href="/favicons/favicon-32x32.png" sizes="32x32" />
<link rel="icon" type="image/png" href="/favicons/favicon-16x16.png" sizes="16x16" />
<link rel="icon" type="image/png" href="/favicons/favicon-128.png" sizes="128x128" />
<meta name="application-name" content="stockrow.com"/>
<meta name="msapplication-TileColor" content="#FFFFFF" />
<meta name="msapplication-TileImage" content="/favicons/mstile-144x144.png" />

<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />

<link href="https://code.cdn.mozilla.net/fonts/fira.css" rel="stylesheet" type="text/css" />

<script src="https://www.google.com/recaptcha/api.js"></script>

  <script src="https://cdn.ravenjs.com/3.15.0/raven.min.js"></script>
  <script>Raven.config('https://3ce523a8252c436f83c6fc423b340c0a@sentry.io/144901').install()</script>

<meta name="csrf-param" content="authenticity_token" />

  

    

<link rel="stylesheet" media="screen" href="/packs/stockrow-aa9c6f09f554179248530de2e33baa9b.css" />
<script src="/packs/stockrow-a35b20c51d525016f7c7.js"></script>
<script async id="_ck_381101" src="https://forms.convertkit.com/381101?v=7"></script>

I don't know how to solve this, so I thought I'd ask on Stack Overflow.

2 answers:

Answer 0: (score: 0)

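If you only need the data that the website displays, you can use VBA to open an Internet Explorer instance and have IE retrieve the data for you. It's a bit of a hack, but it gets the job done.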

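Basically, inspect the page in your browser and see which elements contain the data you want. Then, in your VBA script, collect the data contained in those elements.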
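The approach could be sketched roughly as follows. This is not tested against the site; it assumes Internet Explorer is installed and that the values end up in an ordinary <table> element, which needs to be confirmed by inspecting the page. The sub name ScrapeWithIE and the 15-second timeout are arbitrary choices for illustration.

```vba
Option Explicit

' Sketch: drive Internet Explorer from VBA and copy the first <table>
' on the page into the active sheet.
' Assumptions: IE is available, and the data lives in a <table> element.
Public Sub ScrapeWithIE()
    Dim ie As Object
    Dim tables As Object
    Dim tbl As Object
    Dim r As Long, c As Long
    Dim deadline As Date

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.navigate "https://stockrow.com/AAPL/financials/income/annual"

    ' Wait for the initial page load (readyState 4 = complete).
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop

    ' The table may be rendered by script after load, so poll for it
    ' for up to 15 seconds (arbitrary timeout).
    deadline = Now + TimeSerial(0, 0, 15)
    Set tables = ie.document.getElementsByTagName("table")
    Do While tables.Length = 0 And Now < deadline
        DoEvents
        Set tables = ie.document.getElementsByTagName("table")
    Loop

    If tables.Length > 0 Then
        Set tbl = tables.Item(0)
        ' Copy every cell's text to the matching worksheet cell.
        For r = 0 To tbl.Rows.Length - 1
            For c = 0 To tbl.Rows(r).Cells.Length - 1
                ActiveSheet.Cells(r + 1, c + 1).Value = tbl.Rows(r).Cells(c).innerText
            Next c
        Next r
    Else
        Debug.Print "No <table> element found; inspect the page to find the right element."
    End If

    ie.Quit
End Sub
```

If the data turns out to sit in another kind of element, only the getElementsByTagName call and the copy loop need to change.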

Answer 1: (score: 0)

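Look closely at the HTML page and you'll see that an xlsx file is available for download. You can simply copy the URL from the link element's href and pass it to URLMon to download the file directly.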

Snippet:

 <a class="button hollow expanded" href="/api/companies/AAPL/financials.xlsx?dimension=MRY&amp;section=Income Statement" target="_blank">Export to Excel (.xlsx)</a>
    

Image: Path

The href is relative, so you need to prepend the host domain.
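As a small illustration of that step, the helper below prepends the host to a relative href. The function name BuildAbsoluteUrl is made up for this sketch. It also decodes the HTML entity &amp; to a plain & and percent-encodes spaces, on the assumption that the href was copied straight from the HTML source.

```vba
Option Explicit

' Turns a relative href copied from HTML source into an absolute URL.
' BuildAbsoluteUrl is a hypothetical helper, not part of any library.
Public Function BuildAbsoluteUrl(ByVal host As String, ByVal href As String) As String
    Dim cleaned As String
    cleaned = Replace(href, "&amp;", "&")   ' decode the HTML entity
    cleaned = Replace(cleaned, " ", "%20")  ' percent-encode spaces
    BuildAbsoluteUrl = host & cleaned
End Function

Public Sub DemoBuildAbsoluteUrl()
    ' Prints:
    ' https://stockrow.com/api/companies/AAPL/financials.xlsx?dimension=MRY&section=Income%20Statement
    Debug.Print BuildAbsoluteUrl("https://stockrow.com", _
        "/api/companies/AAPL/financials.xlsx?dimension=MRY&amp;section=Income Statement")
End Sub
```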


VBA:

Option Explicit

' VBA7 (Office 2010 and later, 32- or 64-bit) needs PtrSafe and LongPtr for pointers.
' dwReserved is a 32-bit DWORD, so it stays Long in both branches.
#If VBA7 Then
    Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" ( _
        ByVal pCaller As LongPtr, _
        ByVal szURL As String, _
        ByVal szFileName As String, _
        ByVal dwReserved As Long, _
        ByVal lpfnCB As LongPtr _
        ) As Long
#Else
    Private Declare Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" ( _
        ByVal pCaller As Long, _
        ByVal szURL As String, _
        ByVal szFileName As String, _
        ByVal dwReserved As Long, _
        ByVal lpfnCB As Long _
        ) As Long
#End If

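' Despite its name, folderName holds the full path of the file to save. Change as required.
Public Const folderName As String = "C:\Users\HarrisQ\Desktop\info.xlsx"

' Despite its name, this downloads the .xlsx export.
Public Sub downloadPDF()
    Dim ret As Long

    ' The URL uses a literal "&" (not the HTML entity "&amp;") and "%20" for the space.
    ' dwReserved is documented as reserved and must be 0.
    ret = URLDownloadToFile(0, "https://stockrow.com/api/companies/AAPL/financials.xlsx?dimension=MRY&section=Income%20Statement", folderName, 0, 0)

    ' URLDownloadToFile returns 0 (S_OK) on success.
    If ret = 0 Then
        Debug.Print "Saved to " & folderName
    Else
        Debug.Print "Download failed, HRESULT: &H" & Hex$(ret)
    End If
End Sub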
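
Once the file is on disk, it still has to get into the workbook the macro runs from. The sketch below shows one possible way, assuming the download was saved to the same path as above and that the first worksheet of the export holds the data. Neither assumption is confirmed by the answer, and the sub name OpenDownloadedWorkbook is made up.

```vba
Option Explicit

' Sketch: open the downloaded export and copy its first sheet's contents
' into the first sheet of the workbook containing this macro.
' Assumptions: the file path below, and that sheet 1 holds the data.
Public Sub OpenDownloadedWorkbook()
    Const filePath As String = "C:\Users\HarrisQ\Desktop\info.xlsx" ' change as required
    Dim wb As Workbook

    ' Dir returns an empty string when the file does not exist.
    If Dir(filePath) = "" Then
        Debug.Print "File not found: " & filePath
        Exit Sub
    End If

    Set wb = Workbooks.Open(Filename:=filePath, ReadOnly:=True)
    Debug.Print "Opened " & wb.Name & " with " & wb.Worksheets.Count & " worksheet(s)."

    ' Copy everything in use on the first sheet to A1 of this workbook's first sheet.
    wb.Worksheets(1).UsedRange.Copy Destination:=ThisWorkbook.Worksheets(1).Range("A1")

    wb.Close SaveChanges:=False
End Sub
```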