How can I read the contents of an online PDF file into a string variable using VBA?

Asked: 2016-03-17 18:02:33

Tags: vba excel-vba internet-explorer automation excel

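I'd like to know whether anyone has dealt with this before. I have a spreadsheet containing links to thousands of PDF files. I want to load the contents of each PDF into a string variable and run some RegEx over it to extract useful data. The function shown below loads the contents of a PDF file into a string, but it only works for local files. In my case I open the PDF with IE.Navigate2 "https://www.example.com/mypdf.pdf", which displays the PDF in the browser. How can I load the contents of that file into a string? The extreme solution would be to download the file, open it with the function below, and then delete it. Please let me know your thoughts. Note that the function below only works if Acrobat (not Reader) is installed, and you also need to add a reference to the Adobe Acrobat type library in your VBA project.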

Public Function ReadAcrobatDocument(strFileName As String) As String
    ' Requires full Adobe Acrobat (not Reader) and a project reference
    ' to the Adobe Acrobat type library.
    Dim AcroApp As CAcroApp, AcroAVDoc As CAcroAVDoc, AcroPDDoc As CAcroPDDoc
    Dim AcroPage As CAcroPDPage
    Dim AcroHiliteList As CAcroHiliteList, AcroTextSelect As CAcroPDTextSelect
    Dim Content As String, i As Long, j As Long

    Set AcroApp = CreateObject("AcroExch.App")
    Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
    ' The second argument is the window title, so pass a string rather than vbNull.
    If AcroAVDoc.Open(strFileName, "") <> True Then
        AcroApp.Exit
        Exit Function
    End If
    Set AcroPDDoc = AcroAVDoc.GetPDDoc

    For i = 0 To AcroPDDoc.GetNumPages - 1
        Set AcroPage = AcroPDDoc.AcquirePage(i)
        ' Select up to 9000 words on the page, starting at offset 0.
        Set AcroHiliteList = CreateObject("AcroExch.HiliteList")
        If AcroHiliteList.Add(0, 9000) <> True Then GoTo CleanUp
        Set AcroTextSelect = AcroPage.CreatePageHilite(AcroHiliteList)
        ' The next line is needed to avoid errors with protected PDFs that can't be read
        On Error Resume Next
        For j = 0 To AcroTextSelect.GetNumText - 1
            Content = Content & AcroTextSelect.GetText(j)
        Next j
        ' Restore normal error handling before moving to the next page.
        On Error GoTo 0
    Next i
    ReadAcrobatDocument = Content

CleanUp:
    ' Close the document without saving and shut Acrobat down on every exit path.
    AcroAVDoc.Close True
    AcroApp.Exit
    Set AcroAVDoc = Nothing: Set AcroApp = Nothing
End Function
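
The "download, read, delete" fallback mentioned in the question could look roughly like the sketch below. It is only a sketch under a few assumptions: the PDF URL can be fetched with a plain HTTP GET (no login or session that only exists in the IE window), the MSXML2.XMLHTTP.6.0 and ADODB.Stream COM objects are available on the machine, and the user's TEMP folder is writable. ReadOnlinePdf and the temporary file name are hypothetical names introduced for illustration; ReadAcrobatDocument is the function above.

```vba
' Hypothetical helper: downloads a PDF to a temporary file, reads it with
' ReadAcrobatDocument (defined above), then deletes the temporary file.
Public Function ReadOnlinePdf(ByVal strUrl As String) As String
    Dim http As Object, stream As Object
    Dim strTempFile As String

    ' Late binding, so no extra project references are needed for these objects.
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", strUrl, False          ' False = synchronous request
    http.send
    If http.Status <> 200 Then Exit Function ' Returns "" if the download failed

    ' Assumed location and naming scheme for the temporary copy.
    strTempFile = Environ$("TEMP") & "\pdf_" & Format$(Now, "yyyymmddhhnnss") & ".pdf"

    ' Write the raw response bytes to disk.
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1                          ' 1 = adTypeBinary
    stream.Open
    stream.Write http.responseBody
    stream.SaveToFile strTempFile, 2         ' 2 = adSaveCreateOverWrite
    stream.Close

    ReadOnlinePdf = ReadAcrobatDocument(strTempFile)

    ' Remove the temporary file; ignore errors if it is still locked.
    On Error Resume Next
    Kill strTempFile
    On Error GoTo 0
End Function
```

It would be called with the link from each spreadsheet row, for example strText = ReadOnlinePdf("https://www.example.com/mypdf.pdf"), after which the RegEx can run on strText. With thousands of files, starting and exiting Acrobat for every PDF is likely to be slow, so keeping a single Acrobat instance alive across calls may be worth considering.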

0 answers:

No answers yet