如何使用VBA递归解析电子邮件中的数据?

时间:2018-04-26 04:05:27

标签: vba excel-vba outlook excel

因此,我每天都会收到包含相关信息的电子邮件。不幸的是,由于某种原因,数据是在电子邮件正文中发送的,而不是作为附件发送的。那好吧。我使用Excel来使用VBA来抓取Outlook。

Sub mytry()
Dim olapp As Object
Dim olmapi As Object
Dim olmail As Object
Dim olitem As Object
Dim lrow As Integer
Dim olattach As Object
Dim str As String
Dim TextWeNeedToParse as String

Const num As Integer = 6
Const path As String = "C:\HP\"
Const emailpath As String = "C:\Dell\"
Const olFolderInbox As Integer = 6

Set olp = CreateObject("outlook.application")
Set olmapi = olp.getnamespace("MAPI")
Set olmail = olmapi.getdefaultfolder(num)

If olmail.items.restrict("[ReceivedTime]>=""&MacroDate&12:00am&""").Count = 0 Then


    Else

        For Each olitem In olmail.items.restrict("[ReceivedTime]>=""&MacroDate&12:00am&""")

            TextWeNeedToParse = olitem.body

            'Recursive text parsing here

        Next olitem

End If

好的,所以这段代码片段应该将文本的整个主体变成一个字符串。现在我们可以传递字符串并操纵它。

我正在处理的文字样本:

WAL +300bp  QTY
(M) FCTR    SECURITY    CPN ASK 1mPSA   TYPE
0.77    1.15    458 0.04    GNR 2012-61 CA  2.00    99-16   217 SEQ
1.39    2.26    120 0.76    GNR 2005-13 AE  5.00    102-24  223 SUP
1.40    18.16   45  0.65    GNR 2015-157 NH 2.50    95-16   215 EXCH,+
1.50    21.56   25  0.94    GNR 2017-103 HD 3.00    98-08   375 PAC-2

因此我可以通过几种不同的方式来解决这个问题,但我并不完全了解所有这些内容。

1)我可以尝试计算存在多少回车,并进行循环。然后"计数"确定一切都在哪里的空间。不太清楚它的效果如何。

2)我可以在中间重新编写唯一的ID,如果我能弄清楚如何正则表达第n个实例(我被卡住的一个主要点),我也可以用它来正则表达式数字 - 例如,第一行将是由空格包围的直数/小数的1-5实例,以及number-number-dash-number-number的第一个实例。

我正在通过它的示例正则表达式代码:

Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
    Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
    Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
    Dim replaceNumber As Integer

    With inputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = matchPattern
    End With
    With outputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "\$(\d+)"
    End With
    With outReplaceRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With

    Set inputMatches = inputRegexObj.Execute(strInput)
    If inputMatches.Count = 0 Then
        regex = False
    Else
        Set replaceMatches = outputRegexObj.Execute(outputPattern)
        For Each replaceMatch In replaceMatches
            replaceNumber = replaceMatch.SubMatches(0)
            outReplaceRegexObj.Pattern = "\$" & replaceNumber

            If replaceNumber = 0 Then
                outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
            Else
                If replaceNumber > inputMatches(0).SubMatches.Count Then
                    'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
                    regex = CVErr(xlErrValue)
                    Exit Function
                Else
                    outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
                End If
            End If
        Next
        regex = outputPattern
    End If
End Function

3)我可以尝试上面的一些方法,但使用递归。我的递归相当薄弱。

因此,一旦我提取了文本字符串,我想我需要像:

Sub QuickExample(Dim Cusip as String, Dim PriceStr as variant, Dim SpreadStr as variant) 
Dim ws as WorkSheet
Set ws = thisworkbook.sheets("Results")
LastRow = ws.Cells(sht.Rows.Count, "A").End(xlUp).Row

ws.cells(Lastrow,1).value2 = Cusip
ws.cells(Lastrow,2).value2 = PriceStr
ws.cells(Lastrow,3).value2 = SpreadStr

End Sub

最后:

Sub ParsingDate(EmailText as String)
Dim CarriageReturns As Long
CarriageReturns  = Len(EmailText) - Len(Replace(EmailText, Chr(10), ""))


For i = 1 to CarriageReturns
 'Parse out the data for the ith row, return it to the function above

Next i



End Sub

我正在努力解析的实际解析行为 - 我如何正确得到第n个结果,而且只得到第n个结果?即使添加了一些额外的空格或线条,我如何确保它仍能继续工作?有没有办法只使用正则表达式,"看"在给定表达式的第n个发现周围?没有很多递归就可以做到这一点吗?

谢谢

1 个答案:

答案 0 :(得分:1)

WAL +300bp  QTY
(M) FCTR    SECURITY    CPN ASK 1mPSA   TYPE
0.77    1.15    458 0.04    GNR 2012-61 CA  2.00    99-16   217 SEQ
1.39    2.26    120 0.76    GNR 2005-13 AE  5.00    102-24  223 SUP
1.40    18.16   45  0.65    GNR 2015-157 NH 2.50    95-16   215 EXCH,+
1.50    21.56   25  0.94    GNR 2017-103 HD 3.00    98-08   375 PAC-2

This seems like a pretty well formatted table. Perhaps pop each line into an array using Split() and then each field into an array, again using split():

Sub dump()

    arrLine = Split(TextWeNeedToParse, Chr(10))

    For Each Line In arrLine
        For Each field In Split(Line, " ")
            Debug.Print field
        Next
    Next

End Sub

That's super short and runs quick. You are just an if statement and counter (or regex test) away from getting the exact items you want.

Testing/counting may be easier if you remove multiple spaces so the split() puts each element in it's proper place. You could employee a loop to remove multiple spaces before running this:

Fully implemented it might be something like:

<your code to get the bod>

'remove multiple spaces from string for parsing
Do While InStr(1, TextWeNeedToParse, "  ")
    TextWeNeedToParse= Replace(TextWeNeedToParse, "  ", " ")
Loop

'Get each line into an array element
arrLine = Split(TextWeNeedToParse, Chr(10))

'Loop through the array
For Each Line In arrLine        

    'dump fields to an array
    arrFields = Split(Line, " ")

    'and spit out a particular element (your "unique id" is element 5)
    If UBound(arrFields) >= 5 Then Debug.Print "unique id:"; arrFields(5)
Next
相关问题