What is the most efficient way to parse the input from a specific column into separate columns?

Asked: 2019-07-01 21:41:53

Tags: c# excel csv parsing

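I have a CSV file with a column named Message that contains input like the following, and I want to split it up properly. Note that the snippet below is not how it looks in Excel; I currently have to format it by hand.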

    ["CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9
Request for API: /api/acmsxdsreader/readpolicyfrompolicyassignment Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1 RequestedSchemas: {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy,  TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5"]

I want to split it so that it looks like this (the column name is the text before the colon, and the value is the text after it).

CorrelationID: b99fb632-78cf-4910-ab23-4f69833ed2d9
Request for API: 
/api/acmsxdsreader/readpolicyfrompolicyassignment
Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1
RequestedSchemas: {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy,
TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5

I tried Text to Columns, but the result does not come out correctly in Excel.

I would like to know the best way to do this. I am currently writing a C# program to parse it properly, but what I have does not work.

For reference, here is my C# code. I am open to any approach, though.

static void Main(string[] args)
    {
        using (TextFieldParser parser = new TextFieldParser(@"C:\Users\t-maucal\Desktop\MachineLearningTestSets\CSVParse.csv"))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(" ");
            while (!parser.EndOfData)
            {
                //Process row
                string[] fields = parser.ReadFields();
                foreach (string field in fields)
                {
                    Console.WriteLine(field);
                }
            }
        }
    }


3 Answers:

Answer 0 (score: 1)

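Using formulas, @cybernetic.nomad had most of the method worked out. To strip the headers from the data, you can try the following:

  1. Put each column's category (CorrelationId:, Request for API:, etc.) in cells B1:G1

  2. In B2, use the following formula: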

    =RIGHT(LEFT($A2,FIND(C$1,$A2)-1),LEN(LEFT($A2,FIND(C$1,$A2)-1))-(LEN(B1)+2))
    
  3. In C2, use the following formula:

    =RIGHT(MID($A2,FIND(C$1,$A2),FIND(D$1,$A2,FIND(C$1,$A2))-FIND(C$1,$A2)),LEN(MID($A2,FIND(C$1,$A2),FIND(D$1,$A2,FIND(C$1,$A2))-FIND(C$1,$A2)))-(LEN(C1)+1))
    
  4. In D2, use the following formula:

    =RIGHT(MID($A2,FIND(D$1,$A2),FIND(E$1,$A2,FIND(D$1,$A2))-FIND(D$1,$A2)),LEN(MID($A2,FIND(D$1,$A2),FIND(E$1,$A2,FIND(D$1,$A2))-FIND(D$1,$A2)))-(LEN(D1)+2))
    
  5. In E2, use the following formula:

    =RIGHT(MID($A2,FIND(E$1,$A2),FIND(F$1,$A2,FIND(E$1,$A2))-FIND(E$1,$A2,FIND(D$1,$A2))-1),LEN(MID($A2,FIND(E$1,$A2),FIND(F$1,$A2,FIND(E$1,$A2))-FIND(E$1,$A2,FIND(D$1,$A2))))-(LEN(E1)+2))
    
  6. In F2, use the following formula:

    =RIGHT(MID($A2,FIND(F$1,$A2),FIND(G$1,$A2,FIND(F$1,$A2))-FIND(F$1,$A2)),LEN(MID($A2,FIND(F$1,$A2),FIND(G$1,$A2,FIND(F$1,$A2))-FIND(F$1,$A2)))-(LEN(F1)+2))
    
  7. In G2, use the following formula:

    =RIGHT($A2,LEN($A2)-FIND(G$1,$A2)-LEN(G1))
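
Since the question was asked from C#, the same idea as the formulas above (find each header, then take the text up to the next header) can be sketched in code. This is only a rough sketch, not a tested solution for the real file: the class name `HeaderSplitter` is made up for illustration, the header list is copied from the sample in the question, and it assumes the headers always appear in that order.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderSplitter
{
    // Header labels in the order they appear in the sample message (assumption).
    private static readonly string[] Headers =
    {
        "CorrelationId:", "Request for API:", "Caller:", "RequestedSchemas:", "TenantId:"
    };

    // Maps each header (without its colon) to the text between that header
    // and the next header that is present, or the end of the message.
    public static Dictionary<string, string> Split(string message)
    {
        var result = new Dictionary<string, string>();
        int searchFrom = 0;

        for (int i = 0; i < Headers.Length; i++)
        {
            int start = message.IndexOf(Headers[i], searchFrom, StringComparison.OrdinalIgnoreCase);
            if (start < 0) continue; // header missing from this row

            int valueStart = start + Headers[i].Length;
            int valueEnd = message.Length;

            // The value ends where the next header that can be found begins.
            for (int j = i + 1; j < Headers.Length; j++)
            {
                int next = message.IndexOf(Headers[j], valueStart, StringComparison.OrdinalIgnoreCase);
                if (next >= 0) { valueEnd = next; break; }
            }

            string value = message.Substring(valueStart, valueEnd - valueStart).Trim();
            result[Headers[i].TrimEnd(':')] = value;
            searchFrom = valueStart;
        }

        return result;
    }

    public static void Main()
    {
        // Sample message from the question, without the surrounding [" and "].
        string sample =
            "CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9\n" +
            "Request for API: /api/acmsxdsreader/readpolicyfrompolicyassignment " +
            "Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1 " +
            "RequestedSchemas: {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, " +
            "{urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy,  " +
            "TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5";

        foreach (var pair in Split(sample))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Unlike the formulas, this trims surrounding whitespace from each value, and it skips a header that is missing from a row instead of returning an error.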
    


Answer 1 (score: 1)

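You can use a macro written in VBA.

I created a class module, renamed it cData, and gave it a property for each of your column headers.

I then used a regular expression to separate the different properties out of the data you provided, collected them in a Dictionary, and wrote the results to a separate worksheet in the specified order.

I assumed that your named column headers are the information you are looking for and that, as in your text example, there is only one instance of each category to deal with.

I also assumed that your data starts in B1.

Read the comments in the macro carefully.

Be sure to set the references as indicated in the regular module code.

Class Module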

'Rename this Module cData
Option Explicit
Private pCorrelationID As String
Private pRequestForApi As String
Private pCaller As String
Private pRequestedSchemas As String
Private pTenantID As String

Public Property Get CorrelationID() As String
    CorrelationID = pCorrelationID
End Property
Public Property Let CorrelationID(Value As String)
    pCorrelationID = Value
End Property

Public Property Get RequestForApi() As String
    RequestForApi = pRequestForApi
End Property
Public Property Let RequestForApi(Value As String)
    pRequestForApi = Value
End Property

Public Property Get Caller() As String
    Caller = pCaller
End Property
Public Property Let Caller(Value As String)
    pCaller = Value
End Property

Public Property Get RequestedSchemas() As String
    RequestedSchemas = pRequestedSchemas
End Property
Public Property Let RequestedSchemas(Value As String)
    pRequestedSchemas = Value
End Property

Public Property Get TenantID() As String
    TenantID = pTenantID
End Property
Public Property Let TenantID(Value As String)
    pTenantID = Value
End Property

Regular Module

'Set Reference to Microsoft Scripting Runtime
'Set Reference to Microsoft VBScript Regular Expressions 5.5
Option Explicit
Sub ttcSpecial()
    Dim wsSrc As Worksheet, wsRes As Worksheet
    Dim vSrc As Variant, vRes As Variant
    Dim rRes As Range
    Dim dD As Dictionary
    Dim RE As RegExp, MC As MatchCollection, M As Match
    Dim cD As cData
    Dim myKey, I As Long, sTemp As String

Set wsSrc = Worksheets("sheet1")
Set wsRes = Worksheets("sheet2")
    Set rRes = wsRes.Cells(1, 1)

With wsSrc
    vSrc = .Range(.Cells(1, 2), .Cells(.Rows.Count, 2).End(xlUp))
    If Not IsArray(vSrc) Then
        sTemp = vSrc
        ReDim vSrc(1 To 1, 1 To 1)
        vSrc(1, 1) = sTemp
    End If
End With

Set RE = New RegExp
With RE
    .Global = True
    .IgnoreCase = True
    .MultiLine = False
    .Pattern = "((?:CorrelationID|Request For API|Caller|RequestedSchemas|TenantID)):([\s\S]+?)(?=(?:CorrelationID|Request For API|Caller|RequestedSchemas|TenantID|$))"
End With


Set dD = New Dictionary
    dD.CompareMode = TextCompare

For I = 1 To UBound(vSrc, 1)
    Set cD = New cData
    With cD
    If RE.Test(vSrc(I, 1)) = True Then
        myKey = I
        Set MC = RE.Execute(vSrc(I, 1))
        For Each M In MC
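            'The regex ignores case, so compare the header in lower case as well;
            'otherwise "CorrelationId" in the data never matches "CorrelationID"
            Select Case LCase$(M.SubMatches(0))
                Case "correlationid"
                    .CorrelationID = M.SubMatches(1)
                Case "request for api"
                    .RequestForApi = M.SubMatches(1)
                Case "caller"
                    .Caller = M.SubMatches(1)
                Case "requestedschemas"
                    .RequestedSchemas = M.SubMatches(1)
                Case "tenantid"
                    .TenantID = M.SubMatches(1)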
            End Select
        Next M

        dD.Add Key:=myKey, Item:=cD
    End If
    End With
Next I

ReDim vRes(0 To dD.Count, 1 To 5)

'Headers
    vRes(0, 1) = "Correlation ID"
    vRes(0, 2) = "Request for API"
    vRes(0, 3) = "Caller"
    vRes(0, 4) = "Requested Schemas"
    vRes(0, 5) = "Tenant ID"

I = 0
For Each myKey In dD.Keys
    I = I + 1
    With dD(myKey)
        vRes(I, 1) = .CorrelationID
        vRes(I, 2) = .RequestForApi
        vRes(I, 3) = .Caller
        vRes(I, 4) = .RequestedSchemas
        vRes(I, 5) = .TenantID
    End With
Next myKey

Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
With rRes
    .EntireColumn.Clear
    .Value = vRes
    With .Rows(1)
        .Font.Bold = True
        .HorizontalAlignment = xlCenter
    End With
    .EntireColumn.AutoFit
End With

End Sub

Results for the text example in the original question (screenshot omitted)

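Regex: simplified explanation

  • Match any of the column headers
  • Match everything starting after the colon
    • up to, but not including, another column header or the end of the string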
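
For anyone who would rather stay in C#, as in the question, the same regex idea might look roughly like this. Treat it as a sketch: `MessageParser` is a made-up name, the header spellings are taken from the sample in the question, and, unlike the VBA pattern, the lookahead here also requires the colon after the next header so that a header word inside a value is less likely to end the match early.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MessageParser
{
    // Header names as they appear in the question's sample (assumption).
    private const string Headers = "CorrelationId|Request for API|Caller|RequestedSchemas|TenantId";

    // Group 1: a header. Group 2: everything after its colon, matched lazily,
    // up to (but not including) the next "header:" or the end of the string.
    private static readonly Regex FieldPattern = new Regex(
        "(" + Headers + @"):([\s\S]+?)(?=(?:" + Headers + @"):|$)",
        RegexOptions.IgnoreCase);

    // Returns header -> trimmed value; keys are looked up case-insensitively.
    public static Dictionary<string, string> Parse(string message)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in FieldPattern.Matches(message))
        {
            fields[m.Groups[1].Value] = m.Groups[2].Value.Trim();
        }
        return fields;
    }

    public static void Main()
    {
        // Sample message from the question, without the surrounding [" and "].
        string sample =
            "CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9\n" +
            "Request for API: /api/acmsxdsreader/readpolicyfrompolicyassignment " +
            "Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1 " +
            "RequestedSchemas: {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, " +
            "{urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy,  " +
            "TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5";

        Dictionary<string, string> fields = Parse(sample);
        Console.WriteLine(fields["Caller"]);
        Console.WriteLine(fields["TenantId"]);
    }
}
```

The case-insensitive dictionary means it does not matter whether a row spells the header `CorrelationId` or `CorrelationID`.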

Answer 2 (score: 0)

That is a lot of error-prone work. Just use Josh Close's CsvHelper. It is an excellent package, fast and easy to use.
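
As a rough illustration of what that could look like, the sketch below reads the Message column with CsvHelper. It assumes the CsvHelper NuGet package is installed (a version whose `CsvReader` constructor takes a `CultureInfo`), that the file has a header row with a column literally named `Message`, and the `Id` column and `MessageColumnReader` name are invented for the example. CsvHelper only takes care of the CSV layer (quoting, embedded line breaks); each message would still need to be split into its fields afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;

public static class MessageColumnReader
{
    // Reads every value of the "Message" column from CSV text with a header row.
    public static List<string> ReadMessages(TextReader input)
    {
        var messages = new List<string>();
        using (var csv = new CsvReader(input, CultureInfo.InvariantCulture))
        {
            csv.Read();        // advance to the header row
            csv.ReadHeader();  // register column names
            while (csv.Read())
            {
                messages.Add(csv.GetField("Message"));
            }
        }
        return messages;
    }

    public static void Main()
    {
        // Hypothetical two-column file; the quoted Message field spans two lines.
        string csvText =
            "Id,Message\n" +
            "1,\"CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9\n" +
            "Request for API: /api/acmsxdsreader/readpolicyfrompolicyassignment " +
            "Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1 " +
            "TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5\"\n";

        foreach (string message in ReadMessages(new StringReader(csvText)))
        {
            Console.WriteLine(message);
        }
    }
}
```

For a real file, a `StreamReader` opened on the CSV path would take the place of the `StringReader`.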