Extracting specific data from a large CSV dump file

Time: 2019-06-12 15:32:19

Tags: excel csv logic data-extraction

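I have a large CSV file with more than 1 million entries containing my company's user information. I have already used reCsvEditor to remove the unneeded columns from the file. I now have the following columns: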

ID    NAME    EMAIL        SUB_STATUS   SUB_DATE      SMS_RECEIVED  MEMBER
1     John    abc@abc.com  true         01.01.2018    true          true
2     David   abc@abc.com  false        01.01.2018    true          true
3     Raza    abc@abc.com  true         01.01.2018    true          false
4     Syed    abc@abc.com  false        01.01.2018    false         false
5     Eidi    abc@abc.com  true         01.01.2018    false         false

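There are more than 1 million records, and I need to extract data from them based on specific conditions. For example, here is some sample logic: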

Extract all users where (SUB_STATUS=true and SMS_RECEIVED=false and MEMBER=true) OR
(SUB_STATUS=false and SMS_RECEIVED=false and MEMBER=false)

The output, filtered by the sample condition above, should again be a CSV file.
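For reference, the sample condition can be sketched as a streaming filter that never loads the whole file into memory. This is only a minimal sketch in Python, assuming a comma-separated, UTF-8 file with a header row and the literal strings true/false as in the table above; the function names `parse_bool`, `matches` and `filter_csv` are made up for illustration.

```python
import csv


def parse_bool(text):
    """Interpret the 'true'/'false' strings used in the sample table."""
    return text.strip().lower() == "true"


def matches(row):
    """The sample condition from the question, applied to one row dict."""
    sub = parse_bool(row["SUB_STATUS"])
    sms = parse_bool(row["SMS_RECEIVED"])
    member = parse_bool(row["MEMBER"])
    return (sub and not sms and member) or (not sub and not sms and not member)


def filter_csv(src_path, dst_path, delimiter=","):
    """Stream src_path row by row and write matching rows to dst_path.

    Returns the number of data rows written (the header is not counted).
    """
    written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter=delimiter)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                delimiter=delimiter)
        writer.writeheader()
        for row in reader:
            if matches(row):
                writer.writerow(row)
                written += 1
    return written
```

It would be called as `filter_csv("users.csv", "filtered.csv")`, where both file names are placeholders. Because rows are processed one at a time, memory use stays roughly constant regardless of file size.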

How can I achieve this? I am a Windows user and have tried PowerShell and reCsvEditor. The file is too large to open in Excel.

2 answers:

Answer 0: (score: 1)

Importing a large file into Excel is not a problem; you only need to split the data. Once it is split, you can apply filters.

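The only issue is time. I have used this macro on a CSV file with 50 million rows and it works; it just takes a while to copy everything. The separator is ","; check which separator your file uses.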

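Sub ReadCSVFiles()

' Give every variable its own type; "Dim i, j As Long" would leave i as Variant.
Dim i As Long, j As Long, k As Long, l As Long, m As Long
Dim Length As Long, ws1Col As Long, ws1Row As Long
Dim UserFileName As Variant   ' GetOpenFilename returns False when cancelled
Dim strTextLine As String
Dim iFile As Integer: iFile = FreeFile
Dim Word() As String
Dim Items(1 To 16384) As Long
Dim ws1 As Worksheet, ws2 As Worksheet

UserFileName = Application.GetOpenFilename
If VarType(UserFileName) = vbBoolean Then Exit Sub   ' dialog was cancelled

Set ws1 = ThisWorkbook.Worksheets(1)
Open UserFileName For Input As #iFile
i = 1
j = 1

' Copy the file line by line into sheet 1. When a column is full
' (1,048,576 rows), continue at the top of the next column.
Do Until EOF(iFile)
    Line Input #iFile, strTextLine
    If i > 1048576 Then
        i = 1
        j = j + 1
    End If
    ws1.Cells(i, j).Value2 = strTextLine
    i = i + 1
Loop
Close #iFile

' Add the output sheet after the raw-data sheet so that ws1 stays first.
Set ws2 = ThisWorkbook.Worksheets.Add(After:=ws1)
ws1Col = ws1.UsedRange.SpecialCells(xlCellTypeLastCell).Column
ws1Row = ws1.UsedRange.SpecialCells(xlCellTypeLastCell).Row
k = 0
l = 0

For i = 1 To ws1Col
    For j = 1 To ws1Row
        'Change the separator here
        Word = Split(ws1.Cells(j, i).Value2, ",", , vbBinaryCompare)
        Length = UBound(Word)
        ' k tracks the widest row seen in this source column.
        If Length > k Then
            k = Length
        End If
        ' Write only the fields this row actually has.
        For m = 0 To Length
            ws2.Cells(j, i + l + m).Value2 = Word(m)
        Next
    Next
    ' Items(i) is the cumulative column offset after source column i.
    If i = 1 Then
        Items(i) = k
    Else
        Items(i) = k + Items(i - 1)
    End If
    k = 0
    l = Items(i)
Next

End Sub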
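The splitting step can also be done outside Excel, before opening anything. The following is a minimal sketch in Python, not the macro's method: it cuts the CSV into several smaller files that each fit on one worksheet, repeating the header in every part. The name `split_csv`, the output naming scheme and the UTF-8 encoding are assumptions made for illustration.

```python
import csv

EXCEL_MAX_ROWS = 1_048_576  # rows per worksheet in current Excel versions


def split_csv(src_path, out_prefix, rows_per_file=EXCEL_MAX_ROWS - 1,
              delimiter=","):
    """Split src_path into numbered part files, repeating the header in each.

    The default leaves one row per part for the header line.
    Returns the list of file paths written.
    """
    paths = []
    out = None
    writer = None
    count = 0
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src, delimiter=delimiter)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to split
            return paths
        try:
            for row in reader:
                # Start a new part file when none is open or the current one is full.
                if writer is None or count >= rows_per_file:
                    if out is not None:
                        out.close()
                    path = f"{out_prefix}_{len(paths) + 1}.csv"
                    out = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out, delimiter=delimiter)
                    writer.writerow(header)
                    paths.append(path)
                    count = 0
                writer.writerow(row)
                count += 1
        finally:
            if out is not None:
                out.close()
    return paths
```

For example, `split_csv("users.csv", "users_part")` (placeholder names) would produce `users_part_1.csv`, `users_part_2.csv`, and so on, each of which can be opened and filtered in Excel separately.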

Answer 1: (score: 0)

You can try q. This tool lets you run SQL queries directly on CSV files to extract a subset of the data: https://harelba.github.io/q/
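To illustrate the SQL-on-CSV idea without relying on q's own command-line syntax, here is a rough sketch using Python's standard `sqlite3` module. It is not how q is invoked; it only shows the same approach. The helper `query_csv`, the table name `users`, and the assumption that every row has the same number of fields as the header are all illustrative.

```python
import csv
import sqlite3


def query_csv(path, sql, table="users", delimiter=","):
    """Load a CSV with a header row into an in-memory SQLite table and run sql.

    All columns are stored as TEXT; every row must have as many fields
    as the header, otherwise the insert raises an error.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)
        conn = sqlite3.connect(":memory:")
        try:
            cols = ", ".join(f'"{name}" TEXT' for name in header)
            conn.execute(f'CREATE TABLE "{table}" ({cols})')
            placeholders = ", ".join("?" for _ in header)
            # The csv reader is consumed lazily, one row at a time.
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', reader
            )
            return conn.execute(sql).fetchall()
        finally:
            conn.close()


# The sample condition from the question, written as SQL.
QUERY = """
SELECT * FROM users
WHERE (SUB_STATUS = 'true'  AND SMS_RECEIVED = 'false' AND MEMBER = 'true')
   OR (SUB_STATUS = 'false' AND SMS_RECEIVED = 'false' AND MEMBER = 'false')
"""
```

Calling `query_csv("users.csv", QUERY)` (placeholder file name) returns the matching rows as tuples. If the data does not fit comfortably in memory, passing a file path instead of `":memory:"` to `sqlite3.connect` keeps the table on disk.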

You could also try Excel PowerPivot, or MS Access!