Parse a text file and save it to .csv

Time: 2016-12-19 14:50:54

Tags: powershell csv parsing

I have a text (.txt) file that looks like this:

Person    Person Name     Person   Approval     Supervisor Payroll Name    Application  Supplier Start Date  End Date Archived
Type                      Number   Status       Name                       Name


Agency    D'Cunha, Yionue 123456   NOT ENTERED  Power,                     Projects    CONTRACT
Contractor                                      Mehash                                 SUPPLIER_1
                                                                                                 10-DEC-16  16-DEC-16   No
Employee  Vughila,        132456   WORKING      Miro,      Company-abcde INPayroll               10-DEC-16  16-DEC-16   No
          Proshont                              Profal     Monthly
                                                                                                    10-DEC-16  16-DEC-16   No
Employee  Diiri, Maaor    113456   NOT ENTERED  Kargannkir,Company-abcde INPayroll
                                                Bivnath    Monthly
                                                                                                 10-DEC-16  16-DEC-16   No
Employee  Kimit, Gongobhar111111   WORKING      Chondorkor,Company-abcde INProjects              10-DEC-16  16-DEC-16   No
                                                Avissku    Monthly
Employee  Kalvornu,       110077   WORKING      Kindipur,  Company-abcde INPayroll               10-DEC-16  16-DEC-16   No
          Churali                               Barinakir  Monthly
Agency    Dhilorii,       100009   NOT ENTERED  Nook,                      Projects    CONTRACT
ContractorBohishik                              Lurukont                               SUPPLIER_2

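I get this file from a report generated by a piece of software. I want to parse the file and export the data to CSV. I tried this, but it did not help because my data is structured so differently.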

Then I tried this:

$input = Get-Content "C:\Users\user.name\Desktop\GBS\text_file.txt"  

$data = $input[1..($input.Length - 1)]

$maxLength = 0

$objects = foreach ($record in $data) {
    $split = $record -split "\s{2,}|\t+"
    if ($split.Length -gt $maxLength) {
        $maxLength = $split.Length
    }
    $props = @{}
    for ($i=0; $i -lt $split.Length; $i++) {
        $props.Add([String]($i+1), $split[$i])
    }
    New-Object -TypeName PSObject -Property $props
}

$headers = [String[]](1..$maxLength)

$objects | 
    Select-Object $headers | 
    Export-Csv -NoTypeInformation -Path "C:\Users\user.name\Desktop\GBS\out.csv"

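But this mangles the second line of every record. The problem is that in the original text file, every second line is a continuation of the line before it. In some cases even a third line belongs to the same record as the first line.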

Please let me know if there is any information I can provide to describe my problem better.

Following @Assgar's comment, I tried this:

# read text file into single string and remove header
$rawText = Get-Content 'C:\path\to\input.txt' | Out-String

# split string into individual records
$data = $rawText -replace "`r" -split '\n\n+' | Select-Object -Skip 1

$parsedData = foreach ($record in $data) {
    $prop = @{}
    $record -split '\n' | ForEach-Object {
        $prop['PersonType'] += $_.Substring(0, 10).Trim()
        $prop['PersonName'] += $_.Substring(10, 16).Trim()
        $prop['PersonNumber'] += $_.Substring(26, 9).Trim()
        $prop['ApprovalStatus'] += $_.Substring(35, 13).Trim()
        $prop['Supervisor'] += $_.Substring(48, 11).Trim()
        $prop['PayrollName'] += $_.Substring(59, 16).Trim()
        $prop['ApplicationName'] += $_.Substring(75, 13).Trim()
        $prop['Supplier'] += $_.Substring(88, 9).Trim()
        $prop['StartDate'] += $_.Substring(97, 12).Trim()
        $prop['EndDate'] += $_.Substring(109, 9).Trim()
        $prop['Archived'] += $_.Substring(118, 8).Trim()
    }

    New-Object -Type PSObject -Property $prev
}

$parsedData | Export-Csv 'C:\path\to\output.txt' -NoType

But now I get a blank output CSV file in the destination folder. Am I missing something somewhere?

1 answer:

Answer 0: (score: 0)

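I have a solution, but...
It uses two splits. The first splits the text into records on the words (Person|Agency|Employee);
this is flawed, which is why the If check is needed.
The second splits each record at line breaks and then parses each line by offset + length. Because the sample data is inconsistent, this is not perfect either.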

$InFile = 'Q:\Test\2016-12\19\41225200.txt'
$OutFile= 'C:\path\to\output.txt'

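# Words that can start a record (the header line also starts with "Person")
$Delimiter = '(Person|Agency|Employee)'
# Lookahead split: break before each delimiter word, but not at the very start
$Split     = "(?!^)(?=$Delimiter)"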

$parsedData = (Get-Content $InFile -Raw) -split $Split | 
    ForEach-Object {
        $prop = @{}
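        # Skip fragments too short to be a real record
        If ($_.Length -ge 30 ) {
            ForEach ($Line in $_.split("`n")) {
                # Pad the line so Substring never reads past its end
                $Line+=" "*130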
                $prop['PersonType']      += $Line.Substring( 0, 10).Trim()
                $prop['PersonName']      += $Line.Substring(10, 16).Trim()
                $prop['PersonNumber']    += $Line.Substring(26,  9).Trim()
                $prop['ApprovalStatus']  += $Line.Substring(35, 13).Trim()
                $prop['Supervisor']      += $Line.Substring(48, 11).Trim()
                $prop['PayrollName']     += $Line.Substring(59, 16).Trim()
                $prop['ApplicationName'] += $Line.Substring(75, 12).Trim()
                $prop['Supplier']        += $Line.Substring(87, 10).Trim()
                $prop['StartDate']       += $Line.Substring(97,  9).Trim()
                $prop['EndDate']         += $Line.Substring(108, 9).Trim()
                $prop['Archived']        += $Line.Substring(117, 8).Trim()
            }
        }
        New-Object -TypeName PSObject -Property $prop
}
$parsedData
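
The padding in the loop above (appending 130 spaces) is there because String.Substring throws an ArgumentOutOfRangeException when the start offset plus the length runs past the end of the string, which is what happens on short continuation lines. As a minimal sketch of an alternative, the helper below clamps the length instead of padding. Get-Field is a hypothetical name introduced only for this illustration, and the offsets are the ones used above.

```powershell
# Hypothetical helper: read a fixed-width field without running past the end of the line.
function Get-Field {
    param([string]$Line, [int]$Start, [int]$Length)
    # Field starts beyond the end of the line: treat it as empty
    if ($Start -ge $Line.Length) { return '' }
    # Clamp the length to what is actually available
    $count = [Math]::Min($Length, $Line.Length - $Start)
    $Line.Substring($Start, $count).Trim()
}

# A short line that only reaches into the second column
$short  = 'Employee  Vughila,'
$type   = Get-Field -Line $short -Start 0  -Length 10
$name   = Get-Field -Line $short -Start 10 -Length 16
$number = Get-Field -Line $short -Start 26 -Length 9
```

Here $type and $name hold the trimmed field text and $number is an empty string, with no exception raised for the missing third column.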

Output

Supervisor      : ApplicatioName
ApplicationName : t Date End DName
Archived        :
PersonType      : Person   AType
PersonName      : pproval     Supe
Supplier        : ate Archiv
StartDate       : ed
ApprovalStatus  : yroll NameStatus
PayrollName     : n Supplier  Star
PersonNumber    : rvisor PaNumber
EndDate         :


Supervisor      : Power,Mehash
ApplicationName : Projects
Archived        : No
PersonType      : AgencyContractor
PersonName      : D'Cunha, Yionue
Supplier        : CONTRACTSUPPLIER_1
StartDate       : 10-DEC-16
ApprovalStatus  : NOT ENTERED
PayrollName     :
PersonNumber    : 123456
EndDate         : 16-DEC-16


Supervisor      : Miro,Profal
ApplicationName : Payroll
Archived        : NoNo
PersonType      : Employee
PersonName      : Vughila,Proshont
Supplier        :
StartDate       : 10-DEC-1610-DEC-16
ApprovalStatus  : WORKING
PayrollName     : Company-abcde INMonthly
PersonNumber    : 132456
EndDate         : 16-DEC-1616-DEC-16

I tried Export-Csv as well, and its output was also empty.
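
One possible reason for the empty export, offered as an assumption rather than a confirmed diagnosis: Export-Csv takes its columns from the first object it receives, and the code above emits an object with no properties for every fragment shorter than 30 characters. The sketch below sidesteps this by only creating an object when a line starts a new record and by giving every record the same ordered set of properties. The three-column layout, the sample lines, and the out.csv file name are assumptions for illustration; the real report has more columns and its offsets would need checking.

```powershell
# Assumed column layout (name, start offset, width); adjust to the real report.
$columns = @(
    @{ Name = 'PersonType';   Start = 0;  Length = 10 },
    @{ Name = 'PersonName';   Start = 10; Length = 16 },
    @{ Name = 'PersonNumber'; Start = 26; Length = 9 }
)

# Sample lines built with PadRight so the offsets are guaranteed to line up.
$lines = @(
    ('Employee'.PadRight(10) + 'Vughila,'.PadRight(16) + '132456'),
    (''.PadRight(10) + 'Proshont'),
    ('Agency'.PadRight(10) + "D'Cunha, Yionue".PadRight(16) + '123456'),
    'Contractor'
)

$records = New-Object System.Collections.Generic.List[object]
$current = $null
foreach ($line in $lines) {
    if ($line -match '^(Agency|Employee)') {
        # A new record starts here: create its property bag with every column present
        $current = [ordered]@{}
        foreach ($c in $columns) { $current[$c.Name] = '' }
        $records.Add($current)
    }
    if ($null -eq $current) { continue }   # ignore header lines before the first record
    $padded = $line.PadRight(35)           # 35 = end of the last assumed column
    foreach ($c in $columns) {
        $part = $padded.Substring($c.Start, $c.Length).Trim()
        # Join continuation text to what the record already holds
        if ($part) { $current[$c.Name] = ("{0} {1}" -f $current[$c.Name], $part).Trim() }
    }
}

$objects = $records | ForEach-Object { [pscustomobject]$_ }
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'out.csv'
$objects | Export-Csv -Path $outPath -NoTypeInformation
```

Unlike the code above, this joins continuation text with a space, so the second sample record gets the person type "Agency Contractor" rather than "AgencyContractor". Whether a separator is wanted depends on how the report wraps its fields.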