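How to convert a text file to CSV using AWK

Time: 2010-09-15 04:41:08

Tags: awk

I am collecting USB usage details for all users and converting them to a CSV file so that I can export it to a database. The input text file looks like this: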

USB History Dump
by nabiy (c)2008 
(1) --- Kingston DataTraveler 130 USB Device 
instanceID: 0018F3D974B4A9C0E1760896&0
ParentIdPrefix: 7&b62e00e&2
Last Mounted As: \DosDevices\I:
Driver:{4D36E967-E325-11CE-BFC1-08002BE10318}\0033
Disk Stamp: 09/07/2010 15:07
Volume Stamp: 09/07/2010 15:07 
(2) --- Kingston DataTraveler 2.0 USB Device 
instanceID: 001D0F1E35B25B8C1201011B&0
ParentIdPrefix: 7&1f5848f3&0
Driver:{4D36E967-E325-11CE-BFC1-08002BE10318}\0035
Disk Stamp: 09/06/2010 15:18
Volume Stamp: 09/06/2010 15:18 
(3) --- Maxtor OneTouch III USB Device 
instanceID: 044303E5&0
ParentIdPrefix: 
Driver:{4D36E967-E325-11CE-BFC1-08002BE10318}\0032
Disk Stamp: 09/10/2010 10:09
Volume Stamp: 03/12/2010 10:42 

How can I parse this file to get the following format:

hostname Devic_name instanceID ParentPrefix LastMountedAs Driver 
pcname kingston xxxx xxxxxxxxx xxxxxxxxxx xxxxxxxx
pcname maxtor 0440xxx 4 d 367 08/07/2010 xxxxxxxx
pcname kingston xxxxxxx xxxxxxx xxxxxxxxx xxxxxxxx

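The PC name will be taken from the hostname command.

The required output is CSV, for loading into a database, produced by a batch or awk script. Any suggestions are greatly appreciated.

3 Answers:

Answer 0: (score: 0)

This thread may help you...

Answer 1: (score: 0)

Using Perl, you can handle it like this, producing real CSV data: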

use strict;
use warnings;

# CSV columns after the host name, in output order.
my @keys = ( "Device_Name", "instanceID", "ParentIdPrefix",
             "Last Mounted As", "Driver" );

my %values = ();
my $host = qx/hostname/;
chomp $host;

while (<>)
{
    chomp;
    # Skip everything except "(N) --- name" headers and "key: value" lines.
    next unless m/^\(\d+\) ---/ || m/^[\w ]+:/;
    if (m/\(\d+\) --- (\w+)/)
    {
        # A new device starts: print the previous one, keep the first word of the name.
        dump_entry(\%values);
        %values = ();
        $values{Device_Name} = $1;
    }
    else
    {
        # Split on the first colon only, so values that contain colons
        # (such as "\DosDevices\I:" or a time stamp) stay intact.
        my($key,$value) = split /:/, $_, 2;
        $value =~ s/^\s+//;
        $value =~ s/\s+$//;
        $values{$key} = $value if $value ne "";
    }
}
dump_entry(\%values);

# Print one CSV row for the collected device; does nothing if the hash is empty.
sub dump_entry
{
    my($ref) = @_;
    my(%values) = %$ref;
    return if (scalar(keys %values) == 0);
    print qq%"$host"%;
    foreach my $key (@keys)
    {
        # "//" (defined-or) needs Perl 5.10 or later.
        my $value = $values{$key} // "--none--";
        print qq%,"$value"%;
    }
    print "\n";
}

Sample output for the given data file:

"yourpcname","Kingston","0018F3D974B4A9C0E1760896&0","7&b62e00e&2","\DosDevices\I:","{4D36E967-E325-11CE-BFC1-08002BE10318}\0033"
"yourpcname","Kingston","001D0F1E35B25B8C1201011B&0","7&1f5848f3&0","--none--","{4D36E967-E325-11CE-BFC1-08002BE10318}\0035"
"yourpcname","Maxtor","044303E5&0","--none--","--none--","{4D36E967-E325-11CE-BFC1-08002BE10318}\0032"

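Note that the data is shown in the order in which it was read, which differs from the output data in the question.

Answer 2: (score: 0)

Following the old tradition, here is an awk version of Jonathan's code :-)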

cat tess | awk '
# Run command E and collect its output lines in A[1..A[0]].
function cmd(E, A,    v) {
    A[0] = 0
    while ((E | getline v) > 0) A[A[0] += 1] = v
    A["RETURN_CODE"] = close(E)
}
# Use whatever CSV format you prefer. This is a traditional style that
# backslash-escapes any comma or double quote (0x22) inside a field.
function csv(s) {
    gsub(",", "\\,", s)
    gsub("\"", "\\\"", s)
    return ((s) ? "\"" s "\"" : "\"--none--\"")
}
BEGIN {
    cmd("hostname", A); host = A[1]
    f = 0   # set once the first device header has been seen
    n = 0   # set when the inner loop stopped on a "(N) ---" line
    print "hostname Devic_name instanceID ParentPrefix LastMountedAs Driver "  # Header
    while (1) {
        # Collect "key: value" lines until the next device header or end of input.
        while ((getline r) > 0) {
            if (r ~ "^[(][0-9]*[)]") { n = 1; break }
            if (r !~ ":") continue
            key = substr(r, match(r, "^[^:]*"), RLENGTH)
            match(r, "^[^:]*[:][ \t]*")
            value = substr(r, RSTART + RLENGTH); sub("[\t ]*$", "", value)
            A[key] = value
        }
        # Print the record that just ended.
        if (f) {
            print csv(host)","csv(A["devic_name"])","csv(A["instanceID"])","csv(A["ParentIdPrefix"])","csv(A["Last Mounted As"])","csv(A["Driver"])
            delete A
        }
        if (!n) break
        n = 0
        f = 1
        # r holds the "(N) --- name USB Device" line; keep only the name.
        sub("^[(][0-9]*[)][ \t]*---[ \t]*", "", r)
        sub("[ ]*USB Device[ ]*$", "", r)
        A["devic_name"] = r
    }
}'

Sample output:

hostname Devic_name instanceID ParentPrefix LastMountedAs Driver
"host","Kingston DataTraveler 130","0018F3D974B4A9C0E1760896&0","7&b62e00e&2","\DosDevices\I:","{4D36E967-E325-11CE-BFC1-08002BE10318}\0033"
"host","Kingston DataTraveler 2.0","001D0F1E35B25B8C1201011B&0","7&1f5848f3&0","--none--","{4D36E967-E325-11CE-BFC1-08002BE10318}\0035"
"host","Maxtor OneTouch III","044303E5&0","--none--","--none--","{4D36E967-E325-11CE-BFC1-08002BE10318}\0032"
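
The same idea can probably be written more compactly with ordinary awk pattern rules instead of a getline loop inside BEGIN. The following is only a sketch under a few assumptions: the function name usb_to_csv is made up for illustration, double quotes inside a field are doubled (the RFC 4180 convention) rather than backslash-escaped, missing fields are left empty instead of "--none--", and no header row is printed. Whether the target database prefers these conventions is not something the question states.

```shell
# usb_to_csv [HOST]: read a "USB History Dump" on stdin, write one CSV row per device.
# Hypothetical helper name; HOST defaults to the output of the hostname command.
usb_to_csv() {
    awk -v host="${1:-$(hostname)}" '
    # Quote a field: wrap it in double quotes and double any quote inside it.
    function q(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }

    # Print the device collected so far, if any, then forget it.
    function flush_record(    row) {
        if (have) {
            row = q(host) "," q(name) "," q(f["instanceID"])
            row = row "," q(f["ParentIdPrefix"]) "," q(f["Last Mounted As"])
            print row "," q(f["Driver"])
        }
        split("", f)    # portable way to empty an array
        have = 0
    }

    # "(N) --- <name> USB Device" starts a new record.
    /^\([0-9]+\) ---/ {
        flush_record()
        name = $0
        sub(/^\([0-9]+\) ---[ \t]*/, "", name)
        sub(/[ \t]*USB Device[ \t\r]*$/, "", name)
        have = 1
        next
    }

    # "key: value" lines inside a record; split on the first colon only.
    have && (i = index($0, ":")) > 0 {
        key = substr($0, 1, i - 1)
        val = substr($0, i + 1)
        gsub(/^[ \t]+|[ \t\r]+$/, "", val)
        f[key] = val
    }

    END { flush_record() }
    '
}
```

With the dump saved in the file tess as above, it could be run as usb_to_csv < tess > usb.csv. Passing a host name as the first argument skips the call to hostname, which is handy when the dump was collected on another machine.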