Compare two files and two fields - continued:

Date: 2014-08-27 18:14:42

Tags: bash unix awk

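I want to compare field $4 of the first file with field $1 of the second file, and field $8 of the first file with field $3 of the second file. Then print the matching lines from the first file: if field $1 matches, print the corresponding field $2 from the second file, and if field $3 matches, print the corresponding field $4 from the second file as well.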

Input.csv

Transaction ID,Request source,User name,SendMobNum,RecMobNum,ServiceClass,Service,Amount,CreditAmount,Bonus,Process fee
ABCD,SSS,EFG,1234,9999,2345,AB,30,1,4,1
ABCD,SSS,EFG,1234,9999,2345,AB,40,2,5,2
ABCD,SSS,EFG,1234,9999,2345,AB,60,3,6,3
ABCD,SSS,EFG,3456,9999,2345,AB,30,1,4,1
ABCD,SSS,EFG,3456,9999,2345,AB,40,2,5,2
ABCD,SSS,EFG,3456,9999,2345,AB,60,3,6,3
ABCD,SSS,EFG,5678,9999,2345,AB,30,1,4,1
ABCD,SSS,EFG,5678,9999,2345,AB,40,2,5,2
ABCD,SSS,EFG,5678,9999,2345,AB,60,3,6,3

master.csv

SendMobNum,Year,Amount,Gender 
1234,2000,30,Male
5678,2001,15,Female
2345,2002,60,Female
4567,2003
8888,2004

Desired output:

Transaction ID,Request source,User name,SendMobNum,RecMobNum,ServiceClass,Service,Amount,CreditAmount,Bonus,Process fee,SendMobNum,Year,Amount,Gender 
ABCD,SSS,EFG,1234,9999,2345,AB,30,1,4,1,1234,2000,30,Male
ABCD,SSS,EFG,1234,9999,2345,AB,60,3,6,3,1234,2000,60,Female
ABCD,SSS,EFG,5678,9999,2345,AB,30,1,4,1,5678,2001,30,Male
ABCD,SSS,EFG,5678,9999,2345,AB,60,3,6,3,5678,2001,60,Female

I tried the following command, which only partly works:

awk -F, '
    NR == FNR {send[$1]; amt[$3]; next} 
    FNR == 1 || ($4 in send && $8 in amt) { print $0","send[$1] ","send[$2]","amt[$3]","amt[$4]}
' master.csv Input*.csv
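The filter in this attempt already selects the right rows; what goes wrong is the appended values. `send[$1]; amt[$3]` only create array keys with empty values, so the `in` tests succeed, but `send[$1]`, `send[$2]`, `amt[$3]` and `amt[$4]` in the print are all empty (they also index with Input.csv's fields $1 to $4 rather than $4 and $8). A minimal sketch that runs the attempt unchanged on the sample files, recreated in a temporary directory (assuming a POSIX sh and any standard awk), shows the effect:

```shell
# Recreate the sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
cat > master.csv <<'EOF'
SendMobNum,Year,Amount,Gender
1234,2000,30,Male
5678,2001,15,Female
2345,2002,60,Female
4567,2003
8888,2004
EOF
cat > Input.csv <<'EOF'
Transaction ID,Request source,User name,SendMobNum,RecMobNum,ServiceClass,Service,Amount,CreditAmount,Bonus,Process fee
ABCD,SSS,EFG,1234,9999,2345,AB,30,1,4,1
ABCD,SSS,EFG,1234,9999,2345,AB,40,2,5,2
ABCD,SSS,EFG,1234,9999,2345,AB,60,3,6,3
ABCD,SSS,EFG,3456,9999,2345,AB,30,1,4,1
ABCD,SSS,EFG,3456,9999,2345,AB,40,2,5,2
ABCD,SSS,EFG,3456,9999,2345,AB,60,3,6,3
ABCD,SSS,EFG,5678,9999,2345,AB,30,1,4,1
ABCD,SSS,EFG,5678,9999,2345,AB,40,2,5,2
ABCD,SSS,EFG,5678,9999,2345,AB,60,3,6,3
EOF

# The attempted command, unchanged: send[] and amt[] get keys but no values.
attempt=$(awk -F, '
    NR == FNR {send[$1]; amt[$3]; next}
    FNR == 1 || ($4 in send && $8 in amt) { print $0","send[$1] ","send[$2]","amt[$3]","amt[$4]}
' master.csv Input*.csv)
printf '%s\n' "$attempt"
```

The header and the four expected rows are printed, but every line ends in `,,,,`: the rows are right, the appended fields are empty. Storing a value, as in `mob2year[$1]=$2`, is what fills them.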

Any suggestions?

Edit: I would like master.csv to be treated as two different data sets:

Set #1

SendMobNum,Year (i.e. Year is the description of SendMobNum)
1234,2000
5678,2001
2345,2002
4567,2003
8888,2004

Set #2

Amount,Gender (i.e. Gender is the description of Amount)
30,Male
15,Female
60,Female
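Treating master.csv as these two sets amounts to filling two separate lookup tables in a single pass over the file. A minimal sketch, assuming the sample master.csv above; the `NF >= 4` guard is an addition not in the question: without it the short rows `4567,2003` and `8888,2004` would add an empty-string key to the Amount set. The dump at the end is only there for inspection.

```shell
cd "$(mktemp -d)" || exit 1
cat > master.csv <<'EOF'
SendMobNum,Year,Amount,Gender
1234,2000,30,Male
5678,2001,15,Female
2345,2002,60,Female
4567,2003
8888,2004
EOF

# Build both lookup tables in one pass, then dump them.
# for (k in ...) visits keys in an unspecified order, so the dump is sorted.
sets=$(awk '
    BEGIN { FS = "," }
    FNR == 1 { next }                       # skip the header row in this dump
    {
        mob2year[$1] = $2                   # set #1: SendMobNum -> Year
        if (NF >= 4) amt2gender[$3] = $4    # set #2: Amount -> Gender, only rows that have both
    }
    END {
        for (k in mob2year)   print "set1", k, mob2year[k]
        for (k in amt2gender) print "set2", k, amt2gender[k]
    }
' master.csv | LC_ALL=C sort)
printf '%s\n' "$sets"
```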

Example #1: if Input.Field $4 == 1234 and Input.Field $8 == 30

Transaction ID,Request source,User name,SendMobNum,RecMobNum,ServiceClass,Service,Amount,CreditAmount,Bonus,Process fee
ABCD,SSS,EFG,1234,9999,2345,AB,30,1,4,1

Output #1:

Transaction ID,Request source,User name,SendMobNum,RecMobNum,ServiceClass,Service,Amount,CreditAmount,Bonus,Process fee,SendMobNum,Year,Amount,Gender 
ABCD,SSS,EFG,1234,9999,2345,AB,30,1,4,1,1234,2000,30,Male

Example #2: if Input.Field $4 == 1234 and Input.Field $8 == 15

Transaction ID,Request source,User name,SendMobNum,RecMobNum,ServiceClass,Service,Amount,CreditAmount,Bonus,Process fee
ABCD,SSS,EFG,1234,9999,2345,AB,15,1,4,1

Output #2:

Transaction ID,Request source,User name,SendMobNum,RecMobNum,ServiceClass,Service,Amount,CreditAmount,Bonus,Process fee,SendMobNum,Year,Amount,Gender 
ABCD,SSS,EFG,1234,9999,2345,AB,15,1,4,1,1234,2000,15,Female

Example #3: if Input.Field $4 == 1234 and Input.Field $8 == 60, then print 1234,2000,60,Female from the second file

Transaction ID,Request source,User name,SendMobNum,RecMobNum,ServiceClass,Service,Amount,CreditAmount,Bonus,Process fee
ABCD,SSS,EFG,1234,9999,2345,AB,60,1,4,1

Output #3:

Transaction ID,Request source,User name,SendMobNum,RecMobNum,ServiceClass,Service,Amount,CreditAmount,Bonus,Process fee,SendMobNum,Year,Amount,Gender 
ABCD,SSS,EFG,1234,9999,2345,AB,60,1,4,1,1234,2000,60,Female

Update: 2014-08-28

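Wow, thank you very much Ed Morton for the excellent hint, genius! By trial and error I got the output below. I get confused when using arrays and can't grasp the array concept, or how to debug or check the command: does it read the first line of the first file and then check the whole second file, and so on?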
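On the question of how awk works through the two files, one way to check is to print `FILENAME`, `NR` and `FNR` for every record. `NR` keeps counting across files while `FNR` restarts at 1 for each file, so (assuming master.csv is not empty) `NR == FNR` holds only while master.csv is being read. awk reads all of master.csv first, filling the arrays, and only then reads Input.csv line by line; it does not rescan master.csv for each Input line. A minimal trace on two lines of each sample file:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'SendMobNum,Year,Amount,Gender' '1234,2000,30,Male' > master.csv
printf '%s\n' \
    'Transaction ID,Request source,User name,SendMobNum,RecMobNum,ServiceClass,Service,Amount,CreditAmount,Bonus,Process fee' \
    'ABCD,SSS,EFG,1234,9999,2345,AB,30,1,4,1' > Input.csv

# One output line per record: which file, NR, FNR, and which branch NR == FNR selects.
trace=$(awk '{ print FILENAME, "NR=" NR, "FNR=" FNR, (NR == FNR ? "load" : "lookup") }' master.csv Input.csv)
printf '%s\n' "$trace"
```

This prints `master.csv NR=1 FNR=1 load`, `master.csv NR=2 FNR=2 load`, `Input.csv NR=3 FNR=1 lookup` and `Input.csv NR=4 FNR=2 lookup`, in that order.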

Attempt #1 (without $1 and $3 from master.csv):

awk '
    BEGIN{ FS=OFS="," }
    NR == FNR { mob2year[$1]=$2;amt2gender[$3]=$4; next}
    FNR == 1 || ($4 in mob2year && $8 in amt2gender) { print $0,mob2year[$4],amt2gender[$8] }
' Master.txt Input*.txt

Output:

Transaction ID,Request source,User name,SendMobNum,RecMobNum,ServiceClass,Service,Amount,CreditAmount,Bonus,Process fee,Year,Gender
ABCD,SSS,EFG,1234,9999,2345,AB,30,1,4,1,2000,Male
ABCD,SSS,EFG,1234,9999,2345,AB,60,3,6,3,2000,Female
ABCD,SSS,EFG,5678,9999,2345,AB,30,1,4,1,2001,Male
ABCD,SSS,EFG,5678,9999,2345,AB,60,3,6,3,2001,Female

Attempt #2 (still without $1 and $3 from master.csv, but filled in with the values from Input.csv). I would like to know how to print $1 and $3 from master.csv.

awk '
    BEGIN{ FS=OFS="," }
    NR == FNR { mob2year[$1]=$2;amt2gender[$3]=$4; next}
    FNR == 1 || ($4 in mob2year && $8 in amt2gender) { print $0,$4,mob2year[$4],$8,amt2gender[$8] }
' Master.txt Input*.txt

Output:

Transaction ID,Request source,User name,SendMobNum,RecMobNum,ServiceClass,Service,Amount,CreditAmount,Bonus,Process fee,SendMobNum,Year,Amount,Gender
ABCD,SSS,EFG,1234,9999,2345,AB,30,1,4,1,1234,2000,30,Male
ABCD,SSS,EFG,1234,9999,2345,AB,60,3,6,3,1234,2000,60,Female
ABCD,SSS,EFG,5678,9999,2345,AB,30,1,4,1,5678,2001,30,Male
ABCD,SSS,EFG,5678,9999,2345,AB,60,3,6,3,5678,2001,60,Female
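On printing $1 and $3 from master.csv: a line is printed only when its $4 is a key of mob2year and its $8 is a key of amt2gender, so the $4 and $8 printed from Input.csv are exactly the strings found in master.csv's $1 and $3, and Attempt #2 already gives the desired output. If you would still rather take those fields literally from master.csv, one option is to store each whole pair as the array value. A sketch under that assumption; the array names `mobpair` and `amtpair` are hypothetical, and the `NF >= 4` guard (skip rows without Amount and Gender) is an addition:

```shell
cd "$(mktemp -d)" || exit 1
cat > master.csv <<'EOF'
SendMobNum,Year,Amount,Gender
1234,2000,30,Male
5678,2001,15,Female
2345,2002,60,Female
4567,2003
8888,2004
EOF
cat > Input.csv <<'EOF'
Transaction ID,Request source,User name,SendMobNum,RecMobNum,ServiceClass,Service,Amount,CreditAmount,Bonus,Process fee
ABCD,SSS,EFG,1234,9999,2345,AB,30,1,4,1
ABCD,SSS,EFG,1234,9999,2345,AB,40,2,5,2
ABCD,SSS,EFG,1234,9999,2345,AB,60,3,6,3
ABCD,SSS,EFG,3456,9999,2345,AB,30,1,4,1
ABCD,SSS,EFG,3456,9999,2345,AB,40,2,5,2
ABCD,SSS,EFG,3456,9999,2345,AB,60,3,6,3
ABCD,SSS,EFG,5678,9999,2345,AB,30,1,4,1
ABCD,SSS,EFG,5678,9999,2345,AB,40,2,5,2
ABCD,SSS,EFG,5678,9999,2345,AB,60,3,6,3
EOF

joined=$(awk '
    BEGIN { FS = OFS = "," }
    NR == FNR {
        mobpair[$1] = $1 OFS $2                  # set #1: SendMobNum -> "SendMobNum,Year"
        if (NF >= 4) amtpair[$3] = $3 OFS $4     # set #2: Amount -> "Amount,Gender"
        next
    }
    FNR == 1 || ($4 in mobpair && $8 in amtpair) {
        print $0, mobpair[$4], amtpair[$8]       # Input line, then both pairs as stored from master.csv
    }
' master.csv Input*.csv)
printf '%s\n' "$joined"
```

The header row of master.csv is stored like any other row, which is how the output header gains `SendMobNum,Year,Amount,Gender`; the result matches the desired output in the question.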

Annotated version:

awk '
    BEGIN { FS=OFS="," }                     # use "," as both the input and the output field separator
    NR == FNR {                              # true only while reading the first file, Master.txt
        mob2year[$1] = $2                    # array mob2year: key $1 (SendMobNum) -> value $2 (Year)
        amt2gender[$3] = $4                  # array amt2gender: key $3 (Amount) -> value $4 (Gender)
        next                                 # so every line of Master.txt is stored before Input.txt is read
    }
    FNR == 1 || ($4 in mob2year && $8 in amt2gender) {   # first line of each Input file, OR ($4 of Input.txt is a key of mob2year AND $8 is a key of amt2gender)
        print $0, $4, mob2year[$4], $8, amt2gender[$8]   # entire line of Input.txt ($0), then $4 and $8 of Input.txt
    }                                        # not able to understand the values mob2year[$4] and amt2gender[$8]
' Master.txt Input*.txt

1 Answer:

Answer (score: 1)

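I really think you can do this yourself, but here's a hint: you want Master.csv to be treated as 2 different data sets, so populate 2 different arrays, mob2year[$1]=$2 and amt2gender[$3]=$4. Now, when you read Input.csv, just access them via mob2year[$4] and amt2gender[$8]. Try to write the script yourself using that hint, then update your question with the script once you've tested it, and leave a comment if you need help.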
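To make the hint concrete: `$4` and `$8` are fields of the Input.csv line currently being read, and `mob2year[$4]` uses the value of that field as the key into the array filled from master.csv. A sketch that runs the three example rows from the question's edit through Attempt #2's logic and prints each lookup before the joined line; the file name `examples.csv` is hypothetical and the header row is left out:

```shell
cd "$(mktemp -d)" || exit 1
cat > master.csv <<'EOF'
SendMobNum,Year,Amount,Gender
1234,2000,30,Male
5678,2001,15,Female
2345,2002,60,Female
4567,2003
8888,2004
EOF
# Examples #1, #2 and #3 from the question (data rows only)
cat > examples.csv <<'EOF'
ABCD,SSS,EFG,1234,9999,2345,AB,30,1,4,1
ABCD,SSS,EFG,1234,9999,2345,AB,15,1,4,1
ABCD,SSS,EFG,1234,9999,2345,AB,60,1,4,1
EOF

lookup=$(awk '
    BEGIN { FS = OFS = "," }
    NR == FNR { mob2year[$1] = $2; amt2gender[$3] = $4; next }
    ($4 in mob2year) && ($8 in amt2gender) {
        # $4 and $8 come from the current examples.csv line and are used
        # as keys into the arrays that were filled from master.csv
        print "lookup: mob2year[" $4 "] = " mob2year[$4] ", amt2gender[" $8 "] = " amt2gender[$8]
        print $0, $4, mob2year[$4], $8, amt2gender[$8]
    }
' master.csv examples.csv)
printf '%s\n' "$lookup"
```

For Example #2 this prints `lookup: mob2year[1234] = 2000, amt2gender[15] = Female` followed by the same line as Output #2.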

This might help you understand associative arrays:

$ cat file1
fruit apple
color red
size large
$
$ cat file2
size fruit garbage color
$
$ awk 'NR==FNR{ a[$1]=$2; next} {print $1, a[$1]}' file1 file2
size large
$ awk 'NR==FNR{ a[$1]=$2; next} {print $2, a[$2]}' file1 file2
fruit apple
$ awk 'NR==FNR{ a[$1]=$2; next} {print $3, a[$3]}' file1 file2
garbage
$ awk 'NR==FNR{ a[$1]=$2; next} {print $4, a[$4]}' file1 file2
color red
$ awk 'NR==FNR{ a[$1]=$2; next} {print $5, a[$5]}' file1 file2

$
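One detail worth a print of its own: `$3 in a` only tests whether a key exists, while merely referencing `a[$3]`, as `print $3, a[$3]` does above for `garbage`, creates that key with an empty value. A sketch on the same file1 and file2:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'fruit apple' 'color red' 'size large' > file1
printf '%s\n' 'size fruit garbage color' > file2

demo=$(awk '
    NR == FNR { a[$1] = $2; next }
    {
        before = ($3 in a) ? "yes" : "no"   # "in" tests for the key without creating it
        value = a[$3]                        # a plain reference creates a["garbage"] with an empty value
        after = ($3 in a) ? "yes" : "no"
        print $3, "key before:", before, "after:", after, "value: [" value "]"
    }
' file1 file2)
printf '%s\n' "$demo"
```

This prints `garbage key before: no after: yes value: []`. It is why a filter such as `$4 in mob2year` uses `in`: testing `mob2year[$4] != ""` instead would add every unmatched $4 to the array.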

Play around with the above, add some prints, etc.