Compare 2 files with awk and print matching and non-matching rows

Date: 2021-03-01 12:32:08

Tags: awk

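I have 2 CSV files whose fields partly match and partly differ.
I want to compare the second, third, and fourth columns and, based on that, produce output with a remarks column marking each row as matching (M), non-matching (NM), or not found (NF, with NULL in place of the missing value).

a) If cols. 2, 3, and 4 all match, the row is a match (M).
b) If cols. 2 and 3 match but col. 4 does not, the row is a non-match (NM).
c) If col. 2 or col. 3 itself does not match, the row is a not-found case (NF).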

1.csv

SL_2344,personal_details,name,Andrew  
SL_2344,personal_details,address,G-101 SSR New-Delhi  
SL_2344,personal_details,Age,22Y  
SL_2344,personal_details,sex,M  
SL_2344,personal_details,height,5.8 ft  
SL_2344,education,Roll_number,22345  
SL_2344,education,stream,ScienceandMaths  
SL_2344,class,section,3D

2.csv

SL_12332,personal_details,name,Samantha  
SL_12332,personal_details,address,Park Street Mumbai  
SL_12332,personal_details,Age,22Y  
SL_12332,personal_details,sex,F  
SL_12332,height,5.8 ft  
SL_12332,class,section,3D  
SL_12332,candidate_Other_details,sports,stateLevelBasketballrepresentation

Expected output

Class,Attributes,2344,12332,Remarks  
personal_details,name,Andrew,Samantha,NM  
personal_details,address,G-101 SSR New-Delhi,Park Street Mumbai,NM  
personal_details,Age,22Y,22Y,M  
personal_details,sex,M,F,NM  
personal_details,height,5.8 ft,NULL,NF  
education,Roll_number,22345,NULL,NF  
education,stream,ScienceandMaths,NULL,NF  
class,section,3D,3D,M  
NULL,height,NULL,5.3 ft,NF  
candidate_Other_details,NULL,sports,stateLevelBasketballrepresentation,NF

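I have tried combining NR and FNR with awk associative arrays keyed on $2, $3, and $4, but could not get the desired result.
Some records, such as line 5 of 2.csv, contain only the attribute (no class), so the value ends up in column 3, and that is where my code fails. For such records, NULL or a blank can be used for $2.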
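
The attribute-only records described above can be padded to four fields before any comparison. A minimal sketch, assuming such records always have exactly three fields (ID, attribute, value) and that no field contains a comma; the function name `normalize` is hypothetical:

```shell
# Hypothetical helper: pad attribute-only records with a NULL class.
# Assumes a three-field record is ID,attribute,value (class missing).
normalize() {
  awk 'BEGIN { FS = OFS = "," }
       NF == 3 { $4 = $3; $3 = $2; $2 = "NULL" }   # shift attribute and value right
       { print }' "$@"
}
# Usage: normalize 2.csv
```

After this step every record has the class in $2, the attribute in $3, and the value in $4, so the same comparison can be applied to all lines.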

1 answer:

Answer 0 (score: 1)

Using GNU awk

 awk -F, 'NR==FNR { map[FNR]=$0;next } { split(map[FNR],map1,",");if ( $2==map1[2] && $3==map1[3] && $4==map1[4]) { print $0",M" } else if ( $2==map1[2] && $3==map1[3] && $4!=map1[4] ) { print $0",NM" } else { print $0",NF" } }' 1.csv 2.csv

Explanation:

awk -F, 'NR==FNR {                                                           # -F, sets the field delimiter to ","
                   map[FNR]=$0;                                              # While processing the first file (NR==FNR), store each line in the array map, indexed by its record number within the file (FNR)
                   next 
                 } 
                 { 
                   split(map[FNR],map1,",");                                  # For the second file, split the line of 1.csv with the same line number into map1 using "," as the delimiter
                   if ( $2==map1[2] && $3==map1[3] && $4==map1[4]) { 
                      print $0",M"                                            # Cols. 2, 3 and 4 all match: print the line with "M"
                   } 
                   else if ( $2==map1[2] && $3==map1[3] && $4!=map1[4] ) {    # Cols. 2 and 3 match but col. 4 differs: print the line with "NM"
                      print $0",NM" 
                   } 
                   else { 
                      print $0",NF"                                           # Col. 2 or col. 3 differs: print the line with "NF"
                   } 
                  }' 1.csv 2.csv
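
An alternative sketch, not part of the answer above: instead of comparing the two files line by line, index 1.csv by class and attribute (cols. 2 and 3) and look up each record of 2.csv by that key, printing rows in the layout the question asks for. It assumes that each class/attribute pair occurs at most once per file, that no field contains a comma, that three-field records are missing the class, and that IDs carry an `SL_` prefix; `compare_csv` is a hypothetical name.

```shell
# Hypothetical helper: compare_csv FILE1 FILE2
# Assumes: unique class/attribute pairs per file, no commas inside fields,
# three-field records lack the class, and IDs start with "SL_".
compare_csv() {
  awk '
    BEGIN { FS = OFS = "," }
    { sub(/[ \t\r]+$/, "") }                        # drop trailing whitespace
    NF == 3 { $4 = $3; $3 = $2; $2 = "NULL" }       # attribute-only record
    { key = $2 FS $3 }                              # class,attribute
    NR == FNR {                                     # first file: index by key
      id1 = $1; val1[key] = $4; order[++n] = key
      next
    }
    {                                               # second file
      id2 = $1; val2[key] = $4
      if (!(key in val1)) only2[++m] = key          # key missing from file 1
    }
    END {
      sub(/^SL_/, "", id1); sub(/^SL_/, "", id2)
      print "Class", "Attributes", id1, id2, "Remarks"
      for (i = 1; i <= n; i++) {                    # keys in file 1 order
        k = order[i]
        if (!(k in val2))                      print k, val1[k], "NULL", "NF"
        else if ((val1[k] "") == (val2[k] "")) print k, val1[k], val2[k], "M"
        else                                   print k, val1[k], val2[k], "NM"
      }
      for (i = 1; i <= m; i++) {                    # keys found only in file 2
        k = only2[i]
        print k, "NULL", val2[k], "NF"
      }
    }
  ' "$1" "$2"
}
# Usage: compare_csv 1.csv 2.csv
```

Keying on class and attribute rather than on line number means the result does not depend on both files listing their records in the same order. Rows are printed in the order of the first file, followed by the keys that appear only in the second file.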