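I have 2 CSV files containing matching and non-matching fields.
I want to compare the second, third and fourth columns and, based on that comparison, produce output with a Remarks column marking each row as Match (M), Non-Match (NM) or Not Found (NF, with NULL for the missing value).
a) If cols 2, 3 and 4 all match, it is a match.
b) If cols 2 and 3 match but col 4 does not, it should be a non-match.
c) If col 2 or col 3 itself does not match, it should be a not-found case.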
1.csv:
SL_2344,personal_details,name,Andrew
SL_2344,personal_details,address,G-101 SSR New-Delhi
SL_2344,personal_details,Age,22Y
SL_2344,personal_details,sex,M
SL_2344,personal_details,height,5.8 ft
SL_2344,education,Roll_number,22345
SL_2344,education,stream,ScienceandMaths
SL_2344,class,section,3D
2.csv:
SL_12332,personal_details,name,Samantha
SL_12332,personal_details,address,Park Street Mumbai
SL_12332,personal_details,Age,22Y
SL_12332,personal_details,sex,F
SL_12332,height,5.8 ft
SL_12332,class,section,3D
SL_12332,candidate_Other_details,sports,stateLevelBasketballrepresentation
Expected output:
Class,Attributes,2344,12332,Remarks
personal_details,name,Andrew,Samantha,NM
personal_details,address,G-101 SSR New-Delhi,Park Street Mumbai,NM
personal_details,Age,22Y,22Y,M
personal_details,sex,M,F,NM
personal_details,height,5.8 ft,NULL,NF
education,Roll_number,22345,NULL,NF
education,stream,ScienceandMaths,NULL,NF
class,section,3D,3D,M
NULL,height,NULL,5.3 ft,NF
candidate_Other_details,NULL,sports,stateLevelBasketballrepresentation,NF
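I have tried combining $2, $3 and $4 in awk associative arrays using NR and FNR, but could not get the desired result.
Some records, such as line 5 of 2.csv, have only the attribute (no class object), so the value sits in column 3, and that is where my code fails. For such records, NULL or blank can be used for $2.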
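One possible way to deal with such short records, sketched here as an assumption rather than a tested part of the original attempt, is to pad them before any comparison so that every row has four columns. The two input lines are taken from the sample data above; the variable name `normalized` is only illustrative.

```shell
# Feed two sample records to awk: a normal 4-field row and a 3-field row
# that has an attribute but no class.
normalized=$(printf '%s\n' \
    'SL_12332,personal_details,sex,F' \
    'SL_12332,height,5.8 ft' |
  awk -F, -v OFS=, '
    # A 3-field record is missing its class: rebuild it with NULL as column 2.
    NF == 3 { $0 = $1 OFS "NULL" OFS $2 OFS $3 }
    # Print every record, padded or not.
    { print }
  ')

printf '%s\n' "$normalized"
```

After this step the short record reads `SL_12332,NULL,height,5.8 ft`, so $2, $3 and $4 mean the same thing on every line.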
Answer 0 (score: 1)
Using GNU awk
awk -F, 'NR==FNR { map[FNR]=$0; next } { split(map[FNR],map1,","); if ( $2==map1[2] && $3==map1[3] && $4==map1[4] ) { print $0",M" } else if ( $2==map1[2] && $3==map1[3] && $4!=map1[4] ) { print $0",NM" } else { print $0",NF" } }' 1.csv 2.csv
Explanation:
awk -F, 'NR==FNR {                 # -F, sets the field delimiter to ","; NR==FNR is true only while reading the first file
    map[FNR]=$0;                   # Store each line of the first file in the array map, indexed by its per-file record number (FNR)
    next
}
{
    split(map[FNR],map1,",");      # For the second file, split the first-file line at the same position into map1 using "," as the delimiter
    if ( $2==map1[2] && $3==map1[3] && $4==map1[4] ) {
        print $0",M"               # Columns 2, 3 and 4 all match: print the line with "M"
    }
    else if ( $2==map1[2] && $3==map1[3] && $4!=map1[4] ) {   # Columns 2 and 3 match but column 4 differs: "NM"
        print $0",NM"
    }
    else {
        print $0",NF"              # Column 2 or column 3 differs: "NF" in all other cases
    }
}' 1.csv 2.csv
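The command above pairs records by line position. As an alternative, the following is a minimal sketch that pairs them by class and attribute instead and prints the side-by-side table asked for in the question. It rests on several assumptions that are not stated in the original: each class/attribute pair occurs at most once per file, a 3-field record is an attribute with a missing class, and the column headers are the IDs in column 1 with the `SL_` prefix removed. The scratch directory and the variable names (`workdir`, `result`, `first`, `second`, `order`) are illustrative only.

```shell
# Work in a scratch directory so the sample files do not clobber real data.
workdir=$(mktemp -d)

cat > "$workdir/1.csv" <<'EOF'
SL_2344,personal_details,name,Andrew
SL_2344,personal_details,address,G-101 SSR New-Delhi
SL_2344,personal_details,Age,22Y
SL_2344,personal_details,sex,M
SL_2344,personal_details,height,5.8 ft
SL_2344,education,Roll_number,22345
SL_2344,education,stream,ScienceandMaths
SL_2344,class,section,3D
EOF

cat > "$workdir/2.csv" <<'EOF'
SL_12332,personal_details,name,Samantha
SL_12332,personal_details,address,Park Street Mumbai
SL_12332,personal_details,Age,22Y
SL_12332,personal_details,sex,F
SL_12332,height,5.8 ft
SL_12332,class,section,3D
SL_12332,candidate_Other_details,sports,stateLevelBasketballrepresentation
EOF

result=$(awk -F, -v OFS=, '
    # Assumption: a 3-field record lacks its class, so pad column 2 with NULL.
    NF == 3 { $0 = $1 OFS "NULL" OFS $2 OFS $3 }

    # Key each record on class + attribute; keep the ID without the SL_ prefix.
    { key = $2 OFS $3; id = $1; sub(/^SL_/, "", id) }

    # First file: store the value and remember the order in which keys appear.
    NR == FNR { id1 = id; first[key] = $4; order[++n] = key; next }

    # Second file: store the value; append keys that the first file lacks.
    {
        id2 = id
        if (!(key in first) && !(key in second)) order[++n] = key
        second[key] = $4
    }

    END {
        print "Class", "Attributes", id1, id2, "Remarks"
        for (i = 1; i <= n; i++) {
            k = order[i]
            a = (k in first)  ? first[k]  : "NULL"
            b = (k in second) ? second[k] : "NULL"
            if (!((k in first) && (k in second))) r = "NF"   # key missing on one side
            else if ((a "") == (b ""))            r = "M"    # forced string comparison
            else                                  r = "NM"
            print k, a, b, r
        }
    }
' "$workdir/1.csv" "$workdir/2.csv")

printf '%s\n' "$result"
rm -rf "$workdir"
```

With the sample files this prints the header `Class,Attributes,2344,12332,Remarks` followed by ten rows. The last two rows come out as `NULL,height,NULL,5.8 ft,NF` and `candidate_Other_details,sports,NULL,stateLevelBasketballrepresentation,NF`, which differs slightly from the expected table in the question (it shows `5.3 ft` and places NULL in the attribute column).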