Pattern matching between two files

Time: 2015-05-28 11:48:18

Tags: shell awk sed

I have two files: file1 and file2

file1:

1,0,0
2,1,2

file2:

abc gdksjhkjhfkdjfljdkjldk jkm kl;lll (sds; dks; id:1;)
zxc erefdjhfkdjfljdkjldk  erewr jkm kl;lll (sds; dks; id:2;)

Output:

#abc gdksjhkjhfkdjfljdkjldk jkm kl;lll (sds; dks; id:1;)
zxc erefdjhfkdjfljdkjldk  erewr jkm kl;lll (sds; dks; id:2;)

If the number after id in file2 matches the first column of file1,

then: if the third column in file1 is 0, print $1 of file2 as abc, else as zxc
      if the second column in file1 is 0, insert # at the beginning of the line

Another case. file1:

1,0,0
3,1,2

file2:

abc gdksjhkjhfkdjfljdkjldk jkm kl;lll (sds; dks; id:1;)
zxc erefdjhfkdjfljdkjldk  erewr jkm kl;lll (ders; dks; id:2;)
sdsd sdsdsdsddddsdjldk  vbvewqr dsm wwl;awww (cvv; fgs; id:3;)

Sometimes the files will contain a different number of lines.
In that case, if column one in file1 does not match the id in file2, the check has to continue with the next line in file2.

How can I do this matching and modification with a shell script, without merging the two files?

1 answer:

Answer 0 (score: 2)

GNU awk 4

Use this awk script:

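# first file: store the three comma-separated columns by line number
FNR==NR{
    arr[FNR][1] = $1
    arr[FNR][2] = $2
    arr[FNR][3] = $3
}
# second file: compare against the file1 entry on the same line number
FNR!=NR{
    # keep only the number that follows "id:"
    val = gensub(/.*id:([0-9]+)[^0-9]*.*/, "\\1", "g", $0)
    if (arr[FNR][1] == val) {
        if (arr[FNR][2] == 0)
            printf "#"
        if (arr[FNR][3] == 0)
            $1 = "a"
        else
            $1 = "b"
    }
    print $0
}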

Call it with:

awk -F '[, ]' -f script.awk file1 file2
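
The script above pairs line N of file1 with line N of file2, so it only helps when matching records sit on the same line number. For the question's second case, where the files differ in length, a lookup keyed on the id may be a better fit. The following is a minimal sketch of that variant, not the answerer's script: it uses only POSIX awk's match(), substr() and sub() (gensub() is a GNU awk extension), keeps the answer's "a"/"b" placeholders, and the temporary directory and the variable name keyed_out are illustrative assumptions.

```shell
# Sample data copied from the question's second case; paths are illustrative.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
1,0,0
3,1,2
EOF
cat > "$tmpdir/file2" <<'EOF'
abc gdksjhkjhfkdjfljdkjldk jkm kl;lll (sds; dks; id:1;)
zxc erefdjhfkdjfljdkjldk  erewr jkm kl;lll (ders; dks; id:2;)
sdsd sdsdsdsddddsdjldk  vbvewqr dsm wwl;awww (cvv; fgs; id:3;)
EOF

keyed_out=$(awk '
# first file: remember columns 2 and 3 under the id found in column 1
FNR == NR {
    split($0, f, ",")
    second[f[1]] = f[2]
    third[f[1]]  = f[3]
    next
}
# second file: look the id up instead of relying on the line number
{
    line = $0
    if (match(line, /id:[0-9]+/)) {
        # skip the 3 characters of "id:" and keep only the digits
        id = substr(line, RSTART + 3, RLENGTH - 3)
        if (id in second) {
            # replace the first word, leaving the rest of the line intact
            sub(/^[^ \t]*/, (third[id] + 0 == 0 ? "a" : "b"), line)
            if (second[id] + 0 == 0)
                line = "#" line
        }
    }
    # lines whose id is not listed in file1 are printed unchanged
    print line
}' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$keyed_out"
rm -rf "$tmpdir"
```

With this layout the order and number of lines in the two files no longer matter; the trade-off is that all of file1 is held in memory, which is usually fine for a small lookup table.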

GNU awk 3

An attempt to make the script work with earlier versions of awk:

# This awk script will perform these checks for EVERY single line:

# when FNR == NR we are in the first file
# FNR is the line number of the current file
# NR is the total number of lines passed
FNR==NR{
    # save the line of file1 to array with index its line number
    arr[FNR] = $0
}
# we are now in file 2, because FNR could be 1 but NR is now 1 + lines
# in file 1
FNR!=NR{
    # create an array by splitting the corresponding line of file 1
    # we split using a comma: 0,1,2 => [0, 1, 2]
    split(arr[FNR], vals, ",")
    # use regex to extract the id number, we drop everything from the
    # line besides the number after "id:"
    val = gensub(/.*id:([0-9]+)[^0-9]*.*/, "\\1", "g", $0)
    # if first value of line in file1 is same as ID
    if (vals[1] == val) {
        # if second value of line in file1 is 0
        if (vals[2] == 0)
            # print # at beginning of line without adding a newline
            printf "#"
        # if third value of line in file1 is 0
        if (vals[3] == 0)
            # save "a" to var, else
            var = "a"
        else
            # save "b" to var
            var = "b"
        # now sub the first word of the line [^ \t]* by var
        # and keep everything that follows (...) = \\1
        # the current line is $0
        $0 = gensub(/^[^ \t]*([ \t].*)/, var "\\1", "g", $0)
    }
    # print the line (now it's printed with a newline); lines whose ID
    # did not match are printed unchanged instead of reusing a stale var
    print $0
}

Simply run:

awk -f script.awk file1 file2
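
One detail worth keeping in mind when running the script without -F: under the default field separator, assigning to $1 makes awk rebuild $0 with single spaces, which would collapse the double space present in file2, whereas a regex replacement of the first word leaves the rest of the line byte-for-byte intact. The sketch below only illustrates that difference; the sample string is a shortened line from file2, and the variable names by_field and by_regex are made up for the demonstration.

```shell
# Shortened sample line from file2; note the two spaces before "erewr".
sample='zxc erefdjhfkdjfljdkjldk  erewr jkm'

# Assigning to $1 forces awk to rebuild $0, joining fields with OFS (one space).
by_field=$(printf '%s\n' "$sample" | awk '{ $1 = "b"; print }')

# sub() on the leading word leaves the remaining whitespace untouched.
by_regex=$(printf '%s\n' "$sample" | awk '{ sub(/^[^ \t]+/, "b"); print }')

printf '%s\n%s\n' "$by_field" "$by_regex"
```

The first result has a single space before "erewr", the second keeps both spaces.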