Question

I'm stumbling through trying to get something seemingly simple done. I have a file and a newline-delimited list of strings.

File:

Dat1 Loc1
Dat2 Loc1
Dat3 Loc1
Dat4 Loc2
Dat5 Loc2

My list looks like this:

Dat1
Dat2
Dat3
Dat4

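What I want to do is compare the list against the data file and count how many times each unique Loc occurs. I'm only interested in the largest count. In the example above, when comparing the list to the file, I essentially want:

Dat1 MATCHED Loc1Count = 1
Dat2 MATCHED Loc1Count = 2
Dat3 MATCHED Loc1Count = 3
Dat4 MATCHED Loc2Count = 1

Return: Loc1 if Loc1Count / length of list > 50%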

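Now, I know that awk 1 file will read a file line by line. I also know that echo "$LIST" | awk '/search for lines that contain this/' will return the lines matching that inner string. I haven't been able to combine these ideas into a nested awk, much less work out how to count "loc1" vs "loc2" (which, by the way, will be random strings, not a standard form).

I feel like this is simple, but I'm banging my head against a wall. Any ideas? Is this clear enough?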

Answer 1

list="Dat1 Dat2 Dat3 Dat4"
awk -v li="$list" 'BEGIN{
   # split the shell list into the awk array "list"; m is the number of entries
   m=split(li,list," ")
}
{
    # check the first field against every list entry
    for(i=1;i<=m;i++){
        if($1 == list[i]){
            # matched: increment the count for this Loc ($2) and print the running total
            print $1" matched Locount="$2" "++loc[$2]
        }
    }
}
END{
    # report Loc1 only if it accounts for more than 50% of the list
    loc1count=loc["Loc1"]
    if(( loc1count / m *100 ) > 50) {
        print "Loc1 count: "loc1count
    }
} ' file

Output

$ ./shell.sh
Dat1 matched Locount=Loc1 1
Dat2 matched Locount=Loc1 2
Dat3 matched Locount=Loc1 3
Dat4 matched Locount=Loc2 1
Loc1 count: 3
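
Since the question says the Loc values will be arbitrary strings, here is a minimal sketch of how the same idea might be generalized so that no Loc name is hard-coded. It is not the answer's original code: the function name majority_loc is hypothetical, and the sketch assumes that list entries contain no whitespace and that each Dat appears at most once in the data file. A newline-delimited list is folded to spaces with tr before being passed to awk.

```shell
#!/bin/sh
# Sketch only: majority_loc is a hypothetical helper name.
# Usage: majority_loc "<list>" <data-file>
majority_loc() {
    # Assumption: entries contain no whitespace, so newlines can be folded to spaces.
    li=$(printf '%s' "$1" | tr '\n' ' ')
    awk -v li="$li" '
    BEGIN {
        # Turn the list into a lookup set; m is the list length.
        m = split(li, items, " ")
        for (i = 1; i <= m; i++) want[items[i]] = 1
    }
    # Tally the Loc ($2) of every record whose Dat ($1) is in the list.
    ($1 in want) { count[$2]++ }
    END {
        # Find the Loc with the largest tally.
        best = ""; max = 0
        for (loc in count)
            if (count[loc] > max) { max = count[loc]; best = loc }
        # Print it only if it covers more than half of the list.
        if (m > 0 && max / m > 0.5) print best
    }' "$2"
}

# Sample data from the question, written to a temporary file.
datafile=$(mktemp)
printf '%s\n' 'Dat1 Loc1' 'Dat2 Loc1' 'Dat3 Loc1' 'Dat4 Loc2' 'Dat5 Loc2' > "$datafile"
majority_loc "Dat1 Dat2 Dat3 Dat4" "$datafile"   # prints Loc1 (3 of 4 entries)
rm -f "$datafile"
```

Using the ($1 in want) test replaces the inner loop over the list with a single array lookup per record. If no Loc exceeds 50% of the list length, the function prints nothing.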

Comparing a file against a variable list with AWK
