sed or awk to combine two files with a sum

Time: 2015-05-20 10:41:23

Tags: awk sed squid

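We have a script that strips everything except the domain name from the squid access.log and reports the total hits per URL. I generated two files, one with cache hits and the other with cache misses, and I am looking for a way to combine these files as follows: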

cat TCP_MISS_data.txt

Domains    CacheMiss
abc.com    21
def.com    38
xyz.com    12

cat TCP_HITS_data.txt

Domains  CacheHits
def.com  28
abc.com  10
xyz.com

cat Combined_data.txt

Domains    CacheMiss CacheHits  TotalHits
abc.com     21        10          31
def.com     38        28          66
xyz.com     12        0           12

Any help is appreciated.

Update

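I use the awk one-liner below to strip the domains and hits out of the access log and output a file containing all domains and their hit counts, regardless of HITS and MISSES.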

cat access.log | awk '{print $7}' | awk '!/^http/{sub(/^/,"http://")}1' | awk -F"/" '{print $3}' | awk -F":" '{print $1}' | awk -F"." '{f1=NF;f2=NF-1;print $f2 "." $f1}' | sort | uniq -c | sort -n

To separate hits and misses, I did the following:

cat access.log | grep TCP_MISS | awk '{print $7}' | awk '!/^http/{sub(/^/,"http://")}1' | awk -F"/" '{print $3}' | awk -F":" '{print $1}' | awk -F"." '{f1=NF;f2=NF-1;print $f2 "." $f1}' | sort | uniq -c | sort -n > TCP_MISS_data

cat access.log | grep TCP_HIT | awk '{print $7}' | awk '!/^http/{sub(/^/,"http://")}1' | awk -F"/" '{print $3}' | awk -F":" '{print $1}' | awk -F"." '{f1=NF;f2=NF-1;print $f2 "." $f1}' | sort | uniq -c | sort -n > TCP_HITS_data

Now I end up with two files, TCP_MISS_data and TCP_HITS_data, which have unequal numbers of lines, and I am trying to combine the two files as described above.
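As a side note, the chain of awk calls above could probably be collapsed into a single awk pass that tallies misses and hits per domain at the same time. The following is only a sketch under assumptions: it assumes the native squid access.log layout (result code in field 4, URL in field 7), the sample log lines are made up, and count_by_domain is a hypothetical helper name. It matches the result code in field 4 rather than anywhere in the line, which is slightly stricter than the grep calls above.

```shell
# Made-up sample lines in the native squid access.log layout
# (assumed: field 4 is the result code, field 7 is the URL).
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
1432111283.101 120 192.0.2.5 TCP_MISS/200 5120 GET http://www.abc.com/index.html - DIRECT/198.51.100.10 text/html
1432111284.202 3 192.0.2.5 TCP_HIT/200 2048 GET http://static.abc.com/logo.png - NONE/- image/png
1432111285.303 95 192.0.2.6 TCP_MISS/200 0 CONNECT mail.def.com:443 - DIRECT/198.51.100.11 -
1432111286.404 80 192.0.2.6 TCP_MISS/200 900 GET http://xyz.com/ - DIRECT/198.51.100.12 text/html
1432111287.505 2 192.0.2.7 TCP_HIT/200 2048 GET http://www.abc.com/logo.png - NONE/- image/png
EOF

# Hypothetical helper: one awk pass that counts misses and hits per domain.
count_by_domain() {
    awk '
        {
            url = $7
            if (url !~ /^http/) url = "http://" url   # CONNECT lines carry host:port only
            split(url, part, "/")                     # part[3] is host[:port]
            host = part[3]
            sub(/:.*/, "", host)                      # drop the port
            n = split(host, label, ".")
            dom = (n >= 2) ? (label[n-1] "." label[n]) : host   # keep the last two labels
            if ($4 ~ /TCP_MISS/)     { miss[dom]++; seen[dom] }
            else if ($4 ~ /TCP_HIT/) { hit[dom]++;  seen[dom] }
        }
        END {
            # Unset counters print as 0 with %d.
            for (d in seen)
                printf "%s %d %d %d\n", d, miss[d], hit[d], miss[d] + hit[d]
        }
    ' "$1" | sort
}

echo "Domains CacheMiss CacheHits TotalHits"
count_by_domain "$sample_log"
```

On the made-up sample this prints one row per domain after the header, for example "abc.com 1 2 3" (one miss, two hits). Like the original pipeline, it keeps only the last two labels of the host name, so a domain such as example.co.uk would be reduced to co.uk.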

1 answer:

Answer 0: (score: 3)

This one-liner does the job:

 awk 'NR==FNR{a[$1]=$2;next}
     $1 in a{printf "%s %s %s %s\n", $1,a[$1],($2?$2:0),(FNR>1?a[$1]+$2:"TotalHits")}' missFile hitFile

To get "pretty" output, you can adjust the printf format, or simply pipe the result through column -t:

 awk ..... | column -t

With your example input:

kent$ head f*
==> f1 <==
Domains    CacheMiss
abc.com    21
def.com    38
xyz.com    12

==> f2 <==
Domains  CacheHits
def.com  28
abc.com  10
xyz.com

kent$ awk 'NR==FNR{a[$1]=$2;next}$1 in a{printf "%s %s %s %s\n", $1,a[$1],($2?$2:0),(FNR>1?a[$1]+$2:"TotalHits")}' f1 f2|column -t
Domains  CacheMiss  CacheHits  TotalHits
def.com  38         28         66
abc.com  21         10         31
xyz.com  12         0          12

Edit:

Adding some explanation:

awk 'NR==FNR{a[$1]=$2;next}           #process the first file, store in a hashtable, key:col1, value:col2
$1 in a {                             #start processing the 2nd file; if file2.col1 is in the hashtable, do the following:
printf "%s %s %s %s\n", $1,a[$1],     #printf output with format
($2?$2:0),                            #if file2.col2 is empty, take it as 0
(FNR>1?a[$1]+$2:"TotalHits")          #on the first line, skip the sum and print the "TotalHits" text
}' f1 f2                              #two input files

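One case the `$1 in a` test in the one-liner does not cover: a domain that appears only in the hits file is skipped, because it was never stored from the miss file. A minimal sketch that keeps every domain from either file, and pads the columns with printf widths instead of column -t, might look like the following. The file names miss.txt and hits.txt, the helper name merge_counts, and the extra domain new.com are made up for illustration; the other rows are the sample data from the question.

```shell
# Sample inputs from the question, plus one made-up domain (new.com)
# that appears only in the hits file.
workdir=$(mktemp -d)
cat > "$workdir/miss.txt" <<'EOF'
Domains    CacheMiss
abc.com    21
def.com    38
xyz.com    12
EOF
cat > "$workdir/hits.txt" <<'EOF'
Domains  CacheHits
def.com  28
abc.com  10
xyz.com
new.com  5
EOF

# Hypothetical helper: merge the two count files, treating a missing count as 0.
merge_counts() {
    awk '
        FNR == 1  { next }                           # skip the header line of each file
        NR == FNR { miss[$1] = $2; seen[$1]; next }  # first file: miss counts
                  { hit[$1] = $2; seen[$1] }         # second file: hit counts
        END {
            # Empty or unset counts print as 0 with %d.
            for (d in seen)
                printf "%-12s %-10d %-10d %d\n", d, miss[d], hit[d], miss[d] + hit[d]
        }
    ' "$1" "$2" | sort
}

printf '%-12s %-10s %-10s %s\n' Domains CacheMiss CacheHits TotalHits
merge_counts "$workdir/miss.txt" "$workdir/hits.txt"
```

Rows come out sorted by domain name rather than in file order, since the iteration order of `for (d in seen)` is unspecified in awk and the output is piped through sort.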