awk running total count and sum (continued)

Time: 2015-04-11 16:20:02

Tags: awk

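Following up on my previous post: how can I calculate the 80%-20% rule contribution of vendors per date ($1) and region ($2)?

The input file is sorted by date and region, and within each date and region the amounts run from highest to lowest.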

Input.csv

Date,Region,Vendor,Amount
5-Apr-15,east,aa,123
5-Apr-15,east,bb,50
5-Apr-15,east,cc,15
5-Apr-15,south,dd,88
5-Apr-15,south,ee,40
5-Apr-15,south,ff,15
5-Apr-15,south,gg,10
7-Apr-15,east,ii,90
7-Apr-15,east,jj,20

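For the input above, a running sum of Amount needs to be populated for each Date ($1) and Region ($2) group, and then the running sum needs to be expressed as a percentage of the total amount for that date and region.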

Date,Region,Vendor,Amount,RunningSum,%RunningSum
5-Apr-15,east,aa,123,123,65%
5-Apr-15,east,bb,50,173,92%
5-Apr-15,east,cc,15,188,100%

5-Apr-15,south,dd,88,88,58%
5-Apr-15,south,ee,40,128,84%
5-Apr-15,south,ff,15,143,93%
5-Apr-15,south,gg,10,153,100%

7-Apr-15,east,ii,90,90,82%
7-Apr-15,east,jj,20,110,100%
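
The percentages in the table above appear to be rounded to the nearest whole number rather than truncated: 88/153 is roughly 57.5% and is shown as 58%. A minimal sketch for checking a single row by hand, assuming that rounding rule (pct is a hypothetical helper name, not something from the original post):

```shell
# pct RUNNING TOTAL: print the running sum as a whole-number percentage of the total.
# Hypothetical helper for illustration only.
pct() {
    awk -v run="$1" -v total="$2" 'BEGIN {
        # int(x + 0.5) rounds a non-negative value to the nearest integer
        printf "%d%%\n", int(run / total * 100 + 0.5)
    }'
}

pct 88 153     # south, first row:  prints 58%
pct 128 153    # south, second row: prints 84%
```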

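Once the running percentage reaches 80%, or exceeds 80% for the first time, the vendors up to and including that row count as the 80% contribution, and the remaining vendors count as the 20% contribution.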

Date,Region,Countof80%Vendor,SumOf80%Vendor,Countof20%Vendor,SumOf20%Vendor
5-Apr-15,east,2,173,1,15
5-Apr-15,south,2,128,2,25
7-Apr-15,east,1,90,1,20

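1 answer:

Answer 0 (score: 1)

This awk script will take care of the first part; ask if you need any clarification. Basically, it stores the values in arrays and prints the requested information after the whole file has been parsed.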

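awk -F',' 'BEGIN{OFS=FS}
    # Needs GNU awk 4+ (arrays of arrays); cities[] holds the vendor names
    # Header: pass it through and append the two new column names
    NR==1{print $0, "RunningSum", "%RunningSum"}
    NR!=1{
        if (date == $1 && region == $2) {
            # Same date/region group: store the row and extend the running sum
            counts[i]++
            cities[i][counts[i]] = $3
            amounts[i][counts[i]] = $4
            rsum[i][counts[i]] = rsum[i][counts[i] - 1] + $4
        } else {
            # A new date/region group starts here
            date = $1; region = $2
            dates[++i] = $1
            regions[i] = $2
            counts[i] = 1
            cities[i][1] = $3
            amounts[i][1] = $4
            rsum[i][1] = $4
        }
    }
    END{
        for(j=1; j<=i; j++) {
            # The last running sum of a group is the total of that group
            total = rsum[j][counts[j]];
            for (k=1; k<=counts[j]; k++) {
                # + 0.5 rounds to the nearest integer, as in the expected output (58%, not 57%)
                print dates[j], regions[j], cities[j][k], amounts[j][k], rsum[j][k], int(rsum[j][k]/total*100 + 0.5) "%"
            }
            # Blank line between groups
            if (j != i) { print "" }
        }
    }' yourfilename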
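
The script above uses arrays of arrays such as cities[i][k], which are a GNU awk 4+ extension. If only a POSIX awk is available, one possible alternative is to read the file twice: the first pass totals Amount per date and region, the second pass prints each row with its running sum and percentage. This is only a sketch under a few assumptions: running_sum is a hypothetical function name, the input is a regular file (it is opened twice, so a pipe will not work), and percentages are rounded to the nearest integer to match the table in the question.

```shell
# running_sum FILE: hypothetical helper, prints FILE with RunningSum and %RunningSum appended.
running_sum() {
    awk -F',' 'BEGIN { OFS = FS }
        # Pass 1 (first copy of the file): total Amount per Date+Region, header skipped
        NR == FNR { if (FNR > 1) total[$1 FS $2] += $4; next }
        # Pass 2: extend the header line
        FNR == 1 { print $0, "RunningSum", "%RunningSum"; next }
        {
            key = $1 FS $2
            if (key != prev) {              # a new Date+Region group starts
                if (prev != "") print ""    # blank line between groups
                run = 0
                prev = key
            }
            run += $4
            # Round to the nearest integer; guard against a zero total
            pct = (total[key] > 0) ? int(run / total[key] * 100 + 0.5) : 0
            print $0, run, pct "%"
        }' "$1" "$1"
}
```

Called as `running_sum Input.csv`, it should produce the same layout as the first script, including the blank line between groups, so the second script can consume its output unchanged.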

The second part can be done like this (using the output of the first awk script):

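awk -F'[,%]' 'BEGIN{ OFS="," }
    # Header or blank separator line: the next data line starts a new group
    NR==1 || $0 ~ /^$/ {
        over = ""
        record = 1
    }
    ! (NR==1 || $0 ~ /^$/) {
        if (record) {
            dates[++i] = $1
            regions[i] = $2
            record = ""
        }
        # Rows after the one that reached 80% go to the 20% bucket
        if (over) {
            twenty[i]++
            twenties[i] += $4
        } else {
            eighty[i]++
            eighties[i] += $4
        }
        # $6 is the running percentage (the % sign acts as a field separator)
        if ($6 >= 80) {
            over = 1
        }
    }
    END {
        print "Date","Region","Countof80%Vendor", "SumOf80%Vendor", "Countof20%Vendor", "SumOf20%Vendor"
        for (j=1; j<=i; j++) {
            # + 0 prints 0 instead of an empty field when a group has no 20% rows
            print dates[j], regions[j], eighty[j], eighties[j], twenty[j] + 0, twenties[j] + 0
        }
    }' output/file/of/first/script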
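
If the intermediate file is not needed, the 80/20 summary can also be computed straight from Input.csv. The following is a rough sketch rather than a drop-in replacement: eighty_twenty is a hypothetical function name, the file is read twice (so it must be a regular file), and the cutoff is tested against the exact ratio of running sum to total instead of the rounded percentage, so a group sitting at, say, 79.6% is treated as not yet having reached 80%.

```shell
# eighty_twenty FILE: hypothetical helper, prints the 80%/20% summary per Date+Region.
eighty_twenty() {
    awk -F',' 'BEGIN { OFS = FS }
        # Pass 1 (first copy of the file): total Amount per Date+Region, header skipped
        NR == FNR { if (FNR > 1) total[$1 FS $2] += $4; next }
        # Pass 2: skip the header
        FNR == 1 { next }
        {
            key = $1 FS $2
            # Remember groups in order of first appearance
            if (!(key in seen)) { seen[key] = 1; order[++n] = key }
            # If the rows before this one already reached 80%, this row is in the 20% bucket;
            # otherwise it is in the 80% bucket (this includes the row that crosses 80%)
            if (run[key] * 100 >= 80 * total[key]) { c20[key]++; s20[key] += $4 }
            else                                   { c80[key]++; s80[key] += $4 }
            run[key] += $4
        }
        END {
            print "Date", "Region", "Countof80%Vendor", "SumOf80%Vendor", "Countof20%Vendor", "SumOf20%Vendor"
            for (j = 1; j <= n; j++) {
                k = order[j]    # k is already "date,region"
                print k, c80[k] + 0, s80[k] + 0, c20[k] + 0, s20[k] + 0
            }
        }' "$1" "$1"
}
```

For the sample data in the question, `eighty_twenty Input.csv` gives the three summary rows the question asks for. Keeping per-group counters in arrays keyed by "date,region" also means the rows of a group do not have to be contiguous, although the high-to-low ordering of amounts within a group still matters.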