How do I collapse multiple strings into one by key?

Time: 2017-06-27 08:43:22

Tags: python bash perl awk

For example, I have a file:

key1   1212
key1   32332
key2   1212
key2   3232
key2   3232

I want to get the file:

key1   1212,32332
key2   1212,3232,3232

4 answers:

Answer 0 (score: 1)

In awk:

$ awk '{a[$1]=a[$1](a[$1]==""?"":",")$2}END{for(i in a)print i,a[i]}' file
key1 1212,32332
key2 1212,3232,3232

Explanation:

awk '{                                        # awk suits this kind of task
    a[$1]=a[$1] ( a[$1]=="" ? "" : "," ) $2   # hash on the first column and append the second
}
END {                                         # after everything is hashed
    for(i in a)                               # for each key in hash a (order is unspecified)
        print i,a[i]                          # output the key and its data
}' file                                       # the input file
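For comparison, the same hash-and-append idea can be sketched in Python. This is only an illustration, not part of the original answer: the function name `collapse_by_key` and the sample lines are assumptions, and skipping lines with fewer than two fields is a choice made here rather than something the awk script does.

```python
def collapse_by_key(lines):
    """Group second-column values under their first-column key."""
    grouped = {}  # plain dicts keep insertion order in Python 3.7+
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: ignore blank or malformed lines
        key, value = fields[0], fields[1]
        grouped.setdefault(key, []).append(value)
    # Join each key's values with commas, like the awk array does.
    return {key: ",".join(values) for key, values in grouped.items()}


# Hypothetical sample input: one key/value pair per line.
sample = ["key1 1212", "key1 32332", "key2 1212", "key2 3232", "key2 3232"]
for key, joined in collapse_by_key(sample).items():
    print(key, joined)
```

Like the awk version, this holds every key and value in memory until the input ends. Unlike awk's `for (i in a)`, the dict keeps keys in first-seen order.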

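Edit: instead of having awk buffer everything (i.e. hash it into a), we can sort the file with sort and then output each key followed by all of its data, comma-separated. awk again handles that second part: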

$ sort file | awk '$1!=p{printf "%s%s",(NR>1?ORS:""),$1}{printf "%s%s", ($1==p?",":OFS),$2;p=$1}END{print ""}'
key1 1212,32332
key2 1212,3232,3232

Here sort is not given any fancy options, but real-world data may need some. The awk part explained:

sort file |                            # sort the file before feeding it to awk
awk '
$1!=p {                                # if the key differs from the previous key
    printf "%s%s",(NR>1?ORS:""),$1     # end the previous line (except on the first record) and print the key
}
{
    printf "%s%s", ($1==p?",":OFS),$2  # print the data, comma-separated
    p=$1                               # store the key for comparison on the next record
}
END {
    print ""                           # finish the last line
}'
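The previous-key comparison above can also be written as a small Python generator. This is a rough sketch under stated assumptions: the name `collapse_sorted` is hypothetical, the input must already have equal keys on adjacent lines (for example, after `sort`), and lines with fewer than two fields are skipped by choice.

```python
def collapse_sorted(lines):
    """Yield 'key v1,v2,...' for input whose equal keys are adjacent."""
    previous_key = None
    values = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: ignore blank or malformed lines
        key, value = fields[0], fields[1]
        if key != previous_key and values:
            # The key changed, so the previous group is complete.
            yield f"{previous_key} {','.join(values)}"
            values = []
        previous_key = key
        values.append(value)
    if values:
        # Emit the final group once the input is exhausted.
        yield f"{previous_key} {','.join(values)}"


# Hypothetical sample input, sorted so that equal keys sit together.
sample = sorted(["key2 3232", "key1 1212", "key2 1212", "key1 32332", "key2 3232"])
for row in collapse_sorted(sample):
    print(row)
```

Only the values of the current key are held in memory at any time, which is the point of sorting first.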

Answer 1 (score: 0)

awk '{a[$1]=(a[$1]!="")?a[$1]","$2:$2}END{for(i in a){print i "\t" a[i]}}' file
key1    1212,32332
key2    1212,3232,3232

That should do it.

Answer 2 (score: 0)

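If you want to avoid buffering the results for the whole file (for example, if the file is very large), you can use sort and Python's itertools.groupby. Create a Python script like this: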

# group.py

import itertools, sys

for k, g in itertools.groupby(sys.stdin, lambda x: x.split()[0]):
    print(k, ",".join([x.split()[1] for x in g]))

Then run:

sort file | python group.py 
key1 1212,32332
key2 1212,3232,3232

Otherwise, this quick Perl one-liner also does the job by accumulating the values in a hash:

perl -anE 'push @{$h{$F[0]}}, $F[1]; END {$"= ","; say "$_ @{$h{$_}}" for sort keys %h}' file

Output:

key1 1212,32332
key2 1212,3232,3232

Answer 3 (score: -1)

It's not pure sh / coreutils, but consider using datamash for this task:

sed -r -e 's/[[:space:]]+/ /g' < infile.txt | datamash -t ' ' -s groupby 1 collapse 2