Extract a column every six lines

Time: 2015-08-11 12:59:30

Tags: file awk sed filter

I have a file that looks like this:

194170,46.9,42.2
194170,47.7,40.0
194170,48.5,42.0
194170,48.6,43.0
194170,49.8,39.2
194170,50.2,43.3
194179,44.9,36.9
194179,45.3,36.3
194179,46.4,36.9
194179,47.5,34.4
194179,48.0,40.0
194179,49.6,37.1
194184,52.8,51.1
194184,52.9,49.8
194184,54.0,51.9
194184,56.8,54.9
194184,57.6,53.6
194184,57.8,52.9
...

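In each line, the first number is an ID, and the second and third numbers are the values I am interested in. For lines with the same ID (that is, every six lines), the numbers in a given column are figures for consecutive years. I would like to end up with a file that looks like this: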

194170,46.9,47.7,48.5,48.6,49.8,50.2
194170,42.2,40.0,42.0,43.0,39.2,43.3
194179,44.9,45.3,46.4,47.5,48.0,49.6
194179,36.9,36.3,36.9,34.4,40.0,37.1

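That is, for lines with the same ID, I want to group the consecutive numbers from the second column into one row, and likewise group the third column into another row.

Is this possible with awk / sed / other tools?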

3 answers:

Answer 0: (score: 1)

Another answer using awk:

awk -F, '{a[$1] = a[$1]","$2}END{for(i in a) print i a[i]}' yourfile

For both columns:

awk -F, '{a[$1] = a[$1]","$2;b[$1] = b[$1]","$3}END{for(i in a) print i a[i]"\n"i b[i]}' yourfile

That said, I would prefer tidyr in R for this task.
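
One caveat with the array-based one-liners above: awk does not define the iteration order of `for (i in a)`, so the IDs may be printed in a different order than they appear in the input. Below is a minimal sketch that remembers the order in which IDs are first seen. The variable `sample` holds the first two IDs from the question, and the names `order`, `n` and `out` are illustrative choices, not part of the original answer.

```shell
# Sample data: the first two IDs from the question (six lines each)
sample='194170,46.9,42.2
194170,47.7,40.0
194170,48.5,42.0
194170,48.6,43.0
194170,49.8,39.2
194170,50.2,43.3
194179,44.9,36.9
194179,45.3,36.3
194179,46.4,36.9
194179,47.5,34.4
194179,48.0,40.0
194179,49.6,37.1'

out=$(printf '%s\n' "$sample" | awk -F, '
  # Record each ID the first time it appears (order and n are illustrative names)
  !($1 in a) { order[++n] = $1 }
  # Append the second and third fields to the per-ID strings
  { a[$1] = a[$1] "," $2; b[$1] = b[$1] "," $3 }
  # Print in first-seen order instead of relying on for-in order
  END { for (k = 1; k <= n; k++) { id = order[k]; print id a[id]; print id b[id] } }
')
printf '%s\n' "$out"
```

This still keeps every value in memory until the end of the input; it only makes the output order predictable.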

Answer 1: (score: 0)

Using awk:

awk -F',' '{ a[$1] = a[$1] ? a[$1] FS $2 : $2 ; b[$1] = b[$1] ? b[$1] FS $3 : $3}
   END { for(idx in a){ print idx,a[idx] ; print idx,b[idx]}}' yourfile

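Explanation:

  • -F sets the field separator
  • a[] collects the second-column values for each ID
  • b[] collects the third-column values for each ID
  • END{} prints the collected values

Example: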

$ awk -F',' '{ a[$1] = a[$1] ? a[$1] FS $2 : $2 ; b[$1] = b[$1] ? b[$1] FS $3 : $3}
   END { for(idx in a){ print idx,a[idx] ; print idx,b[idx]}}' yourfile
194170 46.9,47.7,48.5,48.6,49.8,50.2
194170 42.2,40.0,42.0,43.0,39.2,43.3
194184 52.8,52.9,54.0,56.8,57.6,57.8
194184 51.1,49.8,51.9,54.9,53.6,52.9
194179 44.9,45.3,46.4,47.5,48.0,49.6
194179 36.9,36.3,36.9,34.4,40.0,37.1
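
Note that the output above separates the ID from the values with a space, because `print idx,a[idx]` uses the default output field separator, and that 194184 is printed before 194179, because `for (idx in a)` has no defined order. If the comma-separated, ID-ordered layout from the question is needed, one possible adjustment is to set `OFS` and sort the result. This is only a sketch: it assumes a `sort` that supports the stable `-s` option (GNU and BSD sort do), and the shortened `sample` below keeps just two lines per ID to stay brief.

```shell
# Shortened subset of the data from the question: two lines per ID
sample='194170,46.9,42.2
194170,47.7,40.0
194179,44.9,36.9
194179,45.3,36.3
194184,52.8,51.1
194184,52.9,49.8'

out=$(printf '%s\n' "$sample" | awk '
  BEGIN { FS = OFS = "," }   # read and write comma-separated fields
  {
    # Test membership with "in" so a value of 0 is not mistaken for an empty entry
    if ($1 in a) { a[$1] = a[$1] FS $2; b[$1] = b[$1] FS $3 }
    else         { a[$1] = $2;          b[$1] = $3 }
  }
  END { for (idx in a) { print idx, a[idx]; print idx, b[idx] } }
' | sort -s -t, -k1,1n)      # stable numeric sort on the ID keeps each pair in order
printf '%s\n' "$out"
```

The stable sort matters here: it orders the rows by ID while leaving the second-column row ahead of the third-column row for each ID.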

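Answer 2: (score: 0)

Another awk version that does not use arrays and preserves the original order. This matters for a very large file, where you do not want to load all the data into memory before printing; otherwise the array versions are fine, assuming you do not care about ordering. It relies on lines with the same ID being adjacent, as they are in the sample.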

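# Stream the file; print two rows whenever the ID changes
BEGIN { FS = OFS = "," }

# Remember the first ID (NR == 1 also works when the ID is 0)
NR == 1 { prev_id = $1 }

# Same ID: append the second and third fields to the pending rows
$1 == prev_id { r1 = r1 OFS $2; r2 = r2 OFS $3 }

# New ID: flush the finished group, then start the next one
$1 != prev_id { print prev_id r1 ORS prev_id r2;
                r1 = OFS $2; r2 = OFS $3; prev_id = $1 }

# Flush the last group, unless the input was empty
END { if (NR) print prev_id r1 ORS prev_id r2 }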


$ awk -f v3.awk file.txt
194170,46.9,47.7,48.5,48.6,49.8,50.2
194170,42.2,40.0,42.0,43.0,39.2,43.3
194179,44.9,45.3,46.4,47.5,48.0,49.6
194179,36.9,36.3,36.9,34.4,40.0,37.1
194184,52.8,52.9,54.0,56.8,57.6,57.8
194184,51.1,49.8,51.9,54.9,53.6,52.9
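
Since the question says that every ID occupies exactly six consecutive lines, the same streaming idea could also count lines instead of comparing IDs. The following is a rough sketch under that assumption only; the group size variable `n` and the names `sample` and `out` are introduced here for illustration. If the file ends with an incomplete group, this version silently drops it.

```shell
# Sample data: the first two IDs from the question (six lines each)
sample='194170,46.9,42.2
194170,47.7,40.0
194170,48.5,42.0
194170,48.6,43.0
194170,49.8,39.2
194170,50.2,43.3
194179,44.9,36.9
194179,45.3,36.3
194179,46.4,36.9
194179,47.5,34.4
194179,48.0,40.0
194179,49.6,37.1'

out=$(printf '%s\n' "$sample" | awk -v n=6 '
  BEGIN { FS = OFS = "," }
  # Append the values of the current line to the two pending rows
  { r1 = r1 OFS $2; r2 = r2 OFS $3 }
  # Every n-th line closes a group: print both rows, then reset
  NR % n == 0 {
    print $1 r1 ORS $1 r2
    r1 = r2 = ""
  }
')
printf '%s\n' "$out"
```

Comparing IDs, as in the script above, is the safer choice when the group size is not guaranteed.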