Question

Good afternoon,

I have a file with the following structure:

$ cat test
LogStartOffset,alex,1,4
LogEndOffset,alex,1,4
Size,alex,1,0
LogStartOffset,alvaro1,1,2
LogEndOffset,alvaro1,1,2
Size,alvaro1,1,0
LogStartOffset,alvaro_prueba,1,0
LogEndOffset,alvaro_prueba,1,0
Size,alvaro_prueba,1,0
LogStartOffset,arquitectura_jenkins_creardocumentacion,0,348
LogEndOffset,arquitectura_jenkins_creardocumentacion,0,387
Size,arquitectura_jenkins_creardocumentacion,0,11011
LogStartOffset,alex,0,445
LogEndOffset,alex,0,498
Size,alex,0,54670
...

where:

  field 1 is the characteristic of the topic and its partition
  field 2 is the topic name
  field 3 is the partition of the topic named in field 2
  field 4 is the value of the characteristic

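At the moment I am working with 3 characteristics (there may be more in the future): LogStartOffset, LogEndOffset and Size.

I would like an output file with this structure:

topic,partition,LogStartOffset value,LogEndOffset value,Size value

So file.out should look like this: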

topic,partition,LogStartOffset's Value,LogEndOffset's Value,Size's Value
alex,1,4,4,0
alvaro1,1,2,2,0
alvaro_prueba,1,0,0,0
arquitectura_jenkins_creardocumentacion,0,348,387,11011
alex,0,445,498,54670
.....

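The idea is to group by fields 2 and 3 (topic and partition), because I don't know whether the order of the field 1 values will always be the same.

I would like to do this with awk, but I don't know how to set up the grouping on fields 2 and 3, or how to regroup the data into the desired output.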

Answer 1

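Your example has three characteristics for each topic-partition pair, in the order LogStartOffset, LogEndOffset, Size. If the names, number and order of the characteristics are the same for every topic-partition pair in the file, then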

awk 'BEGIN{FS=","}{a[$2","$3] = a[$2","$3]","$4} END { for (i in a) print i a[i];}' test

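will work. It does not generate a header line, and it does not preserve the order in which the topic-partition pairs appear in the file (awk's `for (i in a)` visits keys in an unspecified order). When I tested it, the output was: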

alvaro1,1,2,2,0
alvaro_prueba,1,0,0,0
alex,0,445,498,54670
arquitectura_jenkins_creardocumentacion,0,348,387,11011
alex,1,4,4,0
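If the header line, the original order of the pairs, or the order of the characteristics within a pair matters, a variant along the following lines might help. It is only a sketch under a few assumptions: the file name `sample.csv` is made up for the demo, the header uses shortened column names rather than the exact wording in the question, and the three characteristic names are hard-coded in the END block (they would need to be extended if more are added later). Values are stored by characteristic name instead of by position, so the `alvaro1` lines below are deliberately shuffled to show that the order of field 1 no longer matters.

```shell
# Hypothetical sample file, adapted from the data in the question.
# The alvaro1 lines are intentionally out of order.
cat > sample.csv <<'EOF'
LogStartOffset,alex,1,4
LogEndOffset,alex,1,4
Size,alex,1,0
Size,alvaro1,1,0
LogStartOffset,alvaro1,1,2
LogEndOffset,alvaro1,1,2
LogStartOffset,alex,0,445
LogEndOffset,alex,0,498
Size,alex,0,54670
EOF

awk -F, '
{
    key = $2 FS $3              # group by topic and partition
    if (!(key in seen)) {       # record each pair the first time it appears
        seen[key] = 1
        order[++n] = key
    }
    val[key, $1] = $4           # store the value under (pair, characteristic name)
}
END {
    print "topic,partition,LogStartOffset,LogEndOffset,Size"
    for (i = 1; i <= n; i++) {  # walk the pairs in first-seen order
        k = order[i]
        print k FS val[k, "LogStartOffset"] FS val[k, "LogEndOffset"] FS val[k, "Size"]
    }
}' sample.csv > file.out

cat file.out
```

A characteristic that is missing for some pair simply comes out as an empty field, so every output line keeps the same number of columns.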

How can I group using awk?