Reorganizing a file in a special way [Perl]

Date: 2018-12-13 18:33:37

Tags: perl

Consider the following file:

5,*,ABC
6,5,XYZ
7,5,123
4,6,xyz
1,4,xox
8,6,yoy

The format of each line (`*` means the row has no parent):

pid,parent-pid,name

I want to produce the following file from it:

ABC,
ABC,XYZ
ABC,123
ABC,XYZ,xyz
ABC,XYZ,xyz,xox
ABC,XYZ,yoy

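That is, for each pid I want to see, on the same line, its whole chain of ancestors starting from the top-most parent. I would like to implement this by inserting the rows into a hash (in Perl). The problem is that I don't know in advance how long each output line will be, and therefore how large the hash needs to be. I am also looking for the most efficient way to do this.

What would be a good algorithm for this problem?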

2 Answers:

Answer 0: (score: 2)

You can build a hash of pids keyed by parent pid, then walk it recursively starting from the children of `*`.

use strict;
use warnings;
use feature qw( current_sub );

use Text::CSV_XS qw( );

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2 });

# Declared under the same name the loops below use.
my %processes_children_by_pid;
my %process_name_by_pid;
while (my $row = $csv->getline(*STDIN)) {
   my ($pid, $parent, $name) = @$row;
   $process_name_by_pid{$pid} = $name;
   push @{ $processes_children_by_pid{$parent} }, $pid;
}

# Depth-first walk: @_ holds the ancestors' names followed by the current pid.
sub {
   my $pid = pop;
   push @_, $process_name_by_pid{$pid};
   $csv->say(*STDOUT, \@_);
   __SUB__->(@_, $_) for @{ $processes_children_by_pid{$pid} };
}->($_) for @{ $processes_children_by_pid{'*'} };
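
To make the shape of that hash concrete, here is a minimal sketch of the same idea with no CPAN dependency. It is not the code above: it assumes the sample rows are held in a hypothetical `@rows` array, uses plain `split` instead of Text::CSV_XS (so it would not handle quoted fields), and replaces the `__SUB__` recursion with an explicit stack. The names `%children_by_pid`, `%name_by_pid` and `paths` are made up for the illustration.

```perl
use strict;
use warnings;

# Sample rows from the question: pid,parent-pid,name ('*' means no parent).
my @rows = ('5,*,ABC', '6,5,XYZ', '7,5,123', '4,6,xyz', '1,4,xox', '8,6,yoy');

my (%children_by_pid, %name_by_pid);
for my $row (@rows) {
    my ($pid, $parent, $name) = split /,/, $row;
    $name_by_pid{$pid} = $name;
    push @{ $children_by_pid{$parent} }, $pid;
}

# Depth-first walk with an explicit stack.
# Each stack entry is [ pid, array ref of its ancestors' names ].
sub paths {
    my @lines;
    my @stack = map { [ $_, [] ] } reverse @{ $children_by_pid{'*'} // [] };
    while (my $entry = pop @stack) {
        my ($pid, $ancestors) = @$entry;
        my @path = (@$ancestors, $name_by_pid{$pid});
        push @lines, join ',', @path;
        # Push children reversed so they are popped in input order.
        push @stack, map { [ $_, \@path ] } reverse @{ $children_by_pid{$pid} // [] };
    }
    return @lines;
}

print "$_\n" for paths();
```

Like the recursive version, this prints the lines in depth-first order rather than in the order of the input file, so `ABC,123` comes last for the sample data.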

Alternatively, you can use Graph.pm. This adds overhead, but it makes error checking easy.

use strict;
use warnings;
use feature qw( current_sub );

use Graph        qw( );
use Text::CSV_XS qw( );

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2 });

my $tree = Graph->new();
my %process_name_by_pid;
while (my $row = $csv->getline(*STDIN)) {
   my ($pid, $parent, $name) = @$row;
   $process_name_by_pid{$pid} = $name;
   $tree->add_edge($parent, $pid);
}

die "Bad data" if $tree->has_a_cycle;

my @roots = $tree->predecessorless_vertices();
die "Bad data" if @roots != 1 || $roots[0] ne '*';

sub {
   my $pid = pop;
   push @_, $process_name_by_pid{$pid};
   $csv->say(*STDOUT, \@_);
   __SUB__->(@_, $_) for $tree->successors($pid);
}->($_) for $tree->successors('*');

Answer 1: (score: 0)

I would handle it by storing the parent relationships in an array, then walking that array each time a line is read:

my @parent;

open my $IN, '<', 'file' or die "Cannot open 'file': $!";
while (<$IN>) {
  chomp;
  my ($id, $parent, $name) = split /,/;
  $parent[$id] = [ $parent, $name ];

  if ($parent eq '*') {
    print $name;
  } else {
    my @output = ( [ $parent, $name ] );

    while (my $p = $parent[${$output[0]}[0]]) {
      unshift @output, $p;
    }

    print join ',', map { ${$_}[1] } @output;
  }

  print "\n";
}
close $IN;

Output:

ABC
ABC,XYZ
ABC,123
ABC,XYZ,xyz
ABC,XYZ,xyz,xox
ABC,XYZ,yoy

Edit: based on feedback, modified to use a hash so that it no longer depends on the order of the lines in the file:

my %parent;

open my $IN, '<', 'file' or die "Cannot open 'file': $!";
while (<$IN>) {
  chomp;
  my ($id, $parent, $name) = split /,/;
  $parent{$id} = [ $parent, $name ];
}

seek $IN, 0, 0;
while (<$IN>) {
  chomp;
  my ($id, $parent, $name) = split /,/;

  if ($parent eq '*') {
    print $name;
  } else {
    my @output = ( [ $parent, $name ] );

    while (my $p = $parent{${$output[0]}[0]}) {
      unshift @output, $p;
    }

    print join ',', map { ${$_}[1] } @output;
  }

  print "\n";
}
close $IN;
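
Since the question also asks about efficiency, one possible refinement of the hash version is to cache each pid's path so that a shared ancestor chain is only walked once. The following is a rough sketch of that idea, not part of the answer above: `@rows`, `%path_cache` and `path_for` are hypothetical names, the rows are held in memory instead of being read from a file, and it assumes the data contains no cycles.

```perl
use strict;
use warnings;

# Sample rows from the question: pid,parent-pid,name ('*' means no parent).
my @rows = ('5,*,ABC', '6,5,XYZ', '7,5,123', '4,6,xyz', '1,4,xox', '8,6,yoy');

my %parent;   # pid => [ parent pid, name ]
for my $row (@rows) {
    my ($id, $parent_id, $name) = split /,/, $row;
    $parent{$id} = [ $parent_id, $name ];
}

my %path_cache;   # pid => "name,name,..." from the root down to that pid

# Return the comma-joined chain of names for $id, caching every result.
# Assumes there are no cycles; a cycle would recurse forever.
sub path_for {
    my ($id) = @_;
    return $path_cache{$id} if exists $path_cache{$id};
    my ($parent_id, $name) = @{ $parent{$id} };
    my $path = ($parent_id eq '*' || !exists $parent{$parent_id})
        ? $name
        : path_for($parent_id) . ",$name";
    return $path_cache{$id} = $path;
}

# One output line per input row, in file order.
my @lines = map { path_for((split /,/)[0]) } @rows;
print "$_\n" for @lines;
```

For the sample data this produces the same six lines as the output shown above. Whether the cache pays off depends on how deep and how widely shared the ancestor chains are; for small files the plain loop is likely just as good.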