Reduce a list of folders to the lowest common folder

Time: 2011-09-27 19:14:51

Tags: perl directory

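I have a huge list of file paths that is too large for our SCM to handle. I need to reduce it to the lowest common folder level. For example, given the following paths: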

//folder1/folder2/folder2
//folder1/folder2/folder5
//folder1/folder3/folder6
//folderx/foldery/folder9
//folderx/foldery/folder10

From these, I want to arrive at:

//folder1/folder2
//folder1/folder3
//folderx/foldery

The folder list will be read from a text file of roughly 2M lines.

Any help is greatly appreciated.

2 answers:

Answer 0: (score: 1)

This looks like a good fit for split() and a hash:

use strict;
use warnings;

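# Assumption: the paths arrive one per line on STDIN
chomp( my @paths = <STDIN> );

my %seen;
foreach my $path ( @paths ) {
  $path =~ s|^//||; # Strip off leading //
  my @elems = split( '/', $path );
  next if @elems < 2; # Skip lines without two folder levels
  $seen{$elems[0]}{$elems[1]}++; # Count each first/second folder pair
}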

foreach my $rootpath ( sort keys %seen ) {
  foreach my $secondpath ( sort keys %{$seen{$rootpath}} ) {
    print "//" . $rootpath . "/" . $secondpath . "\n";
  }
}

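If you only want to print paths that occur two or more times, insert next unless $seen{$rootpath}{$secondpath} > 1; before the print().

I have not tested this, so there may be syntax errors, but the code gives the general idea.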
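
Since the question mentions a text file of about 2M lines, the same idea can be sketched so that it reads one line at a time instead of holding every path in an array. This is only an illustration under assumptions: the helper name two_level_prefixes is hypothetical, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the distinct "//first/second" prefixes
# from an open filehandle, reading one line at a time.
sub two_level_prefixes {
    my ($fh) = @_;
    my %seen;
    while ( my $line = <$fh> ) {
        chomp $line;
        # Skip lines that are malformed or have fewer than two folder levels.
        next unless $line =~ m{^//([^/]+)/([^/]+)};
        $seen{"//$1/$2"}++;
    }
    my @prefixes = sort keys %seen;
    return @prefixes;
}

# Demo: an in-memory filehandle stands in for the real text file.
my $sample = join "\n",
    '//folder1/folder2/folder2',
    '//folder1/folder2/folder5',
    '//folder1/folder3/folder6',
    '//folderx/foldery/folder9',
    '//folderx/foldery/folder10';

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for two_level_prefixes($fh);
close $fh;
```

For the sample paths this prints //folder1/folder2, //folder1/folder3 and //folderx/foldery, one per line. Only the distinct prefixes are kept in the hash, so memory use depends on the number of output folders rather than on the number of input lines.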

Answer 1: (score: 0)

How about:

#!/usr/local/bin/perl 
use strict;
use warnings;
use 5.010;

my %out;
while (<DATA>) {
    chomp;
    # Skip lines without two folder levels, so a stale $1 is never reused
    next unless m#^(//[^/]+/[^/]+)#;
    $out{$1} = 1;
}
say for keys %out;

__DATA__
//folder1/folder2/folder2
//folder1/folder2/folder5
//folder1/folder3/folder6
//folderx/foldery/folder9
//folderx/foldery/folder10

Output (hash key order is not guaranteed, so the lines may appear in a different order):

//folderx/foldery
//folder1/folder3
//folder1/folder2
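
Both answers assume that the wanted folder is always exactly two levels deep. If the input can contain paths of varying depth, one possible reading of "lowest common folder" is the parent of each path, which also produces the result asked for in the question. The following is a rough sketch of that reading, not something either answer proposes, and the helper name parent_folders is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct parent folder of each path.
sub parent_folders {
    my @paths = @_;
    my %parents;
    for my $path (@paths) {
        # Drop the final "/component" (and an optional trailing slash).
        # Paths with only one level after "//" have no parent and are skipped.
        next unless $path =~ m{^(//.+)/[^/]+/?$};
        $parents{$1} = 1;
    }
    my @result = sort keys %parents;
    return @result;
}

print "$_\n" for parent_folders(
    '//folder1/folder2/folder2',
    '//folder1/folder2/folder5',
    '//folder1/folder3/folder6',
    '//folderx/foldery/folder9',
    '//folderx/foldery/folder10',
);
```

Note that with mixed depths this can return nested folders (for example both //a and //a/b), so a further pass would be needed if only the outermost ones are wanted.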