Perl: combine specific columns from multiple files

Date: 2014-01-14 22:02:14

Tags: perl

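I want to create a Perl script that combines columns from several files. I have to respect a set of constraints (the folder/file structure), so I'll try to show what I have and what I want. I have two folders and a bunch of files, and the files in each folder have the same names.

Folder1: File1, File2, File3, ...

Folder2: File1, File2, File3, ...

The content of Folder1:File1 looks like this (tab-separated):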

aaaaa 233
bbbbb 34
ccccc 853
...

All the other files look the same, except that the numbers differ. I want to create a single file (the report) that looks like this:

aaaaa value_Folder1:File1 value_Folder2:File1 value_Folder1:File2 value_Folder2:File2 ...

...

It would be nice to put the file name above each column the values come from (just the file name; the folder doesn't matter).
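The layout above interleaves the two folders file by file and labels each column with the bare file name. A minimal sketch of just that ordering and labelling, assuming both folders contain the same file names; `interleaved_paths` and `header_labels` are hypothetical helper names, and `basename` comes from the core File::Basename module:

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: return full paths in the interleaved report order
# Folder1/File1, Folder2/File1, Folder1/File2, Folder2/File2, ...
# Assumes both folders contain the same file names.
sub interleaved_paths {
    my ($dir1, $dir2, @names) = @_;
    return map { ("$dir1/$_", "$dir2/$_") } sort @names;
}

# Hypothetical helper: column labels are only the file name, since the
# folder does not matter for the header.
sub header_labels {
    return map { basename($_) } @_;
}
```

Because the header drops the folder, each file name appears twice in a row, once per folder.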

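I have some code in progress, but right now it doesn't do what I want! I tried to make it work with loops, but I feel that may not be the solution... Another problem is that I don't know how to add columns to my report file; in the code below I just append the values to the end of the file. Even though it isn't great, here is my code: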

#!/usr/bin/perl -w

use strict;
use warnings;

my $outputfile = "/home/duceppemo/Desktop/count.all.txt";

my $queryDir = "/home/duceppemo/Desktop/query_count/";
my $hitDir = "/home/duceppemo/Desktop/hit_count/";

opendir (DIR, "$queryDir") or die "Error opening $queryDir: $!"; #Open the directory containing the files with sequences to look for
my @queryFileNames = readdir (DIR);

opendir (DIR, "$hitDir") or die "Error opening $hitDir: $!"; #Open the directory containing the files with sequences to look for
my @hitFileNames = readdir (DIR);

my $index = 0;
$index ++ until $queryFileNames[$index] eq ".";
splice(@queryFileNames, $index, 1);

$index = 0;
$index ++ until $queryFileNames[$index] eq "..";
splice(@queryFileNames, $index, 1);

$index = 0;
$index ++ until $hitFileNames[$index] eq ".";
splice(@hitFileNames, $index, 1);

$index = 0;
$index ++ until $hitFileNames[$index] eq "..";
splice(@hitFileNames, $index, 1);

#counter for query file number opened
my $i = 0;

foreach my $queryFile (@queryFileNames) #adjust the file name according to the subdirectory
{
    $i += 1; #keep track of the file number opened

    $queryFile = $queryDir . $queryFile;
    open (QUERY, "$queryFile") or die "Error opening $queryFile: $!";
    my @query = <QUERY>; #Put the query sequences from the count file into an array
    close (QUERY);

    my $line = 0;

    open (RESULT, ">>$outputfile") or die "Error opening $outputfile: $!";

    foreach my $lineQuery (@query) #look into the query file
    {
        my @columns = split(/\s+/, $lineQuery); #Split each line into a new array, when it meets a whitespace character (including tab)

        if ($i == 1)
        {
            #open (RESULT, ">>$outputfile") or die "Error opening $outputfile: $!";
            print RESULT "$columns[0]\t";
            print RESULT "$columns[1]\n";
            #close (RESULT);
            $line += 1;
        }
        else
        {

            open (RESULT, ">>$outputfile") or die "Error opening $outputfile: $!";
            print RESULT "$columns[1]\n";
            close (RESULT);
            $line += 1;
        }
    }
    $line = 0;
}
close (RESULT);
closedir (DIR);

P.S. Any other suggestions for optimizing the code would be greatly appreciated!

2 answers:

Answer 0 (score: 1)

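The main problem is that you don't seem to understand what a FILEHANDLE is. You should read up on it.

A filehandle is a kind of reference to an opened file, and because everything is a file, it can also refer to a command or a directory.

When you write opendir(DIR, ...), "DIR" is not a keyword but a handle name, which can be anything you like. That means your two opendir() calls use the same handle, which makes no sense.

It should be more like: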

opendir(QDIR, $queryDir) or die "Error opening $queryDir: $!";
my @queryFileNames = readdir(QDIR);

opendir(HDIR, $hitDir) or die "Error opening $hitDir: $!";
my @hitFileNames = readdir(HDIR);
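The same idea with lexical directory handles, which cannot collide the way two DIR barewords do. This is a sketch, not the answerer's code: `list_files` is a hypothetical helper, and filtering with `grep` on `-f` (regular files only) is one way to drop "." and ".." without the index/splice loops from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: list the regular files in $dir.
# $dh is a lexical handle, so each call gets its own handle and
# two directories can never share one the way two DIR barewords do.
sub list_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Error opening $dir: $!";
    # -f keeps regular files only, which also drops "." and ".."
    # (and any subdirectories), so no index/splice loops are needed.
    my @names = grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return sort @names;    # readdir order is arbitrary; sort for stable columns
}
```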

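Also, since you should always close every filehandle you open, you must call close() at the same level as the matching open() and make sure close() actually gets called.

For example, opening the RESULT filehandle inside the loop and only closing it after the loop makes no sense... how many times do you open it without closing it?

You probably want to open it before the loop, and never open it twice with the same filehandle...

In general, you want to avoid opening/closing files inside a loop: just open before the loop and close after it.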
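Applied to the report file, that advice looks roughly like this sketch. `write_report` is a hypothetical helper that takes the header labels and the rows as array references; the handle is opened once before the loop and closed once after it, using a lexical handle and the three-argument form of open:

```perl
use strict;
use warnings;

# Hypothetical helper: $header is an array ref of column labels,
# $rows is an array ref of array refs (name first, then the values).
sub write_report {
    my ($path, $header, $rows) = @_;
    open(my $out, '>', $path) or die "Error opening $path: $!";   # once, before the loop
    # Leading empty cell so the labels sit above the value columns.
    print {$out} join("\t", '', @$header), "\n";
    for my $row (@$rows) {
        print {$out} join("\t", @$row), "\n";                       # no open/close in here
    }
    close($out) or die "Error closing $path: $!";                  # once, after the loop
}
```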

Answer 1 (score: 0)

This code does what I want:

#!/usr/bin/perl

use strict;
use warnings;


#my $queryDir = $ARGV[0];
my $queryDir = "C:/Users/Marco/Desktop/query_count/";
opendir (DIR1, "$queryDir") or die "Error opening $queryDir: $!"; #Open the directory containing the files with sequences to look for
my @queryFileName = readdir (DIR1);

#my $hitDir = $ARGV[1];
my $hitDir = "C:/Users/Marco/Desktop/hit_count/";
opendir (DIR2, "$hitDir") or die "Error opening $hitDir: $!"; #Open the directory containing the files with sequences to look for
my @hitFileName = readdir (DIR2);

my $index = 0;
$index ++ until $queryFileName[$index] eq ".";
splice(@queryFileName, $index, 1);

$index = 0;
$index ++ until $queryFileName[$index] eq "..";
splice(@queryFileName, $index, 1);

$index = 0;
$index ++ until $hitFileName[$index] eq ".";
splice(@hitFileName, $index, 1);

$index = 0;
$index ++ until $hitFileName[$index] eq "..";
splice(@hitFileName, $index, 1);

foreach my $queryFile (@queryFileName) #adjust the queryFileName according to the subdirectory
{
    $queryFile = "$queryDir" . $queryFile;
}

foreach my $hitFile (@hitFileName) #adjust the hitFileName according to the subdirectory
{
    $hitFile = "$hitDir" . $hitFile;
}

my $outputfile = "C:/Users/Marco/Desktop/out.txt";
my %hash;

foreach my $queryFile (@queryFileName)
{
    my $i = 0;
    open (QUERY, "$queryFile") or die "Error opening $queryFile: $!";
    while (<QUERY>)
    {
        chomp;
        my $val = (split /\t/)[1];
        $i++;
        $hash{$i}{$queryFile} = $val;
    }
    close (QUERY);
}

foreach my $hitFile (@hitFileName)
{
    my $i = 0;
    open (HIT, "$hitFile") or die "Error opening $hitFile: $!";
    while (<HIT>)
    {
        chomp;
        my $val = (split /\t/)[1];
        $i++;
        $hash{$i}{$hitFile} = $val;
    }
    close (HIT);
}

open (RESULT, ">$outputfile") or die "Error opening $outputfile: $!"; # ">" overwrites an old report instead of appending a second copy

foreach my $qfile (@queryFileName)
{
    print RESULT "\t$qfile";
}

foreach my $hfile (@hitFileName)
{
    print RESULT "\t$hfile";
}

print RESULT "\n";

foreach my $id (sort { $a <=> $b } keys %hash) # numeric sort: line 10 must come after line 9, not after line 1
{
    print RESULT "$id\t";
    print RESULT "$hash{$id}{$_}\t" foreach (@queryFileName, @hitFileName);
    print RESULT "\n";
}

close (RESULT);
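The script above keys each row by its line number, so the report's first column holds 1, 2, 3, ... rather than the names (aaaaa, bbbbb, ...) from the question, and a plain `sort keys %hash` would put line 10 before line 2. A sketch of an alternative that keys rows by the first column and keeps the order in which names are first seen, assuming every line is `name<TAB>value`; `merge_columns` is a hypothetical helper and `NA` is an arbitrary placeholder for names missing from a file:

```perl
use strict;
use warnings;

# Hypothetical helper: merge the second column of several tab-separated
# files into rows keyed by the first column. Returns a list of array refs:
# [name, value_from_file_0, value_from_file_1, ...].
sub merge_columns {
    my @paths = @_;
    my (%table, @order);
    for my $i (0 .. $#paths) {
        open(my $in, '<', $paths[$i]) or die "Error opening $paths[$i]: $!";
        while (my $line = <$in>) {
            chomp $line;
            next unless length $line;                  # skip blank lines
            my ($name, $value) = split /\t/, $line;
            push @order, $name unless exists $table{$name};   # remember first-seen order
            $table{$name}[$i] = $value;                # column $i belongs to file $i
        }
        close($in);
    }
    # One row per name; "NA" (arbitrary placeholder) where a file lacks that name.
    return map {
        my $name = $_;
        [ $name, map { defined $_ ? $_ : 'NA' } @{ $table{$name} }[0 .. $#paths] ]
    } @order;
}
```

Keeping `@order` alongside the hash avoids sorting keys at all, so the rows come out in the same order as in the input files.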