Question

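I am having trouble writing a Perl script that reads a binary file.

My code is below; $file is a file in binary format. I searched the web and tried to apply what I found in my code to print the file out, but it does not work well.

Currently it only prints "&&&&&&&&&&&" and "ppppppppppp", but what I really want is for it to print each $line, so that I can do some other post-processing later. I am also not sure what $data is: it came with the sample code in an article, which said it is supposed to be a scalar. I need someone to point out where my code goes wrong. Here is what I have done.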

my $tmp = "$basedir/$key";
opendir (TEMP1, "$tmp");
my @dirs = readdir(TEMP1);
closedir(TEMP1);

foreach my $dirs (@dirs) {
    next if ($dirs eq "." || $dirs eq "..");
    print "---->$dirs\n";
    my $d = "$basedir/$key/$dirs";
    if (-d "$d") {
        opendir (TEMP2, $d) || die $!;
        my @files = readdir (TEMP2); # This should read binary files
        closedir (TEMP2);

        #my $buffer = "";
        #opendir (FILE, $d) || die $!;
        #binmode (FILE);
        #my @files =  readdir (FILE, $buffer, 169108570);
        #closedir (FILE);

        foreach my $file (@files) {
            next if ($file eq "." || $file eq "..");
            my $f = "$d/$file";
            print "==>$file\n";
            open FILE, $file || die $!;
            binmode FILE;
            foreach ($line = read (FILE, $data, 169108570)) {
                print "&&&&&&&&&&&$line\n";
                print "ppppppppppp$data\n";
            }
            close FILE;
        }
    }
}

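I modified my code as shown below. Now I can read $data. Thanks to J-16 SDiZ for pointing that out. I tried to push the information I got from the binary files into an array called "@array", intending to search the array for the strings that match "p04", but that failed. Can someone point out where the error is?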

my $tmp = "$basedir/$key";
opendir (TEMP1, "$tmp");
my @dirs = readdir (TEMP1);
closedir (TEMP1);

foreach my $dirs (@dirs) {
    next if ($dirs eq "." || $dirs eq "..");
    print "---->$dirs\n";
    my $d = "$basedir/$key/$dirs";
    if (-d "$d") {
        opendir (TEMP2, $d) || die $!;
        my @files = readdir (TEMP2); #This should read binary files
        closedir (TEMP2);

        foreach my $file (@files) {
            next if ($file eq "." || $file eq "..");
            my $f = "$d/$file";
            print "==>$file\n";
            open FILE, $file || die $!;
            binmode FILE;
            foreach ($line = read (FILE, $data, 169108570)) {
                print "&&&&&&&&&&&$line\n";
                print "ppppppppppp$data\n";
                push @array, $data;
            }
            close FILE;
        }
    }
}

foreach $item (@array) {
    #print "==>$item<==\n"; # It prints out content of binary file without the ==> and <== if I uncomment this.. weird!
    if ($item =~ /p04(.*)/) {
        print "=>$item<===============\n"; # It prints "=><===============" once per binary file I have. This is wrong; I expect it to print the content of each binary file instead :(
        next if ($item !~ /^w+/);
        open (LOG, ">log") or die $!;
        #print LOG $item;
        close LOG;
    }
}

Again, I changed my code as follows, but it still does not work: when I check the "log" file, it has not grepped "p04" correctly. It greps the whole file, including binary data like this: "@^@^@^@^G^D^@^@^@^^@p04bbhi06^@^^@^@^@^@^@^@^@^@HH^R^@^@^@^^@^@^@p04lohhj09^@^@^@@@^^". What I expect is that it greps only the strings that start with p04, such as p04bbhi06 and p04lohhj09. Here is my code:

foreach my $file (@files) {
    next if ($file eq "." || $file eq "..");
    my $f = "$d/$file";
    print "==>$file\n";
    open FILE, $f || die $!;
    binmode FILE;
    my @lines = <FILE>;
    close FILE;
    foreach $cell (@lines) {
        if ($cell =~ /b12/) {
            push @array, $cell;
        }
    }
}

#my @matches = grep /p04/, @lines;
#foreach $item (@matches) {
foreach $item (@array) {
    #print "-->$item<--";
    open (LOG, ">log") or die $!;
    print LOG $item;
    close LOG;
}

Answer 1

Use:

$line = read (FILE, $data, 169108570);

The data is in $data; $line is the number of bytes read.
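A minimal sketch of how the return value of read can drive a loop, assuming you want to read the file in smaller chunks rather than in one huge request. The sub name read_in_chunks, the chunk size of 4, and the temporary test file are made up for illustration and are not from the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a binary file in fixed-size chunks and return its whole content.
# read() returns the number of bytes read, 0 at end of file, undef on error.
sub read_in_chunks {
    my ($path, $chunk_size) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $content = '';
    while (1) {
        my $n = read($fh, my $buf, $chunk_size);
        die "Read error on $path: $!" unless defined $n;
        last if $n == 0;      # end of file
        $content .= $buf;     # $buf holds exactly $n bytes
    }
    close($fh);
    return $content;
}

# Demo on a small temporary file (13 bytes, with an embedded NUL byte).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "aa aa\x00\x0abb bb\x0a";
close($tmp_fh);

my $bytes = read_in_chunks($tmp_name, 4);
print "read ", length($bytes), " bytes\n";
```

The '<:raw' layer in open has the same effect as calling binmode on the handle afterwards.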

       my $f = "$d/$file" ;
       print "==>$file\n" ;
       open FILE, $file || die $! ;

I suspect the full path is in $f, but you are opening $file. (In my testing, even $f is not the full path, but I guess you may have some other glue code...)

If you just want to walk over all the files in a directory, try File::DirWalk or File::Find.
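As a rough illustration of the File::Find suggestion, the sketch below collects the full path of every regular file under a directory. The sub name list_files and the temporary directory layout are hypothetical, built only so the example runs on its own.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Collect the full paths of all regular files below $top.
sub list_files {
    my ($top) = @_;
    my @found;
    # Inside the callback, $_ is the file name and
    # $File::Find::name is the path including the directory.
    find(sub { push @found, $File::Find::name if -f $_ }, $top);
    my @sorted = sort @found;
    return @sorted;
}

# Demo: build a tiny directory tree with two binary files.
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir failed: $!";
for my $name ("a.bin", "sub/b.bin") {
    open(my $fh, '>:raw', "$top/$name") or die "Cannot create $top/$name: $!";
    print {$fh} "\x00data";
    close($fh);
}

my @files = list_files($top);
print scalar(@files), " files found\n";
```

Because find hands back complete paths, there is no need to glue directory and file names together by hand before calling open.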

Answer 2

I am not sure I understand you correctly.

If you need to read a binary file, you can do it the same way as for a text file:

open F, "/bin/bash";
my $file = do { local $/; <F> };
close F;

Under Windows you may need to add binmode F; under *nix it works without it.

If you need to find which lines in an array contain some word, you can use the grep function:

my @matches = grep /something/, @array_to_grep;

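You will get all the matching lines in the new array @matches.

BTW: I don't think it is a good idea to read tons of binary files into memory at once. You can search them one by one...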
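For the "p04" case in the question, grep returns whole records, binary bytes included. One possible way to get only the strings themselves is a global regex match on each file's data. This is a sketch that assumes the wanted strings consist of "p04" followed by ASCII letters and digits, as in the question's samples p04bbhi06 and p04lohhj09; the sub name extract_ids and the sample bytes are made up.

```perl
use strict;
use warnings;

# Return every "p04..." token found in a chunk of binary data.
# Assumption: a token is "p04" followed by ASCII letters or digits.
sub extract_ids {
    my ($data) = @_;
    my @ids = $data =~ /(p04[0-9A-Za-z]+)/g;
    return @ids;
}

# Sample data imitating the mix of NUL bytes and text from the question.
my $sample = "\x00\x00\x07\x04\x00p04bbhi06\x00\x00hh\x12\x00\x00p04lohhj09\x00\x00";
print join(",", extract_ids($sample)), "\n";   # prints p04bbhi06,p04lohhj09
```

Calling such a sub on one file's content at a time, and appending the results to a log opened once before the loop, keeps memory use low and avoids overwriting the log on every iteration.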

If you need to find where the match occurs, you can use another standard function, index:

my $offset = index($file, 'myword'); # index(STRING, SUBSTRING); returns -1 if not found
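index takes the string to search first and the substring second, plus an optional start position, so it can be called repeatedly to find every occurrence. A small sketch, with the sub name find_offsets and the sample data invented for illustration:

```perl
use strict;
use warnings;

# Return the byte offsets of every occurrence of $needle in $haystack.
sub find_offsets {
    my ($haystack, $needle) = @_;
    my @offsets;
    my $pos = index($haystack, $needle);
    while ($pos != -1) {
        push @offsets, $pos;
        # Continue searching just past the previous hit.
        $pos = index($haystack, $needle, $pos + 1);
    }
    return @offsets;
}

my $data = "\x00\x00p04bbhi06\x00\x00\x00p04lohhj09\x00";
print join(",", find_offsets($data, "p04")), "\n";   # prints 2,14
```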

Answer 3

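I am not sure I can answer the OP's question exactly, but here are some notes that may be related. (Edit: this is the same approach as @Dimanoid's answer, but in more detail.)

Say you have a file that is a mix of ASCII data and binary. Here is an example in a bash terminal: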

$ echo -e "aa aa\x00\x0abb bb" | tee tester.txt
aa aa
bb bb
$ du -b tester.txt 
13  tester.txt
$ hexdump -C tester.txt 
00000000  61 61 20 61 61 00 0a 62  62 20 62 62 0a           |aa aa..bb bb.|
0000000d

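Note that the byte 00 (written as \x00) is a non-printable character (in C it also means "end of string"), so its presence makes tester.txt a binary file. The file is 13 bytes long, as reported by du, because echo adds a trailing \n (as can be seen in the hexdump output).

Now, let's see what happens when we try to read it with Perl's <> diamond operator (see also What's the use of <> in perl?):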

$ perl -e '
open IN, "<./tester.txt";
binmode(IN);
$data = <IN>; # does this slurp entire file in one go?
close(IN);
print "length is: " . length($data) . "\n";
print "data is: --$data--\n";
'

length is: 7
data is: --aa aa
--

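Clearly, the entire file was not slurped: reading stopped at the line ending \n (and not at the binary \x00). That is because the diamond filehandle operator <FH> is actually a shortcut for readline (see Perl Cookbook: Chapter 8, File Contents).

The same link tells us that we should undefine the input record separator, $/ (which is set to \n by default), in order to slurp the entire file. You probably want this change to be local only, which is why braces and local are used instead of undef (see Perl Idioms Explained - my $string = do { local $/; };); so we have: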

$ perl -e '
open IN, "<./tester.txt";
print "_$/_\n"; # check if $/ is \n
binmode(IN);
{
local $/; # undef $/; is global
$data = <IN>; # this should slurp one go now
};
print "_$/_\n"; # check again if $/ is \n
close(IN);
print "length is: " . length($data) . "\n";
print "data is: --$data--\n";
'

_
_
_
_
length is: 13
data is: --aa aa
bb bb
--

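...and now we can see that the file has been slurped in its entirety.

Since binary data implies unprintable characters, you may want to inspect the actual contents of $data by printing it via sprintf or pack/unpack.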
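As one possible way to do that inspection, the sketch below uses unpack to render the bytes as hex, loosely imitating a hexdump -C line. The sub name hex_dump_line is made up, and the sample string repeats the 13 bytes of tester.txt.

```perl
use strict;
use warnings;

# Format binary data as space-separated hex bytes plus a printable view,
# with every non-printable byte shown as ".".
sub hex_dump_line {
    my ($data) = @_;
    my $hex = join ' ', unpack('(H2)*', $data);   # two hex digits per byte
    (my $printable = $data) =~ s/[^\x20-\x7e]/./g;
    return "$hex  |$printable|";
}

print hex_dump_line("aa aa\x00\x0abb bb\x0a"), "\n";
# prints: 61 61 20 61 61 00 0a 62 62 20 62 62 0a  |aa aa..bb bb.|
```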

Hope this helps someone,
Cheers!

How to read a binary file in Perl
