Question

I have a file containing n subroutine definitions, as shown below (each routine begins with sub and ends with the keyword bar):

sub foo()
contents
bar

sub good ()
contents
bar

sub right ()
contents
bar

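I want to create three files named foo.s, good.s and right.s, and write each subroutine into its corresponding file. I tried the following script, but it manages global and local variables incorrectly, because I am completely new to Perl scripting. How can I do this?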

      my $op = 0;

      my $fnamet = $ARGV[0];
      if(not defined $fnamet) {
              die "Top file name not known\n";
      }
      my $srcfile = "source/$fnamet.s";
      print "$srcfile\n";
      open (SRC, $srcfile) or die "Couldn't find source file\n";

      foreach $line (<SRC>) {
              if ($op == 0) {
                      strtwrite($line, $op);
              } elsif ($op == 1) {
                      stopwrite($line, $op);
              }
      }
      if ($op == 1) {
              print "Some sub is missing bar statement or having\n";
              print "additional sub statement";
              close (DST);
      }
      close (SRC);
  }

  sub strtwrite {
      unless ($_[0] =~ /^\s*sub\s*.*/) {
              print "Searching for a sub start\n";
              $op = 0;
      } else {
              print "A sub start found\n";
              print "$_[0]\n";
              my @temp = (split /[\s,();]+/, $_[0]);
              my $strt = '';
              if($temp[1] =~ 'sub') {
                      print "Comparing with sub pass \n";
                      $strt = $temp[2];
              } else {
                      print "Comparing with sub fail \n";
                      $strt = $temp[1];
              }
              my $fnameb = "dstcodes/$strt.s";
             print "\n\n $fnameb \n\n";
              open (DST, '>$fnameb') or die "Couldn't open sub file\n";
              print DST $_[0];
              $op = 1;
      } 
  }

  sub stopwrite {
      unless ($_[0] =~ /^\s*bar\s*.*/) {
              # Copy till the bar is found
              print "Searching for an bar\n";
              print DST $_[0];
              $op = 1;
      } else {
              # Close the current destination file and start waiting 
              # for next SUB start
              print "A matching BAR found\n";  
              print DST $_[0];
              close (DST);
              $op = 0;
      }
  }

Answer 1

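This is relatively simple if you use a lexical variable for the output file handle (which is the best way to do it anyway).

This program opens a new output file on $dst whenever a sub line is found in the input. The current input line is then printed to $dst, and all is well.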

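Some tips:

Always use strict and use warnings
Declare your variables as close as possible to their first point of use, not all together at the top of the program
Always use lexical file handles and the three-argument form of open, and include the value of $! in the die string so that you know why it failed
If you do not put a newline at the end of the die string, Perl will append the file name and line number where the problem occurred
Use while to read from a file, not for. The latter unnecessarily reads the entire file into memory before the loop starts, which becomes a problem with large files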
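The tip about the trailing newline in a die string can be checked with a small sketch. The helper name message_from is made up for this illustration; it runs a block under eval and returns whatever die left in $@.

```perl
use strict;
use warnings;

# Hypothetical helper for this illustration: run a block, return the die message.
sub message_from {
    my ($code) = @_;
    eval { $code->(); 1 } or return $@;
    return '';
}

# With a trailing newline, the message is used exactly as written.
my $with_newline = message_from(sub { die "Top file name not known\n" });

# Without one, Perl appends " at FILE line N." to the message.
my $without_newline = message_from(sub { die "Top file name not known" });

print "with newline:    $with_newline";
print "without newline: $without_newline";
```

The second form is usually the more useful one while debugging, since it says where the failure happened.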

use strict;
use warnings;

die "Top file name not known" unless @ARGV;
my $srcfile = "source/$ARGV[0].s";

open my $src, '<', $srcfile or die "Unable to open '$srcfile': $!";
my $dst;
while (<$src>) {
  if (/^\s*sub\s+(\w+)/) {
    my $file = "dstcodes/$1.s";
    open $dst, '>', $file or die "Unable to open '$file' for output: $!";
  }
  print $dst $_ if $dst;
}

close $dst if $dst;
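The program above keeps the output file open after bar, so any blank lines between one routine and the next sub line end up at the tail of the previous file. If that matters, one possible variation closes the handle when the bar line is seen. This is a sketch rather than part of the original answer: the function name split_subs, its directory argument, and the in-memory demo input are assumptions made for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read routines from $src and write each one to "$dir/NAME.s".
# Returns the names of the routines written, in input order.
sub split_subs {
    my ($src, $dir) = @_;
    my ($dst, @names);
    while (my $line = <$src>) {
        if ($line =~ /^\s*sub\s+(\w+)/) {
            my $name = $1;
            my $file = "$dir/$name.s";
            open $dst, '>', $file or die "Unable to open '$file' for output: $!";
            push @names, $name;
        }
        next unless $dst;            # skip lines outside any routine
        print $dst $line;
        if ($line =~ /^\s*bar\b/) {
            close $dst or die "Unable to close output file: $!";
            undef $dst;              # back to searching for the next sub
        }
    }
    die "Routine '$names[-1]' has no bar line" if $dst;
    return @names;
}

# Demo on an in-memory file and a temporary directory.
my $input = "sub foo()\ncontents\nbar\n\nsub good ()\ncontents\nbar\n";
open my $src, '<', \$input or die "Unable to open in-memory input: $!";
my $dir = tempdir(CLEANUP => 1);
my @written = split_subs($src, $dir);
print "wrote: @written\n";
```

With a real input file, the call would be split_subs($src, 'dstcodes') after opening $src as in the program above.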

Answer 2

You can try a while loop with a /g regular expression:

use strict;
use warnings;

my ($fnamet) = @ARGV;
open my $fh, "<", $fnamet or die $!;
my $str = do { local $/; <$fh> };
close $fh or die $!;

# Capture the whole routine, including its terminating bar
while ($str =~ /(sub \s+ (\w+) .+? \b bar \b)/xgs) {
  my ($cont, $name) = ($1, $2);

  open my $o, ">", "$name.s" or die $!;
  print $o "$cont\n";
  close $o or die $!;
}
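To see what a /g loop of this kind captures without touching the file system, the same idea can be run against a string. This is only a sketch: the function name extract_routines is invented here, and it uses its own pattern, anchored at line starts with /m, which keeps the bar line in the captured text.

```perl
use strict;
use warnings;

# Hypothetical helper: return (name, text, name, text, ...) for each routine in $str.
sub extract_routines {
    my ($str) = @_;
    my @found;
    # /g resumes after the previous match, /s lets . cross newlines,
    # /m anchors ^ at line starts, /x permits the spacing in the pattern.
    while ($str =~ /^ [ \t]* (sub \s+ (\w+) .*? ^ [ \t]* bar \b [^\n]* \n?)/xgsm) {
        push @found, $2, $1;
    }
    return @found;
}

my $text = "sub foo()\ncontents\nbar\n\nsub good ()\ncontents\nbar\n";
my %routine = extract_routines($text);
print "$_:\n$routine{$_}" for sort keys %routine;
```

Because the match is non-greedy (.*?), each iteration stops at the first bar line after a sub, so consecutive routines are not merged into one capture.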

Answer 3

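First, a quick fix for your problem: we pass a reference to the $op variable to the subs. Perl references are similar to C pointers. We take a reference with the \ operator and dereference it by writing $$opref, i.e. the $ sigil also serves as the dereference operator.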

if ($op == 0) {
        strtwrite($line, \$op);
} elsif ($op == 1) {
        stopwrite($line, \$op);
}

Then, inside the subs, we unpack the arguments:

sub stopwrite {
  my ($line, $opref) = @_;

  ...
  $$opref = 1;
}

(Skipping the reference and assigning to $_[1] directly is a shorter solution, but not exactly a readable one.)
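The difference between the reference approach, the $_[1] shortcut, and a plain copy can be seen side by side in a short sketch. The three sub names and the variables $a_op, $b_op and $c_op are made up for this illustration.

```perl
use strict;
use warnings;

# Variant 1: the caller passes a reference and the sub writes through it.
sub set_via_ref {
    my ($line, $opref) = @_;
    $$opref = 1;
}

# Variant 2: @_ aliases the caller's arguments, so assigning to $_[1]
# changes the caller's variable directly.
sub set_via_alias {
    $_[1] = 1;
}

# Variant 3: copying the argument first breaks the link.
sub set_copy_only {
    my ($line, $op) = @_;
    $op = 1;    # changes only the local copy
}

my ($a_op, $b_op, $c_op) = (0, 0, 0);
set_via_ref('sub foo()', \$a_op);
set_via_alias('sub foo()', $b_op);
set_copy_only('sub foo()', $c_op);
print "ref: $a_op, alias: $b_op, copy: $c_op\n";    # prints "ref: 1, alias: 1, copy: 0"
```

The reference version makes the side effect visible at the call site through the \ operator, which is why it reads better than the aliasing shortcut.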

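But, you know, I am somewhat opposed to this technique: all of this is action-at-a-distance-ish, and mutable state tends to make simple things rather complicated.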

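Suppose we have a subroutine extract_subs that skips any junk lines until it finds a sub declaration, extracts that sub into a file, and returns once the sub has been terminated by its bar marker. To do this, extract_subs takes a filehandle as its argument. Our main part then looks like this:

use strict;
use warnings;
use autodie; # automatic error messages for `open`

@ARGV or die "Usage: $0 input-file\n";
my $filename = shift @ARGV;

# using a lexical filehandle and an explicit open mode "<":
open my $source, "<", $filename;

extract_subs($source) until eof $source;
# $source is closed automatically

Now, what happens in extract_subs? First, we unpack the arguments into variables, which is easier to read:

sub extract_subs {
  my ($input) = @_;

Next, we discard lines until we see a sub:

  my $output;
  while (my $declaration = <$input>) {
    if ($declaration =~ /^\s* sub \s+ (\w+)/x) {
      my $name = $1;
      open $output, ">", "dstcodes/$name.s"; # no error handling because autodie
      print { $output } $declaration; # print the whole declaration to this file
      last; # leave the loop
    }
  }
  # check that the loop didn't abort because $input was exhausted:
  return unless defined $output;

We open the file inside that loop so that we can print $declaration to it there.

Now that the output file is open and we are inside the sub, we print all lines to $output until we see the terminating line:

  # implicitly read into $_ default variable
  while (<$input>) {
    print { $output } $_;
    return if /^\s*bar\b/; # exit the whole subroutine, not just the loop
  }
  # If this code is reached, then $input was exhausted before finding the "bar" terminator
  die "A sub was not terminated with a bar statement";
}

I think this code is more elegant and easier to understand. Here are some things you should not do:

Using ancient forms of open, such as bareword filehandles, or specifying no open mode. Always use the three-argument form: open my $filehandle, "<", $filename or die "Can't open $filename: $!" (the or die part can be omitted when using autodie).
Using global variables, such as bareword filehandles. They make code increasingly difficult to understand.
Not using strict and warnings. All those error messages may look bothersome, but they usually point to real problems that should be fixed rather than ignored.
Using flags to record the state of the parser. If you break your code into proper subroutines with good return values, such communication channels are not needed. Remember that a Perl subroutine can return multiple values if necessary.
Using =~ as a general string operator. It is for regular expression matching. If you want to test for string equality, use the eq operator.
Testing for specific truth values. If you only care whether a variable is true or false, use it directly in the condition: if ($foo) { bar() } else { baz() }. This is more robust than requiring specific values, as in if ($foo == 1) { bar() } elsif ($foo == 0) { baz() }.
Using unless ($cond) { A } else { B } constructs. These are often hard to understand. Either if (not $cond) { A } else { B }, or an if ($cond) with the two branches swapped, is better.
Not reading "Modern Perl". Once you are comfortable writing Perl, you should read that book (it is also available online) to learn about current best practices.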
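The point about =~ versus eq is easy to miss, because a test such as the question's $temp[1] =~ 'sub' appears to work on simple input. A short sketch with a made-up word list shows how the two operators differ:

```perl
use strict;
use warnings;

my @words = ('sub', 'subtract', 'unsubscribe', 'bar');

# A plain string on the right of =~ is treated as a pattern,
# so this is a substring search, not an equality test.
my @regex_hits = grep { $_ =~ 'sub' } @words;

# eq compares whole strings.
my @eq_hits = grep { $_ eq 'sub' } @words;

print "=~ 'sub' matches: @regex_hits\n";   # sub subtract unsubscribe
print "eq 'sub' matches: @eq_hits\n";      # sub
```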

Create separate files, one per subroutine, from a file containing n subroutine definitions

3 Answers: