Question

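I'm trying to read the text between two keywords, but it isn't working well. I want to read a question and its answer and then print them out. Instead, it just keeps printing the block over and over in a big loop.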

#!/usr/bin/perl
use strict ;
use warnings;
my $question ;
my $answer ;

while(my $line = <>){
chomp $line ;

if ($line =~ /questionstart(.*)questionend/) {
    $question = $1 ; }
elsif ($line  =~ /answerstart(.*)answerend/) {
    $answer = $1 ; }

my $flashblock = <<"FLASH" ;
<!-- BEGIN -->
<p class="question">
  $question
</p>
<p class="answer">
   $answer
</p>
<!-- END -->
FLASH
print $flashblock ;
}

Here is a sample of the file:

questionstart

hellphellohellohello


questionend

answerstart

hellohellohello

answerend

Answer 1

Because the file is read line by line, and your patterns span several lines, they can never match.
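To see the problem concretely, here is a small illustrative sketch. It inlines the sample text instead of reading a file, and the helper names count_line_matches and extract_between are hypothetical. It runs the question's pattern against each line separately and then matches against the whole text at once. The /s modifier lets . match newlines, and .*? stops at the first end marker.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input from the question, inlined for the demonstration.
my $text = "questionstart\n\nhellphellohellohello\n\n\nquestionend\n"
         . "\nanswerstart\n\nhellohellohello\n\nanswerend\n";

# Count how many single lines match the question's multi-line pattern.
sub count_line_matches {
    my ($text) = @_;
    my $count = 0;
    for my $line (split /\n/, $text) {
        $count++ if $line =~ /questionstart(.*)questionend/;
    }
    return $count;
}

# Hypothetical helper: match across lines on the whole string.
# /s makes '.' match "\n"; the non-greedy .*? stops at the first end marker.
sub extract_between {
    my ($text, $start, $end) = @_;
    return $text =~ /\Q$start\E(.*?)\Q$end\E/s ? $1 : undef;
}

print "lines matching: ", count_line_matches($text), "\n";   # 0

my $q = extract_between($text, 'questionstart', 'questionend');
$q =~ s/^\s+|\s+$//g;                    # trim the surrounding blank lines
print "whole-string match: $q\n";        # hellphellohellohello
```

No single line contains both markers, so the per-line count is always zero; only a match against the joined text can see both.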

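A basic way to solve this is to set flags for the question and answer regions. Since you have very clear markers for entering and leaving these regions, the code is quite simple: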

use warnings;
use strict;
use feature 'say';

my ($question, $answer);
my ($in_Q, $in_A);

while (my $line = <>) {
    next if $line =~ /^\s*$/;

    if    ($line =~ /^\s*questionstart/) { $in_Q = 1; next }   
    elsif ($line =~ /^\s*questionend/)   { $in_Q = 0; next }   
    elsif ($line =~ /^\s*answerstart/)   { $in_A = 1; next }   
    elsif ($line =~ /^\s*answerend/)     { $in_A = 0; next }       

    if    ($in_Q) { $question .= $line }
    elsif ($in_A) { $answer   .= $line }
}

say "Question: $question";
say "Answer: $answer";

(Here I condensed the if-elsif statements for brevity and emphasis.)

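This code makes some reasonable assumptions about the input file. It requires each marker to be at the start of its line (possibly after whitespace), but allows more text after it. If you want to make sure a marker is the only thing on its line, add the $ anchor at the end of the regex (again with \s* before it).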
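A sketch contrasting the two variants. The helper names is_marker_loose and is_marker_strict are hypothetical and not part of the answer's code. The loose test is the answer's /^\s*questionstart/; the strict one adds \s*$ so only whitespace may follow the marker.

```perl
use strict;
use warnings;

# Loose test (as in the answer): marker at line start, anything may follow.
sub is_marker_loose {
    my ($line, $name) = @_;
    return $line =~ /^\s*\Q$name\E/ ? 1 : 0;
}

# Strict test: only optional whitespace may surround the marker.
# \s*$ also tolerates the trailing newline when the line is not chomped.
sub is_marker_strict {
    my ($line, $name) = @_;
    return $line =~ /^\s*\Q$name\E\s*$/ ? 1 : 0;
}

for my $line ("questionstart\n", "  questionstart  \n", "questionstart extra\n") {
    (my $shown = $line) =~ s/\n\z//;   # copy without the newline, for display
    printf "%-22s loose=%d strict=%d\n", "'$shown'",
        is_marker_loose($line, 'questionstart'),
        is_marker_strict($line, 'questionstart');
}
```

Only the third line differs: the loose test accepts it, while the strict test rejects it because text follows the marker.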

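This assumes the input holds a single Q/A pair. If it holds several, move the printing into the loop, inside the elsif (/^\s*answerend/) { .. } branch, so that it runs once each answer ends, and reset $question and $answer there so the next pair starts empty.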
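A minimal sketch of that change, assuming several Q/A pairs follow one another. The sub name parse_pairs and the inlined input are hypothetical. The flag logic is the answer's; the sketch collects the pairs instead of printing them so the result is easy to check, but a print at the same spot behaves the same way.

```perl
use strict;
use warnings;

# Collect every Q/A pair using the flag approach; each pair is
# recorded (and the buffers reset) when 'answerend' is seen.
sub parse_pairs {
    my @lines = @_;
    my @pairs;
    my ($question, $answer) = ('', '');
    my ($in_Q, $in_A) = (0, 0);

    for my $line (@lines) {
        next if $line =~ /^\s*$/;

        if    ($line =~ /^\s*questionstart/) { $in_Q = 1; next }
        elsif ($line =~ /^\s*questionend/)   { $in_Q = 0; next }
        elsif ($line =~ /^\s*answerstart/)   { $in_A = 1; next }
        elsif ($line =~ /^\s*answerend/) {
            $in_A = 0;
            push @pairs, [$question, $answer];   # or print the block here
            ($question, $answer) = ('', '');     # reset for the next pair
            next;
        }

        if    ($in_Q) { $question .= $line }
        elsif ($in_A) { $answer   .= $line }
    }
    return @pairs;
}

my @input = map { "$_\n" } qw(
    questionstart Q1 questionend answerstart A1 answerend
    questionstart Q2 questionend answerstart A2 answerend
);

for my $pair (parse_pairs(@input)) {
    print "Question: $pair->[0]Answer: $pair->[1]\n";
}
```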

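The printing code in the question is fine, so I won't repeat it here. If there is any chance of printing to a format other than HTML, clean the resulting strings of leading and trailing whitespace, repeated spaces, and newlines.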
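One way to do that cleanup, sketched with a hypothetical helper named tidy: trim both ends, then collapse every internal run of whitespace (spaces, tabs, newlines) into a single space.

```perl
use strict;
use warnings;

# Trim leading/trailing whitespace and collapse internal runs of
# whitespace (spaces, tabs, newlines) into a single space.
sub tidy {
    my ($s) = @_;
    return '' unless defined $s;
    $s =~ s/^\s+|\s+$//g;
    $s =~ s/\s+/ /g;
    return $s;
}

my $raw = "\n  hellphello\n\n  hellohello  \n";
print "[", tidy($raw), "]\n";   # prints [hellphello hellohello]
```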

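Repeatedly testing the same variable may make one reach for a case-type construct, which in Perl would be switch (the given/when feature). However, that is still experimental, and its behavior is hard to describe precisely (see the docs!). Moreover, it may involve smartmatch, which is hard to describe, widely understood to be broken in its current form, and certain to change. So I recommend cascading if-elsif statements (for this approach).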

Answer 2

As others have pointed out, a multi-line regex can never match when you read the input file one line at a time.

This is a perfect use for Perl's "flip-flop" operator (..).

#!/usr/bin/perl

use strict;
use warnings;

my ($question, $answer);

while (<DATA>) {
  if (/questionstart/ .. /questionend/ and ! /question(start|end)/) {
    $question .= $_;
  }

  if (/answerstart/ .. /answerend/ and ! /answer(start|end)/) {
    $answer .= $_;
  }

  # At the end of an answer, print this Q/A pair
  if (/answerend/) {
    q_and_a($question, $answer);

    # reset text variables
    $question = $answer = '';
  }
}

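sub q_and_a {
  # Use the arguments, not the outer $question/$answer; also avoid
  # naming a lexical $a, since $a is a special variable used by sort.
  my ($q, $ans) = @_;

  print <<"FLASH";
<!-- BEGIN -->
<p class="question">
  $q
</p>
<p class="answer">
   $ans
</p>
<!-- END -->
FLASH
}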

__DATA__
questionstart

hellphellohellohello


questionend

answerstart

hellohellohello

answerend

Update: moved the display code into a subroutine to keep the main loop cleaner.
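The extra "and ! /question(start|end)/" test is needed because the flip-flop is also true on the marker lines themselves. As an alternative, the value that .. returns can be inspected: according to perlop, in scalar context it returns the empty string when false, otherwise a sequence number starting at 1, and the last number of a range has "E0" appended. A sketch using that (the sub name collect_inside and the inlined input are hypothetical) which keeps only the lines strictly between the markers:

```perl
use strict;
use warnings;

# Return the lines strictly between the $start and $end markers.
# The flip-flop yields '' (false) or a sequence number: the first line of
# a range gets 1 and the last line gets a number with 'E0' appended.
# Note: this single .. op keeps one state for the whole program, so a
# range left unterminated in one call would carry over into the next.
sub collect_inside {
    my ($start, $end, @lines) = @_;
    my @inside;
    for my $line (@lines) {
        my $seq = ($line =~ /\Q$start\E/ .. $line =~ /\Q$end\E/);
        push @inside, $line if $seq && $seq > 1 && $seq !~ /E0\z/;
    }
    return @inside;
}

my @input = map { "$_\n" } ('questionstart', '', 'hellphellohellohello', '', 'questionend',
                             '', 'answerstart', '', 'hellohellohello', '', 'answerend');

# Drop blank lines inside the regions before joining.
my $question = join '', grep { /\S/ } collect_inside('questionstart', 'questionend', @input);
my $answer   = join '', grep { /\S/ } collect_inside('answerstart',   'answerend',   @input);
print "Question: $question", "Answer: $answer";
```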

Answer 3

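Your approach reads the file line by line, but your regexes try to capture several lines between the start and end of the question/answer. No single line in the file can match such a multi-line regex, so you end up with uninitialized $question and $answer variables, and a block (plus warnings) printed for every line of the file.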

It makes more sense to read the whole text file into one string, split it into question/answer blocks, and trim the contents if needed:

#!/usr/bin/perl
use strict;
use warnings;

open my $fh, '<', 'file.txt' or die "Can't open file.txt: $!";
# (?:...) is non-capturing, so split does not also return the markers as fields
my @qa = grep { /\w/ } split /^(?:questionstart|answerstart|questionend|answerend)$/m, do { local $/; <$fh> };
s/^\s+|\s+$//g foreach @qa;

my $flashblock = << "FLASH";
<!-- BEGIN -->
<p class="question">
    $qa[0]
</p>
<p class="answer">
    $qa[1]
</p>
<!-- END -->
FLASH

print $flashblock;

Output:

<!-- BEGIN -->
<p class="question">
    hellphellohellohello
</p>
<p class="answer">
    hellohellohello
</p>
<!-- END -->

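If a file contains several question/answer pairs, you can iterate over the @qa array and print the pairs, or put them into a hash and use them as needed.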
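A sketch of both options, assuming that after the split @qa alternates question, answer, question, answer. The inlined @qa and the hash name %answer_for are stand-ins for illustration.

```perl
use strict;
use warnings;

# Stand-in for the trimmed @qa produced by the split above:
# elements alternate question, answer, question, answer, ...
my @qa = ('Q1 text', 'A1 text', 'Q2 text', 'A2 text');

# Option 1: walk the array two elements at a time.
for (my $i = 0; $i + 1 < @qa; $i += 2) {
    my ($q, $ans) = @qa[$i, $i + 1];
    print <<"FLASH";
<!-- BEGIN -->
<p class="question">
    $q
</p>
<p class="answer">
    $ans
</p>
<!-- END -->
FLASH
}

# Option 2: a hash keyed by question; list assignment pairs the elements.
# Duplicate questions overwrite each other, and hash order is not file order.
my %answer_for = @qa;
print "$_ => $answer_for{$_}\n" for sort keys %answer_for;
```

The loop preserves the file order and tolerates repeated questions; the hash is handier when you need to look up an answer by its question.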

Perl: grab text between two keywords