How do I extract a sentence from a matching string using a regex in Perl?

Time: 2017-04-12 03:24:09

Tags: regex perl

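I am trying to return the sentences that match a string. Long story short, I am using the WWW::Wikipedia module to search Wikipedia for keywords and return the pages that contain them, and then I search the returned page to find the matching sentences. For example, if the returned text looks like this: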

'Machine learning' is the subfield of computer science that, according to
Arthur Samuel in 1959, gives "computers the ability to learn without being
explicitly
programmed."[https://www.cims.nyu.edu/~munoz/files/ml_optimization.pdf Machine
Learning and Optimization] Evolved from the study of pattern recognition and
computational learning theory in artificial intelligence,http://www.britannica.com/EBchecked/topic/1116194/machine-
learning machine learning explores the study and construction of algorithms
that can learn from and make predictions on data – such algorithms overcome
following strictly static program instructions by making data-driven
predictions or decisions, through building a model from sample inputs. Machine
learning is employed in a range of computing tasks where designing and
programming explicit algorithms with good performance is difficult or
unfeasible example applications include email filtering, detection of network
intruders or malicious insiders working towards a data breach, optical
character recognition OCR,Wernick, Yang, Brankov, Yourganov and Strother,
Machine Learning in Medical Imaging, [[IEEE Signal Processing Society|IEEE
Signal Processing Magazine]], vol. 27, no. 4, July 2010, pp. 25–38 

I want to find the sentence that contains machine learning is, so I wrote these simple subroutines in Perl using a regex.

sub findSubject {
   my ($question) = @_ ;
   my $search;
   my @words;
   my $tobe;
   my $tail="";
   my @verbs =('what','who','when','where');
   $question =~ s/\s*[.?!]$//gi;
   $question =~ s/(^\s+|\s+)$//g;
    if ($question =~ m/(who|what)/i){
         $question =~ s/(who|what)//i;
         @words=split /\s+/,$question;
         shift @words ;
         $tobe = shift @words ;
         $search = join " ",@words;
         return ($search,$tobe,$tail);
     } elsif ($question =~ m/(when|where)/i)
    {
        $question =~ s/(where|when)//i;
        @words=split /\s+/,$question;
        shift @words ;
        $tail = pop @words;
        $tobe = shift @words;
        $search = join " ",@words;
        return ($search,$tobe,$tail);
    }
    else {
         return undef;
    }

}

sub findAnswer{
    my ($result,$search,$tobe,$tail)=@_;

    my $outFormat;
    if ($tail eq ""){
        $result =~s/[\)\(\}\{\;]//gi;
        print "$result\n";
        my $out = $result=~ m/\b\'?$search\'?\b\s*\b$tobe\b\s*(.*)\./gi;
        print "$out\n";
        $outFormat = "$search $tobe $out";
        return $outFormat ;
    }else {
        return "golden";
    }
}
sub main{
   my $f2;
   my @inputs= @ARGV;
   my $searchInput = 'here';
   my $searchOutput;
   my $tobe;
   my $tail;
   openingStatement();
   open($f2, ">:utf8", $inputs[0]);
    while(my $q = <STDIN>) {
        my $wiki = WWW::Wikipedia->new( clean_html => 1 );
        chomp $q;
        $q = lc $q;

        if ($q eq 'exit'){
            exit;
        }
        else {
            ($searchInput,$tobe,$tail) = findSubject($q);
            if ($searchInput){
                my $result = $wiki->search( $searchInput );
                if ( $result ) {
                   my $res = $result->fulltext();
                   if ($res){
                       $searchOutput = findAnswer($res,$searchInput,$tobe,$tail);
                       binmode(STDOUT, ":utf8");
                       print "$searchOutput\n" ;
                       print $f2  "$searchOutput\n";
                   }

                }else {
                    print "i dont know the answer \n";
                }
            }else {
               print "program accept only questions type {who, what, when, where}\n";
            }

        }
    }

   close $f2;
}

Here $result is the paragraph I want to parse, and $search and $tobe are the words I am looking for, in this case machine learning is.

But I get this strange warning:

Use of uninitialized value $_ in pattern match (m//) at qa-system.pl line 52,
    <STDIN> line 1 (#1)
    (W uninitialized) An undefined value was used as if it were already
    defined.  It was interpreted as a "" or a 0, but maybe it was a mistake.
    To suppress this warning assign a defined value to your variables.

    To help you figure out what was undefined, perl will try to tell you
    the name of the variable (if any) that was undefined.  In some cases
    it cannot do this, so it also tells you what operation you used the
    undefined value in.  Note, however, that perl optimizes your program
    and the operation displayed in the warning may not necessarily appear
    literally in your program.  For example, "that $foo" is usually
    optimized into "that " . $foo, and the warning will refer to the
    concatenation (.) operator, even though there is no . in
    your program.

and the subroutine returns this strange value:

machine learning is 18446744073709551615

Any clue why I am getting this warning? Is there a mistake in my regex?

1 Answer:

Answer 0 (score: 1)

You have a typo.

$out = $result= ~ m/\s*\'?$search\'? $tobe (.*)\s*\./i;
              ^^^

The =~ operator cannot contain a space. With a space, it becomes an assignment (=) followed by a bitwise negation (~).
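The difference can be seen in a minimal sketch. The variable names and sample string are illustrative and not taken from the question's program, and $_ is set explicitly only so that the second match does not trigger the "uninitialized value" warning.

```perl
use strict;
use warnings;

my $text = 'Machine learning is employed in a range of computing tasks.';

# Correct: =~ binds the match to $text; in scalar context it yields true or false.
my $bound = $text =~ m/machine learning/i;

# Typo: "= ~" parses as assignment followed by unary bitwise negation.
# The match no longer looks at $text at all; it runs against $_ instead.
local $_ = 'unrelated';
my $negated = ~ m/machine learning/i;

print "bound:   $bound\n";
print "negated: $negated\n";
```

Here the match against $_ fails, and negating that false value sets every bit. On a Perl built with 64-bit integers this is the 18446744073709551615 shown in the question.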

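Instead, $result = ~ m/.../ means "set $result to the bitwise negation of the result of $_ =~ m/.../". In other words, nonsense. (Many Perl operators and built-in functions use $_ when no variable is supplied.)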
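One further point, beyond what this answer covers: even with =~ written correctly, a match in scalar context returns only true or false, so the captured text has to be collected in list context. The following is a rough sketch under that assumption. The helper first_matching_sentence, its naive sentence splitting, and the sample text are hypothetical and not part of the question's program.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first sentence of $text that contains
# "$search $tobe", or nothing if no sentence matches.
sub first_matching_sentence {
    my ($text, $search, $tobe) = @_;
    # Naive sentence split: break on whitespace that follows ., ! or ?
    my @sentences = split /(?<=[.!?])\s+/, $text;
    for my $sentence (@sentences) {
        # \Q...\E quotes any regex metacharacters in the search terms.
        return $sentence if $sentence =~ m/\b\Q$search\E\s+\Q$tobe\E\b/i;
    }
    return;
}

my $text = 'Learning is fun. Machine learning is employed in a range of '
         . 'computing tasks. It has many applications.';

# The parentheses around $rest put the match in list context,
# so $rest receives capture group 1 rather than a true/false value.
my ($rest) = $text =~ m/\bmachine learning\s+is\s+([^.]*)\./i;

my $sentence = first_matching_sentence($text, 'machine learning', 'is');

print "$rest\n"     if defined $rest;
print "$sentence\n" if defined $sentence;
```

Splitting on sentence-ending punctuation is a simplification. Real Wikipedia text contains abbreviations, citations, and URLs that would need a more careful splitter.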
