Problem with a data-scraping script

Time: 2014-10-21 16:01:24

Tags: perl cgi

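I can't seem to get this script to work with WWW::Mechanize.

I know it's probably something simple, but I can't see it.

I think it is failing at HTML::TokeParser for some reason.

I get this error message: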

Can't call method "get_token" on an undefined value at Untitled line 13

#!/usr/bin/perl

print "Content-type: text/html\n\n";
use WWW::Mechanize;

my $url = "http://slashdot.org/";

my $agent = WWW::Mechanize->new( autocheck => 1 );
$agent->get($url);

my $stream = HTML::TokeParser->new( $agent->{content} );

while ( my $token = $stream->get_token ) {
    my $ttype = shift @{$token};

    if ( $ttype eq "S" ) {
        my ( $tag, $attr, $attrseq, $rawtxt ) = @{$token};

        if ( $tag eq "div" ) {
            if ( $rawtxt =~ /id="text-/m ) {
                print $stream->get_trimmed_text( $tag, "/div" );
                print "\n\n\n\n";
            }
        }
    }
}

1 answer:

Answer 0 (score: 0)

From the documentation for HTML::TokeParser:

  

$p = HTML::TokeParser->new( \$document, %opt );

     

The object constructor argument is either a file name, a file handle object, or the complete document to be parsed. Extra options can be provided as key/value pairs and are processed as documented by the base classes.

     

If the argument is a plain scalar, then it is taken as the name of a file to be opened and parsed. If the file can't be opened for reading, then the constructor will return undef and $! will tell you why it failed.

From your script:

Can't call method "get_token" on an undefined value at Untitled line 13

The constructor returned undef, so check the argument you are passing to initialize the HTML::TokeParser object:

my $stream = HTML::TokeParser->new($agent->{content});

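First, you should use WWW::Mechanize's content method to get the page content rather than reaching into the object's internal hash. Second, you need to pass a reference to the content, not the content itself; a plain scalar is treated as a file name. To correct the code, you need: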

my $stream = HTML::TokeParser->new( \$agent->content );
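The difference between the two call styles can be checked without touching the network. The following is a minimal sketch, assuming a hard-coded string ($html, a hypothetical stand-in for whatever $agent->content would return) in place of a real page:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical stand-in for the string returned by $agent->content.
my $html = '<div id="text-1">Hello</div>';

# Plain scalar: interpreted as a file name. No such file exists here,
# so the constructor is expected to return undef.
my $from_scalar = HTML::TokeParser->new($html);

# Scalar reference: interpreted as the document to parse.
my $from_ref = HTML::TokeParser->new( \$html );

print "plain scalar: ", ( defined $from_scalar ? "parser" : "undef" ), "\n";
print "reference:    ", ( defined $from_ref    ? "parser" : "undef" ), "\n";
```

Calling get_token on the first result is what produces the "undefined value" error from the question.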

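You may also want to add error checking to make sure the slashdot page was retrieved successfully before starting the parser (for example, with $agent->success).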
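Putting the pieces together, here is a rough sketch of the corrected parsing loop with a check on the constructor's return value. It assumes an inline sample document ($html, with made-up div ids and text) instead of a live fetch so that it runs offline; in the real script that string would come from $agent->content. Since the question's script creates the agent with autocheck => 1, a failed request should already make get die, so a $agent->success test matters mainly when autocheck is turned off.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Made-up sample markup standing in for the fetched page body.
my $html = <<'HTML';
<html><body>
<div id="text-1">First story</div>
<div id="other">Skipped</div>
<div id="text-2">Second story</div>
</body></html>
HTML

# Pass a reference to the document and fail early if no parser comes back.
my $stream = HTML::TokeParser->new( \$html )
    or die "Could not create parser: $!";

my @texts;
while ( my $token = $stream->get_token ) {
    # Start-tag tokens have the form ["S", $tag, $attr, $attrseq, $text].
    next unless $token->[0] eq 'S' && $token->[1] eq 'div';

    # Match on the parsed id attribute instead of the raw tag text.
    my $id = $token->[2]{id};
    next unless defined $id && $id =~ /^text-/;

    # Collect the text up to the closing </div>.
    push @texts, $stream->get_trimmed_text('/div');
}

print "$_\n\n" for @texts;
```

Checking the id through the attribute hash rather than a regex on the raw tag text is a design choice of this sketch, not something the original answer prescribes; it avoids depending on quoting style in the markup.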