Question

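I can't seem to get this script to work with WWW::Mechanize.

I know it's probably something simple, but I can't see it.

I think that, for some reason, it is failing at HTML::TokeParser.

I get this error message: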

Can't call method "get_token" on an undefined value at Untitled line 13

#!/usr/bin/perl

print "Content-type: text/html\n\n";
use WWW::Mechanize;

my $url = "http://slashdot.org/";

my $agent = WWW::Mechanize->new( autocheck => 1 );
$agent->get($url);

my $stream = HTML::TokeParser->new( $agent->{content} );

while ( my $token = $stream->get_token ) {
    my $ttype = shift @{$token};

    if ( $ttype eq "S" ) {
        my ( $tag, $attr, $attrseq, $rawtxt ) = @{$token};

        if ( $tag eq "div" ) {
            if ( $rawtxt =~ /id="text-/m ) {
                print $stream->get_trimmed_text( $tag, "/div" );
                print "\n\n\n\n";
            }
        }
    }
}

Answer 1

From the HTML::TokeParser documentation:

$p = HTML::TokeParser->new( \$document, %opt );

The object constructor argument is either a file name, a file handle object, or the complete document to be parsed. Extra options can be provided as key/value pairs and are processed as documented by the base classes.

If the argument is a plain scalar, then it is taken as the name of a file to be opened and parsed. If the file can't be opened for reading, then the constructor will return undef and $! will tell you why it failed.
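To make the difference concrete, here is a minimal sketch that needs no network access. The `$html` string is a made-up sample, not content from slashdot; it is passed to the constructor once as a plain scalar and once as a reference.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document, used only for illustration.
my $html = '<div id="text-1">Hello</div>';

# Plain scalar: taken as a file name. No file with this name exists,
# so the constructor returns undef.
my $as_file = HTML::TokeParser->new($html);

# Reference to a scalar: the scalar is taken as the document itself.
my $as_doc = HTML::TokeParser->new(\$html);

print "plain scalar: ", ( defined $as_file ? "parser object" : "undef" ), "\n";
print "scalar ref:   ", ( defined $as_doc  ? "parser object" : "undef" ), "\n";
```

Calling `get_token` on the first result would raise the same "Can't call method "get_token" on an undefined value" error as in the question.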

From your script:

Can't call method "get_token" on an undefined value at Untitled line 13

Check the argument you are passing to initialize the HTML::TokeParser object:

my $stream = HTML::TokeParser->new($agent->{content});

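First, you should use WWW::Mechanize's content method to get the page content. Second, you need to pass in a reference to the content, not the content itself. To correct the code, you need: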

my $stream = HTML::TokeParser->new( \$agent->content );

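You may also want to add error checking to make sure the slashdot page was retrieved successfully before starting the parser (for example, with $agent->success). Note that with autocheck => 1, as in your script, a failed get already dies.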
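Putting both fixes together, one possible shape for the script is sketched below. It is an illustration rather than the only way to do it: the helper name `div_texts`, reading the URL from the command line, and switching to `autocheck => 0` so that the explicit `success` check is meaningful are all assumptions of this sketch, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the trimmed text of every <div> whose
# raw start tag contains id="text-
sub div_texts {
    my ($html) = @_;

    # Pass a reference so the string is parsed as the document itself.
    my $stream = HTML::TokeParser->new( \$html )
        or die "Could not create parser: $!\n";

    my @texts;
    while ( my $token = $stream->get_token ) {
        my ( $ttype, $tag, $attr, $attrseq, $rawtxt ) = @{$token};
        next unless $ttype eq "S" && $tag eq "div";
        push @texts, $stream->get_trimmed_text( "div", "/div" )
            if $rawtxt =~ /id="text-/;
    }
    return @texts;
}

# Network part: runs only when a URL is given on the command line.
if (@ARGV) {
    require WWW::Mechanize;    # loaded here so the helper works without it

    # autocheck => 0 (an assumption of this sketch) lets us test success ourselves.
    my $agent = WWW::Mechanize->new( autocheck => 0 );
    $agent->get( $ARGV[0] );
    die "Fetch failed with status ", $agent->status, "\n"
        unless $agent->success;

    print "$_\n\n" for div_texts( $agent->content );
}
```

Keeping the parsing in its own sub means it can be tried on a literal HTML string before pointing the script at a live page.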
