How do I get the content of a followed link in WWW::Mechanize?

Asked: 2010-07-07 18:53:57

Tags: perl www-mechanize downloading

Hopefully this is my last question. I am trying to download a file using $mech->follow_link. For some reason, though, the file that gets saved is just the page I fetched first, not the link I want to follow. Is this the correct way to download a file from a link? I'd rather not use wget.

    #!/usr/bin/perl -w
    use strict;
    use LWP;
    use WWW::Mechanize;
    my $now_string = localtime;
    my $mech = WWW::Mechanize->new();
    my $filename = join(' ', split(/\W++/, $now_string, -1));
    $mech->credentials( '***********' , '************'); # if you need to supply server and realms, use credentials as in the LWP docs
    $mech->get('http://datawww2.wxc.com/kml/echo/MESH_Max_180min/') or die "Error: failed to load the web page";
    $mech->follow_link( url_regex => qr/MESH/i ) or die "Error: failed to download content";
    $mech->save_content("$filename.kmz");
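
For reference, one way to see whether follow_link actually moved off the index page is to print the current URI before saving (a small diagnostic sketch reusing the variables and regex from the code above):

    my $response = $mech->follow_link( url_regex => qr/MESH/i );
    die "No link matching /MESH/i found" unless $response;

    print "Now at: ", $mech->uri, "\n";      # should be the MESH link, not the index page
    print "Status: ", $mech->status, "\n";
    $mech->save_content("$filename.kmz");    # save_content writes whatever page $mech is currently on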

3 Answers:

Answer 0 (score: 3)

Steps to try (a rough sketch of these checks follows the list):

  1. First print the content from the get to make sure you are getting a valid HTML page
  2. Make sure the link you want to follow really is the third link named "MESH" (case sensitive?)
  3. Print the content from the second get
  4. Print the filename to make sure it is well formed
  5. Check that the file was created successfully

Other:

    • You don't need the unless in either case; it will either work or die
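
A rough sketch of those checks, with the URL and regex taken from the question (the filename here is just a placeholder):

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->get('http://datawww2.wxc.com/kml/echo/MESH_Max_180min/');
    print $mech->content;                        # step 1: is this a valid HTML page?

    print $_->url_abs, "\n"                      # step 2: which links actually match /MESH/i?
        for $mech->find_all_links( url_regex => qr/MESH/i );

    $mech->follow_link( url_regex => qr/MESH/i );
    print $mech->content;                        # step 3: content after the second get

    my $filename = 'test';                       # placeholder name
    print "Saving to $filename.kmz\n";           # step 4: is the filename well formed?
    $mech->save_content("$filename.kmz");

    if ( -s "$filename.kmz" ) {                  # step 5: was the file created successfully?
        print "File created\n";
    }
    else {
        print "File missing or empty\n";
    }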

Example:

    #!/usr/bin/perl -w
    
    use strict;
    use WWW::Mechanize;
    
       sub main{
    
          my $url    =  qq(http://www.kmzlinks.com);
          my $dest   =  qq($ENV{HOME}/Desktop/destfile.kmz);
    
          my $mech   =  WWW::Mechanize->new(autocheck => 1);
    
          # if needed, pass your credentials before this call
          $mech->get($url);
          die "Couldn't fetch page" unless $mech->success;
    
          # find all the links that have urls to kmz files
          my @links  =  $mech->find_all_links( url_regex => qr/(?:\.|%2E)kmz$/i );
    
          foreach my $link (@links){               # (loop example)
    
             # use absolute URL path of the link to download file to destination
             $mech->get($link->url_abs, ':content_file' => $dest);
    
             last;                                 # only need one (for testing)
          }     
       }
    
       main();
    

Answer 1 (score: 1)

Are you sure you want the third link named 'MESH'?
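
One quick way to check is to dump every link on the page and see which ones match (a small sketch, reusing the URL from the question):

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->get('http://datawww2.wxc.com/kml/echo/MESH_Max_180min/');

    # print the absolute URL of every link so you can see which ones match /MESH/i
    for my $link ( $mech->links ) {
        print $link->url_abs, "\n";
    }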

Answer 2 (score: -1)

Change the if to unless.
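
Presumably this refers to the success check, i.e. something along these lines (sketch only):

    # two equivalent ways to bail out when the request failed
    die "Couldn't fetch page" if !$mech->success;
    die "Couldn't fetch page" unless $mech->success;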
