Parsing paginated HTML with PHP Simple HTML DOM Parser

Time: 2018-02-19 08:24:04

Tags: php html parsing dom

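I'm trying to parse a movie site that uses pagination. I want to parse every movie item on page 1 and, once that is done, have the parser continue to the next page. I wrote a parser that runs, but it doesn't parse all the movie items on a page and it doesn't continue to another page. I want to detect when one result has been parsed and move on to the next item, then detect when all movie items on the page have been parsed and move on to the next page. When I run the parser, it should display the movie title, year, and so on one item at a time, and then continue to the next page. Currently it displays/parses only one movie item from page 1 and then stops. Here is my code and an example: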

Parsing example: http://minerbitco.in/parse/parse.php

    <?php

    include_once 'simple_html_dom.php';



    $page = (!isset($_GET['page'])) ? 1 : $_GET['page'];
        echo '<br> Parsing Page #'.$page.'<br><br>';
        $html = file_get_html('https://srulad.com/movies/type/movie#page-'.$page);
        $obj = $html->find('div.movie_item');
        $datas = [];
        if($obj){
            foreach ($obj as $key => $data) {


                $movie_url = 'https://srulad.com/'.$data->find('div.poster a', 0)->href;

                $html2 = file_get_html($movie_url);

                $item['url'] = $movie_url;

                $item['year'] = $html2->find('#movie_content > div', 0)->children(2)->find('div', 0)->children(0)->children(1)->plaintext;

                $item['genre'] =  $html2->find('#movie_content > div', 0)->children(1)->find('span', 0)->plaintext;

                $item['description'] = $html2->find('#movie_content > div', 0)->children(1)->find('div.plot', 0)->plaintext;

                $item['imdb_rating'] = $html2->find('#movie_content > div', 0)->children(2)->find('div', 0)->children(1)->children(1)->find('span', 0)->plaintext;

                $item['englishtitle'] = $html2->find('#movie_content > div', 0)->children(1)->find('h2.newmt', 0)->plaintext;

                $item['geotitle'] = $html2->find('#movie_content > div', 0)->children(1)->find('h3.newmt', 0)->plaintext;

                $item['poster'] = $html2->find('#movie_content > div', 0)->children(0)->find('img', 0)->src;



                $url = $item['url'];
                $year = $item['year'];
                $desc = $item['description'];
                $rating = $item['imdb_rating'];
                $poster = $item['poster'];
                $engtitle = $item['englishtitle'];
                $geotitle = $item['geotitle'];
                $genre = $item['genre'];
    }}

    if ($data === end($obj)) {
        echo '<META http-equiv="refresh" content="10;URL=#page-'.($page+1).'">';
    }

    else {
        echo "dasrulebulia.";
    }

    echo 'URL: '.$url.'<br>';
    echo 'პოსტერის URL: '.$poster.'<br>';
    echo 'სათაური ინგლისურად: '.$engtitle.'<br>';
    echo 'სათაური ქართულად: '.$geotitle.'<br>';
    echo 'წელი:'.$year.'<br>';
    echo 'ჟანრი:'.$genre.'<br>';
    echo 'აღწერა:'.$desc.'<br>';
    echo 'რეიტინგი:'.$rating.'<br>';
    ?>

1 answer:

Answer 0 (score: 0)

You can try a parser I wrote:

https://github.com/sachinsinghshekhawat/simple-html-dom-parser-php
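
Whichever parser is used, two details in the question's code appear to explain the symptoms. The `echo` statements sit after the `foreach` loop, so only the values from the last iteration are printed. The page number is also passed as a URL fragment (`#page-N`), which HTTP clients do not send to the server, so `file_get_html` presumably receives the same document for every page, and the meta refresh to `#page-N` never changes `$_GET['page']`. The sketch below illustrates the loop structure only. It uses PHP's bundled `DOMDocument`/`DOMXPath` instead of Simple HTML DOM so that it runs on its own, and it assumes the listing can be fetched page by page through some server-side URL. The helpers `parseMovieItems` and `crawlPages`, the sample markup, and the `$fetch` callback are all hypothetical and not taken from srulad.com.

```php
<?php
// Sketch only: the markup and the per-page fetch below are assumptions.

/** Extract one record per div.movie_item from a listing page. */
function parseMovieItems(string $html): array
{
    if (trim($html) === '') {
        return []; // nothing to parse
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);              // tolerate imperfect markup
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html); // hint UTF-8 to libxml
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $hasClass = function (string $name): string {
        return 'contains(concat(" ", normalize-space(@class), " "), " ' . $name . ' ")';
    };

    $items = [];
    foreach ($xpath->query('//div[' . $hasClass('movie_item') . ']') as $node) {
        $link  = $xpath->query('.//div[' . $hasClass('poster') . ']//a', $node)->item(0);
        $title = $xpath->query('.//h2', $node)->item(0);
        $items[] = [
            'url'   => $link ? $link->getAttribute('href') : null,
            'title' => $title ? trim($title->textContent) : null,
        ];
    }
    return $items;
}

/** Walk pages until one comes back empty or $maxPages is reached. */
function crawlPages(callable $fetch, int $firstPage = 1, int $maxPages = 50): array
{
    $all = [];
    for ($page = $firstPage; $page < $firstPage + $maxPages; $page++) {
        $items = parseMovieItems($fetch($page));
        if ($items === []) {
            break; // an empty page is treated as the end of the listing
        }
        foreach ($items as $item) {
            // Output happens inside the loop, once per item.
            echo "Page {$page}: {$item['title']} ({$item['url']})\n";
            $all[] = $item;
        }
    }
    return $all;
}

// Stand-in for real HTTP requests: two fake listing pages.
$fakePages = [
    1 => '<div class="movie_item"><div class="poster"><a href="/movie/1">poster</a></div><h2>First</h2></div>'
       . '<div class="movie_item"><div class="poster"><a href="/movie/2">poster</a></div><h2>Second</h2></div>',
    2 => '<div class="movie_item"><div class="poster"><a href="/movie/3">poster</a></div><h2>Third</h2></div>',
];
$fetch = function (int $page) use ($fakePages): string {
    return $fakePages[$page] ?? '';
};

$movies = crawlPages($fetch);
```

In a real run, `$fetch` would be replaced by a function that downloads page `$page` from a URL the server actually distinguishes, for example a query-string parameter if the site offers one. If the site loads further pages only through JavaScript, a server-side parser will not see them at that URL, and the underlying request the page makes would have to be fetched instead.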