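Getting links from a page with PHP DOM

Date: 2018-10-07 18:21:21

Tags: php dom

I am trying to build something like a "Serial Downloader" that imports all episodes of all seasons from this website into my Openload account, using the following code: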

<?php
error_reporting(0);
$serial = file_get_contents($_GET['serial']);
$doc = new DOMDocument();
$doc -> loadHTML($serial);
$xpath = new DOMXPath($doc);
$seasons = $xpath->query("//*[@class='vypisserial']")->item(0);
$serial_divs = $seasons->getElementsByTagName('div');
$x = 0;
foreach($serial_divs as $season){
    $x++;
    echo "Season ".$x."<br />";
    $season_inner = $season->getElementsByTagName('div')->item(0);
    if($season_inner->getAttribute('id')!==""){
        echo "--- START OF SEASON ID '".$season_inner->getAttribute('id')."' ---<br />";
        $season_div = $doc -> getElementByID($season_inner->getAttribute('id'));
        $episode_links = $season_div->getElementsByTagName('a');
        foreach ($episode_links as $episode_link_a) {
            $episode_link = $episode_link_a -> getAttribute("href");
            $c = file_get_contents("https://freeserial.sk".$episode_link);
            $doc = new DOMDocument();
            $doc -> loadHTML($c);
            $frames = $doc -> getElementsByTagName('iframe');
            $link = "https://freeserial.sk".($frames[0] -> getAttribute("src"));
            $video = file_get_contents($link);
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_URL, $link);
            curl_setopt($ch, CURLOPT_HEADER, true);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_exec($ch);
            $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
            echo "episode_link - ".$url."<br />";
            $c = file_get_contents("https://api.openload.co/1/remotedl/add?login=3&key=C&url=".$url);
        }
        echo "--- END OF SEASON ID '".$season_inner->getAttribute('id')."' ---<br />";
    } else {
        echo "Nothing";
    }
}

When I go to file.php?serial=https://www.freeserial.sk/serial/skam, only one season is downloaded instead of four. I don't know what is wrong. Any help would be appreciated. Thanks.

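1 Answer:

Answer 0: (score: 1)

The main problem is how you are trying to read the document hierarchy. I have changed it to use the <div class="itemSeriaVypis"> elements as the basis for each season, and then work with the data relative to each of them.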
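
To make the hierarchy point concrete, here is a minimal sketch on made-up markup. Only the class names come from the code in this thread; the page structure, the sample ids, and the function name linksBySeason are assumptions for illustration. Two things in the original loop are worth noting: getElementsByTagName('div') returns every descendant div, nested ones included, and $doc is reassigned for each episode page, so the later getElementByID call no longer searches the series page. Selecting the season blocks by class and querying the links relative to each block sidesteps both.

```php
<?php
// Hypothetical markup standing in for the series page; the real page may differ.
$html = <<<HTML
<div class="vypisserial">
  <div class="itemSeriaVypis" id="s1">
    <div><a href="/ep/s1e1">S1E1</a><a href="/ep/s1e2">S1E2</a></div>
  </div>
  <div class="itemSeriaVypis" id="s2">
    <div><a href="/ep/s2e1">S2E1</a></div>
  </div>
</div>
HTML;

/**
 * Return the episode hrefs grouped by season id.
 */
function linksBySeason(string $html): array
{
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query("//*[@class='itemSeriaVypis']") as $season) {
        $links = [];
        // The leading "." keeps the query inside the current season node.
        foreach ($xpath->query(".//a[@href]", $season) as $a) {
            $links[] = $a->getAttribute('href');
        }
        $result[$season->getAttribute('id')] = $links;
    }
    return $result;
}

print_r(linksBySeason($html));
```

Because each season node is used as the context for its own link query, the series document is never replaced while the loop is still running.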

// Revised script: each <div class="itemSeriaVypis"> is treated as one season.
libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$serial = file_get_contents($_GET['serial']);
// Debug aid: keep a local copy of the fetched series page.
file_put_contents("season.html", $serial);
$doc = new DOMDocument();
$doc->loadHTML($serial);
$xpath = new DOMXPath($doc);
$serial_divs = $xpath->query("//*[@class='itemSeriaVypis']");
$x = 0;
foreach ($serial_divs as $season) {
    $x++;
    echo "Season ".$x."<br />";
    echo "--- START OF SEASON ID '".$season->getAttribute('id')."' ---<br />";
    $episode_links = $season->getElementsByTagName('a');
    foreach ($episode_links as $episode_link_a) {
        $episode_link = $episode_link_a->getAttribute("href");
        $c = file_get_contents("https://freeserial.sk".$episode_link);
        // Separate document for the episode page, so $doc still refers to the series page.
        $episode_doc = new DOMDocument();
        $episode_doc->loadHTML($c);
        $frame = $episode_doc->getElementsByTagName('iframe')->item(0);
        if ($frame === null) {
            continue; // no player iframe found on this episode page
        }
        $link = "https://freeserial.sk".$frame->getAttribute("src");
        // Follow redirects to find the final video URL.
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $link);
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_exec($ch);
        $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
        curl_close($ch);
        echo "episode_link - ".$url."<br />";
        // Encode the video URL so its own query characters stay inside the url parameter.
        $c = file_get_contents("https://api.openload.co/1/remotedl/add?login=c5b4f1671c8e8323&key=CQkTSjzz&url=".urlencode($url));
    }
    echo "--- END OF SEASON ID '".$season->getAttribute('id')."' ---<br />";
}
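
The final step passes the resolved video URL as a query parameter to the remote-upload endpoint. If that URL contains "?" or "&" of its own, plain string concatenation can cut it short. The sketch below shows one way to build the request with http_build_query(); the helper name buildRemoteUploadUrl and the sample credentials are placeholders, not part of the original code, and only the endpoint and parameter names are taken from the script above.

```php
<?php
/**
 * Build the remote-upload request URL (hypothetical helper name).
 * http_build_query() percent-encodes each value, so a video URL that
 * itself contains "?" or "&" is sent as one intact "url" parameter.
 */
function buildRemoteUploadUrl(string $login, string $key, string $videoUrl): string
{
    $query = http_build_query(
        ['login' => $login, 'key' => $key, 'url' => $videoUrl],
        '',
        '&' // explicit separator, independent of the arg_separator.output ini setting
    );
    return 'https://api.openload.co/1/remotedl/add?' . $query;
}

// Placeholder credentials and video URL, for illustration only.
echo buildRemoteUploadUrl('LOGIN', 'KEY', 'https://example.com/video?id=1&q=hd'), PHP_EOL;
```

Keeping the credentials in variables (or in configuration outside the script) also avoids pasting them into code that gets shared publicly.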