Update my code to scrape websites that require cookies

Time: 2012-08-12 04:12:53

Tags: php cookies scrape

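I am using this code to scrape a page's title, URL, and images.

It works well, but on some sites it fails and returns text saying that cookies are required. How can I set or simulate cookies when accessing the URL / data?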

    <?php

    $url = $_REQUEST['url'];
    $url = checkValues($url);

    function checkValues($value)
    {
        $value = trim($value);
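        // get_magic_quotes_gpc() was removed in PHP 8, so only call it where it still exists
        if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc())
        {
            $value = stripslashes($value);
        }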
        $value = strtr($value, array_flip(get_html_translation_table(HTML_ENTITIES)));
        $value = strip_tags($value);
        $value = htmlspecialchars($value);
        return $value;
    }   

    function fetch_record($path)
    {
        // fopen() on a URL sends no cookies unless a stream context supplies them
        $file = fopen($path, "r");
        if (!$file)
        {
            exit("Problem occurred");
        }
        $data = '';
        while (!feof($file))
        {
            $data .= fgets($file, 1024);
        }
        fclose($file);
        return $data;
    }

    $string = fetch_record($url);


    // fetch title
    $title_regex = "/<title>(.+)<\/title>/i";
    preg_match_all($title_regex, $string, $title, PREG_PATTERN_ORDER);
    $url_title = $title[1];

    // fetch description
    $tags = get_meta_tags($url);

    // fetch images
    // match either quote style; a "|" inside a character class is a literal pipe, so it is dropped
    $image_regex = '/<img[^>]*src=["\'](.*)["\']/Ui';
    preg_match_all($image_regex, $string, $img, PREG_PATTERN_ORDER);
    $images_array = $img[1];

    ?>

        <div class="images">
        <?php
        $k=1;
        // use "<" rather than "<=" so the loop does not read past the end of the array
        for ($i = 0; $i < count($images_array); $i++)
        {
            if ($images_array[$i] !== '')
            {
                // call getimagesize() once; it returns false for unreadable images
                $size = @getimagesize($images_array[$i]);
                if ($size)
                {
                    list($width, $height) = $size;
                    if ($width >= 50 && $height >= 50)
                    {
                        echo "<img src='".$images_array[$i]."' width='100' id='".$k."' >";
                        $k++;
                    }
                }
            }
        }
        ?>
        <!--<img src="ajax.jpg"  alt="" />-->
        <input type="hidden" name="total_images" id="total_images" value="<?php echo --$k?>" />
        </div>
        <div class="info">

            <label class="title">
                <?php  echo @$url_title[0]; ?>
            </label>
            <br clear="all" />
            <label class="url">
                <?php  echo substr($url ,0,35); ?>
            </label>
            <br clear="all" /><br clear="all" />
            <label class="desc">
                <?php  echo @$tags['description']; ?>
            </label>
            <br clear="all" /><br clear="all" />

            <label style="float:left"><img src="prev.png" id="prev" alt="" /><img src="next.png" id="next" alt="" /></label>

            <label class="totalimg">
                Total <?php echo $k?> images
            </label>
            <br clear="all" />

        </div>
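
One possible way to send cookies with the request is sketched below using PHP's cURL extension in place of `fopen()`. This is a minimal sketch, not a confirmed fix for any particular site: the function names `build_cookie_header` and `fetch_with_cookies`, the user-agent string, and the temporary cookie-jar file are all assumptions made for illustration. The idea is that cURL stores any cookies the site sets in a jar file and sends them back on later requests and redirects, and that known cookie values can additionally be sent by hand.

```php
<?php

// Hypothetical helper: turns ['name' => 'value'] into a Cookie header value.
function build_cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Hypothetical replacement for fetch_record(). Returns the page body as a
// string, or false if the request fails. Requires the cURL extension.
function fetch_with_cookies(string $url, array $cookies = [], ?string $cookieJar = null)
{
    // Assumption: a throwaway jar file is acceptable when none is given.
    $cookieJar = $cookieJar ?? tempnam(sys_get_temp_dir(), 'cookies');

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects, carrying cookies along
        CURLOPT_COOKIEJAR      => $cookieJar,  // file cookies are written to when the handle is closed
        CURLOPT_COOKIEFILE     => $cookieJar,  // file cookies are read from; also enables cookie handling
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleScraper/1.0)', // made-up value
        CURLOPT_TIMEOUT        => 15,
    ]);

    // Optionally send cookies whose values are already known.
    if ($cookies) {
        curl_setopt($ch, CURLOPT_COOKIE, build_cookie_header($cookies));
    }

    return curl_exec($ch);
}
```

With this in place, `$string = fetch_with_cookies($url);` could stand in for `$string = fetch_record($url);`. Note that `get_meta_tags($url)` makes its own separate request without these cookies, so on such sites the description would probably have to be parsed out of `$string` instead.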

0 answers:

No answers