Question

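I have a simple download function in a class that may handle files of several hundred megabytes at a time from an Amazon Web Services bucket. The whole file cannot be loaded into memory at once, so it has to be streamed directly to a file pointer. At least that is my understanding: this is the first time I have dealt with this problem, and I am learning as I go.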

I ended up with this, based on a 4 KB file buffer, which simple testing showed to be a good size:

        $fs = fsockopen($host, 80, $errno, $errstr, 30);

        if (!$fs) {
          $this->writeDebugInfo("FAILED ", $errstr . '(' . $errno . ')');
        } else {
          $out = "GET $file HTTP/1.1\r\n";
          $out .= "Host: $host\r\n";
          $out .= "Connection: Close\r\n\r\n";
          fwrite($fs, $out);

          $fm = fopen ($temp_file_name, "w");
          stream_set_timeout($fs, 30);

          while(!feof($fs) && ($debug = fgets($fs)) != "\r\n" ); // ignore headers

          while(!feof($fs)) {
            $contents = fgets($fs, 4096); 
            fwrite($fm, $contents);
            $info = stream_get_meta_data($fs);
            if ($info['timed_out']) {
              break;
            }
          }
          fclose($fm);
          fclose($fs);

          if ($info['timed_out']) {
            // Delete temp file if fails
            unlink($temp_file_name);
            $this->writeDebugInfo("FAILED - Connection timed out: ", $temp_file_name);
          } else {
            // Move temp file if succeeds
            $media_file_name = str_replace('temp/', 'media/', $temp_file_name);
            rename($temp_file_name, $media_file_name);
            $this->writeDebugInfo("SUCCESS: ", $media_file_name);
          }
        }

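It works fine in testing. However, I had a conversation with someone who said I don't understand how fgets() and feof() work together, and he mentioned chunked encoding as a more efficient approach.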

Is the code generally OK, or am I missing something important? And what benefit would chunked encoding give me?

Answer 1

Your solution looks fine to me, but I have a few comments.

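1) Don't build the HTTP packets yourself, i.e. don't hand-write the HTTP request. Use something like cURL instead. It is much simpler and will cope with a wider range of responses the server might send back. cURL can also be set up to write directly to a file, which saves you doing that yourself.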

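2) If you are reading binary data, using fgets() may cause problems. fgets() reads up to the end of a line, and with binary data this could corrupt your download. Instead I suggest fread($fs, 4096); which handles both text and binary data.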

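3) Chunked encoding is a way for the web server to send you the response in multiple chunks. I don't think it is useful to you here. However, a better encoding that the web server may support is gzip encoding, which lets the web server compress the response on the fly. If you use a library like cURL, it can tell the server that it supports gzip and then decompress the response for you automatically (in PHP this is enabled with the CURLOPT_ENCODING option).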
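To make the third point concrete: because the request in the question declares HTTP/1.1, the server may answer with `Transfer-Encoding: chunked`, in which case the body arrives as a series of chunks, each prefixed by its length in hexadecimal, and a raw socket reader would have to strip that framing itself. The following is only a minimal sketch of that decoding step, assuming the whole raw body is already in a string; the function name `decode_chunked_body` is hypothetical, and chunk extensions and trailers are deliberately ignored. A library such as cURL handles all of this for you.

```php
<?php
// Hypothetical helper (not part of PHP): decodes an HTTP/1.1 chunked body.
// Simplifications: chunk extensions are dropped and trailers are ignored.
function decode_chunked_body(string $raw): string
{
    $body = '';
    $pos = 0;
    while (true) {
        // Each chunk starts with its size in hexadecimal, terminated by CRLF
        $lineEnd = strpos($raw, "\r\n", $pos);
        if ($lineEnd === false) {
            throw new RuntimeException('Missing chunk-size line');
        }
        $sizeLine = substr($raw, $pos, $lineEnd - $pos);
        // Drop any ";extension" that follows the size
        $sizeHex = trim(explode(';', $sizeLine, 2)[0]);
        if ($sizeHex === '' || !ctype_xdigit($sizeHex)) {
            throw new RuntimeException('Invalid chunk size');
        }
        $size = (int) hexdec($sizeHex);
        $pos = $lineEnd + 2;
        if ($size === 0) {
            break; // a zero-length chunk marks the end of the body
        }
        if ($pos + $size > strlen($raw)) {
            throw new RuntimeException('Truncated chunk');
        }
        $body .= substr($raw, $pos, $size);
        $pos += $size + 2; // skip the chunk data and its trailing CRLF
    }
    return $body;
}
```

For example, decoding "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n" yields Wikipedia. Without this step, the size lines would end up inside the saved file.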

I hope this helps.

Answer 2

Don't deal with sockets directly; simplify your code and use the PHP cURL library. Like this:

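$url = 'http://'.$host.'/'.$file;
// open the destination file in binary write mode
$fh = fopen($temp_file_name, "wb");
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
// stream the response body straight into the file handle
curl_setopt($ch, CURLOPT_FILE, $fh);
//curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// fetch the URL; the body is written to $fh instead of being held in memory
if (curl_exec($ch) === false) {
    // report the failure instead of silently keeping a partial file
    error_log('Download failed: ' . curl_error($ch));
}
// close cURL resource, and free up system resources
curl_close($ch);
fclose($fh);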
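The same idea can be wrapped in a function that also reports failure and cleans up a partial file. This is only a sketch under a few assumptions: the function name `download_to_file` and its parameters are hypothetical, gzip support is requested through `CURLOPT_ENCODING`, and a stalled transfer is detected with the low-speed options rather than a total time limit, since a total limit could cut off a large but healthy download.

```php
<?php
// Hypothetical helper: streams $url into $destPath.
// Returns true on success, false on any cURL or HTTP error.
function download_to_file(string $url, string $destPath, int $stallSeconds = 30): bool
{
    $fh = fopen($destPath, 'wb');
    if ($fh === false) {
        return false;
    }

    $ch = curl_init($url);
    if ($ch === false) {
        fclose($fh);
        unlink($destPath);
        return false;
    }

    curl_setopt_array($ch, [
        CURLOPT_FILE            => $fh,           // write the body straight to the file
        CURLOPT_FAILONERROR     => true,          // treat HTTP status >= 400 as a failure
        CURLOPT_FOLLOWLOCATION  => true,          // follow redirects
        CURLOPT_ENCODING        => '',            // accept any compression cURL supports and decode it
        CURLOPT_CONNECTTIMEOUT  => $stallSeconds, // give up if no connection is made in time
        CURLOPT_LOW_SPEED_LIMIT => 1,             // abort if slower than 1 byte/s ...
        CURLOPT_LOW_SPEED_TIME  => $stallSeconds, // ... for this many seconds
    ]);

    $ok = curl_exec($ch); // with CURLOPT_FILE set this returns true or false
    $error = curl_error($ch);
    unset($ch);           // release the handle before closing the file
    fclose($fh);

    if ($ok === false) {
        unlink($destPath); // do not leave a partial file behind
        error_log('Download failed: ' . $error);
        return false;
    }
    return true;
}
```

A caller would then only rename the temporary file into place when the function returns true.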

Answer 3

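Here is the end result, in case it is useful to anyone. I also wrapped the whole thing in a retry loop to reduce the risk of a completely failed download, although it does increase resource usage: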

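      $download_attempt = 0; // initialise the retry counter before the loop
      do {
        $info = array('timed_out' => false); // reset so a failed fopen() cannot reuse a stale value
        $fs = fopen('http://' . $host . $file, "rb");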

        if (!$fs) {
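          // fopen() does not populate $errstr/$errno (those belonged to fsockopen()), so read the last PHP error
          $error = error_get_last();
          $this->writeDebugInfo("FAILED ", $error ? $error['message'] : 'unknown error');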
        } else {
          $fm = fopen($temp_file_name, "wb"); // binary mode, matching the "rb" read side
          stream_set_timeout($fs, 30);

          while(!feof($fs)) {
            $contents = fread($fs, 4096); // Buffered download
            fwrite($fm, $contents);
            $info = stream_get_meta_data($fs);
            if ($info['timed_out']) {
              break;
            }
          }
          fclose($fm);
          fclose($fs);

          if ($info['timed_out']) {
            // Delete temp file if fails
            unlink($temp_file_name);
            $this->writeDebugInfo("FAILED on attempt " . $download_attempt . " - Connection timed out: ", $temp_file_name);
            $download_attempt++;
            if ($download_attempt < 5) {
              $this->writeDebugInfo("RETRYING: ", $temp_file_name);
            }
          } else {
            // Move temp file if succeeds
            $media_file_name = str_replace('temp/', 'media/', $temp_file_name);
            rename($temp_file_name, $media_file_name);
            $this->newDownload = true;
            $this->writeDebugInfo("SUCCESS: ", $media_file_name);
          }
        }
      } while ($download_attempt < 5 && $info['timed_out']);
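The retry logic can also be kept separate from the download itself, which makes the attempt limit easier to see and to test. The following is a minimal sketch, not the code used above: the helper name `retry` and its parameters are hypothetical, and the callable is assumed to return true on success.

```php
<?php
// Hypothetical helper: calls $attempt (passing the 1-based attempt number)
// until it returns true or $maxAttempts is reached.
function retry(callable $attempt, int $maxAttempts = 5, int $delaySeconds = 0): bool
{
    for ($try = 1; $try <= $maxAttempts; $try++) {
        if ($attempt($try) === true) {
            return true;
        }
        if ($try < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds); // pause before the next attempt
        }
    }
    return false;
}
```

The download code would be passed in as a closure that returns true when the file was written completely and false when the stream timed out.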

PHP - Downloading very large files using fsockopen(), fgets() and feof()
