Is it possible to download only part of a ZIP archive (e.g. a single file)?

Time: 2011-12-17 06:54:57

Tags: zip rar

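Is there a way to download only part of a .rar or .zip file without downloading the whole archive? Say a zip file contains files A, B, C and D, and I only need A. Can I somehow adjust the download so that only A is fetched or, if possible, have the file extracted on the server and get just A?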

8 answers:

Answer 0 (score: 11)

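The trick is to do what Sergio suggests, but without doing it by hand. This is easy if you mount the zip file through an HTTP-backed virtual file system and then run the standard unzip command on it. The unzip utility's I/O calls are then translated into HTTP range GETs, which means only the needed chunks of the zip have to travel over the network.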

Here is an example for Linux using HTTPFS, a very lightweight virtual file system (it uses FUSE). Similar tools exist for Windows.

Get/build httpfs:

$ wget http://sourceforge.net/projects/httpfs/files/httpfs/1.06.07.02
$ tar -xjf httpfs_1.06.07.10.tar.bz2 
$ rm httpfs
$ ./make_httpfs 

Mount the remote zip file and extract one file from it:

$ mkdir mount_pt
$ sudo ./httpfs http://server.com/zipfile.zip mount_pt
$ sudo ls mount_pt 
zipfile.zip
$ sudo unzip -p mount_pt/zipfile.zip the_file_I_want.txt > the_file_I_want.txt
$ sudo umount mount_pt 

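Of course, you can also use any other tool besides the command line. (I needed sudo because that seems to be how FUSE is set up on my machine; you should not need it.)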

I know this is an old question; this answer is for others who run into the same problem.
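The same idea (letting an ordinary zip reader drive HTTP range requests) can be sketched without FUSE. The following is a minimal Python illustration, not part of the original answer: `RangeFile`, `http_fetch` and `open_remote_zip` are hypothetical names, and it assumes the server honours `Range` requests and reports `Content-Length` on a HEAD request.

```python
import io
import urllib.request
import zipfile


def http_fetch(url):
    """Return a fetch(start, end) function that issues HTTP Range requests."""
    def fetch(start, end):
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    return fetch


class RangeFile(io.RawIOBase):
    """Read-only, seekable file whose reads are served by fetch(start, end)."""

    def __init__(self, fetch, size):
        self._fetch = fetch
        self._size = size
        self._pos = 0
        self.bytes_fetched = 0  # bookkeeping only, to show how little is transferred

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        self._pos = max(0, base + offset)
        return self._pos

    def readinto(self, buf):
        if self._pos >= self._size or len(buf) == 0:
            return 0  # end of file
        last = min(self._pos + len(buf), self._size) - 1  # inclusive end of the range
        data = self._fetch(self._pos, last)
        n = len(data)
        buf[:n] = data
        self._pos += n
        self.bytes_fetched += n
        return n


def open_remote_zip(url):
    """Open a remote archive; assumes the server reports Content-Length."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        size = int(resp.headers["Content-Length"])
    return zipfile.ZipFile(RangeFile(http_fetch(url), size))
```

With this, something like `open_remote_zip(url).read("the_file_I_want.txt")` should transfer only the archive's directory and that one member. Every read becomes its own request, so wrapping the object in a buffering layer may be worthwhile for larger extractions.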

Answer 1 (score: 7)

To a certain extent, yes, you can.

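The ZIP file format specifies a "central directory". Basically, it is a table that lists the files stored in the archive and the offsets at which they are located.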

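So, using the HTTP Range request header (the server answers with Content-Range), you can download just the end of the file (the central directory and its end-of-central-directory record are the last items in a zip file) and try to locate the central directory in it. If you succeed, you know the file list and the offsets, so you can go on to fetch those chunks separately and decompress them yourself.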

This approach is quite error-prone and is not guaranteed to work. But then, the same is true of hacks in general :-)
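As a rough illustration of the "identify the central directory" step, the sketch below parses the tail of an archive in Python. It is an assumption-laden outline rather than a robust parser: the function names are made up for this example, ZIP64 and multi-disk archives are ignored, and the end-of-central-directory signature is simply searched for from the end (which can be fooled by an archive comment containing the same bytes).

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory record
CDH_SIG = b"PK\x01\x02"   # central directory file header


def find_eocd(tail):
    """Return (entry_count, cd_size, cd_offset) from the archive's last bytes."""
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or len(tail) - pos < 22:
        raise ValueError("EOCD record not found; fetch a longer tail")
    (_sig, _disk, _cd_disk, _n_disk, n_total,
     cd_size, cd_offset, _comment_len) = struct.unpack_from("<4s4H2LH", tail, pos)
    return n_total, cd_size, cd_offset


def parse_central_directory(cd, count):
    """Return {name: (local_header_offset, compressed_size, method)}."""
    entries = {}
    pos = 0
    for _ in range(count):
        if cd[pos:pos + 4] != CDH_SIG:
            raise ValueError("bad central directory header")
        (method,) = struct.unpack_from("<H", cd, pos + 10)
        (comp_size,) = struct.unpack_from("<L", cd, pos + 20)
        name_len, extra_len, comment_len = struct.unpack_from("<3H", cd, pos + 28)
        (offset,) = struct.unpack_from("<L", cd, pos + 42)
        name = cd[pos + 46:pos + 46 + name_len].decode("utf-8", "replace")
        entries[name] = (offset, comp_size, method)
        pos += 46 + name_len + extra_len + comment_len  # skip to the next header
    return entries


def list_entries(tail, total_size):
    """Parse the directory out of `tail`, the last len(tail) bytes of the archive."""
    count, cd_size, cd_offset = find_eocd(tail)
    tail_start = total_size - len(tail)  # absolute offset of tail[0]
    if cd_offset < tail_start:
        raise ValueError(f"need the last {total_size - cd_offset} bytes")
    begin = cd_offset - tail_start
    return parse_central_directory(tail[begin:begin + cd_size], count)
```

If the first tail turns out to be too short, the error message says how many trailing bytes a second request would need to cover the whole directory.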

Another possible approach is to build a custom server for this (see @pst's answer for details).

Answer 2 (score: 3)

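There are several ways for an ordinary user to download a single file from a compressed ZIP file; unfortunately, they are not widely known. There are some open-source tools and online web services, including: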

Answer 3 (score: 1)

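You can use FDM, which supports partial downloads of zip files: Free Download Manager lets you download only the parts of a zip file that you need.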

http://www.freedownloadmanager.org/features.htm

Answer 4 (score: 0)

I think Sergio Tulentsev's idea is great.

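However, if you control the server, i.e. you can deploy custom code, then it is a fairly simple operation (in the scheme of things :) to map/handle a request, extract the relevant part of the ZIP archive, and send the data back in the HTTP stream.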

A request might look like this:

http://foo.bar/myfile.zip_a.jpeg

This would mean: extract "a.jpeg" from "myfile.zip" and return it.

(I deliberately chose this silly format so that the browser would likely pick "myfile.zip_a.jpeg" as the name in the download dialog when it appears.)

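Of course, how this is implemented depends on the server/language/framework, and there may already be existing solutions that support a similar operation (but I don't know of any).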
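For illustration only, here is one way such a handler could look using nothing but the Python standard library. The names `split_request`, `extract_member` and `ZipMemberHandler` are invented for this sketch, the URL scheme is the "silly" one described above, and query strings and URL-encoding are not handled.

```python
import zipfile
from http.server import BaseHTTPRequestHandler
from pathlib import Path


def split_request(path):
    """Split '/myfile.zip_a.jpeg' into ('myfile.zip', 'a.jpeg')."""
    archive, sep, member = path.lstrip("/").partition(".zip_")
    if not sep or not archive or not member:
        raise ValueError("expected <archive>.zip_<member>")
    return archive + ".zip", member


def extract_member(root, path):
    """Return the bytes of the requested member from an archive under `root`."""
    archive, member = split_request(path)
    root_dir = Path(root).resolve()
    archive_path = (root_dir / archive).resolve()
    if root_dir not in archive_path.parents:
        raise ValueError("archive is outside the served directory")
    with zipfile.ZipFile(archive_path) as zf:
        return zf.read(member)  # raises KeyError if the member does not exist


class ZipMemberHandler(BaseHTTPRequestHandler):
    root = "."  # directory that holds the archives (an assumption of this sketch)

    def do_GET(self):
        try:
            body = extract_member(self.root, self.path)
        except (ValueError, KeyError, OSError, zipfile.BadZipFile):
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

The handler could be served with `http.server.HTTPServer(("", 8000), ZipMemberHandler).serve_forever()`; a real deployment would likely stream the member instead of reading it fully into memory.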

Happy coding.

Answer 5 (score: 0)

Instead, use the Google Docs viewer. Go to this link, https://docs.google.com/viewer?url=http://file.zip, and replace the address with that of your zip file. It can open zip and rar files.

Answer 6 (score: 0)

Can you arrange for your file to appear in the back of the zip?

Download 100k:

$ curl -r -100000 https://www.keepassx.org/releases/2.0.2/KeePassX-2.0.2.zip -o tail.zip
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                             Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0  84739      0  0:00:01  0:00:01 --:--:-- 84817

Check what files we did get:

$ unzip -t tail.zip
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]:  attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]:  attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]:  attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]:  attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
    testing: KeePassX-2.0.2/share/translations/keepassx_uk.qm   OK
    testing: KeePassX-2.0.2/share/translations/keepassx_zh_CN.qm   OK
    testing: KeePassX-2.0.2/share/translations/keepassx_zh_TW.qm   OK
    testing: KeePassX-2.0.2/zlib1.dll   OK
At least one error was detected in tail.zip.

Then extract the last file:

$ unzip tail.zip KeePassX-2.0.2/zlib1.dll
Archive:  tail.zip
error [tail.zip]:  missing 7751495 bytes in zipfile
  (attempting to process anyway)
  inflating: KeePassX-2.0.2/zlib1.dll  

Answer 7 (score: 0)

Based on the good input above, I wrote a code snippet in PowerShell to show how it works:

# demo code downloading a single DLL file from an online ZIP archive
# and extracting the DLL into memory to mount it finally to the main process.

cls
Remove-Variable * -ea 0

# definition for the ZIP archive, the file to be extracted and the checksum:
$url = 'https://github.com/sshnet/SSH.NET/releases/download/2020.0.1/SSH.NET-2020.0.1-bin.zip'
$sub = 'net40/Renci.SshNet.dll'
$md5 = '5B1AF51340F333CD8A49376B13AFCF9C'

# prepare HTTP client:
Add-Type -AssemblyName System.Net.Http
$handler = [System.Net.Http.HttpClientHandler]::new()
$client  = [System.Net.Http.HttpClient]::new($handler)

# get the length of the ZIP archive:
$req = [System.Net.HttpWebRequest]::Create($url)
$req.Method = 'HEAD'
$length = $req.GetResponse().ContentLength
$zip = [byte[]]::new($length)

# get the last 10k:
# how to get the correct length of the central ZIP directory here?
$start = $length-10kb
$end   = $length-1
$client.DefaultRequestHeaders.Add('Range', "bytes=$start-$end")
$result = $client.GetAsync($url).Result
$last10kb = $result.content.ReadAsByteArrayAsync().Result
$last10kb.CopyTo($zip, $start)

# get the block containing the DLL file:
# how to get the exact file-offset from the ZIP directory?
$start = $length-3537kb
$end   = $length-3201kb
$client.DefaultRequestHeaders.Clear()
$client.DefaultRequestHeaders.Add('Range', "bytes=$start-$end")
$result = $client.GetAsync($url).Result
$block = $result.content.ReadAsByteArrayAsync().Result
$block.CopyTo($zip, $start)

# extract the DLL file from archive:
Add-Type -AssemblyName System.IO.Compression
$stream = [System.IO.Memorystream]::new()
$stream.Write($zip,0,$zip.Length)
$archive = [System.IO.Compression.ZipArchive]::new($stream)
$entry = $archive.GetEntry($sub)
$bytes = [byte[]]::new($entry.Length)
[void]$entry.Open().Read($bytes, 0, $bytes.Length)

# check MD5:
$prov = [Security.Cryptography.MD5CryptoServiceProvider]::new().ComputeHash($bytes)
$hash = [string]::Concat($prov.foreach{$_.ToString("x2")})
if ($hash -ne $md5) {write-host 'dll has wrong checksum.' -f y ;break}

# load the DLL:
[void][System.Reflection.Assembly]::Load($bytes)

# use the single demo-call from the DLL:
$test = [Renci.SshNet.NoneAuthenticationMethod]::new('test')
'done.'
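On the two open questions in the comments of the snippet above: per the ZIP format, the end-of-central-directory record stores the size and offset of the central directory, and each central directory entry stores the member's local header offset and compressed size. The member's data starts right after its local header, whose length depends on two variable-size fields. The Python sketch below shows that last calculation under simple assumptions (no ZIP64, no encryption, only stored or deflated members); the function names are made up for illustration.

```python
import struct
import zlib

LOCAL_HEADER_LEN = 30  # size of the fixed part of a local file header


def member_range(local_header, header_offset, compressed_size):
    """Inclusive (start, end) byte range of a member's compressed data.

    `local_header` is the 30 fixed bytes found at `header_offset`;
    `header_offset` and `compressed_size` come from the central directory.
    """
    if local_header[:4] != b"PK\x03\x04":
        raise ValueError("not a local file header")
    # The file name and the extra field follow the fixed part of the header.
    name_len, extra_len = struct.unpack_from("<2H", local_header, 26)
    start = header_offset + LOCAL_HEADER_LEN + name_len + extra_len
    return start, start + compressed_size - 1


def inflate_member(data, method):
    """Decompress a member's raw data (method 0 = stored, 8 = deflate)."""
    if method == 0:
        return data
    if method == 8:
        return zlib.decompress(data, wbits=-15)  # raw deflate stream, no zlib header
    raise ValueError(f"unsupported compression method {method}")
```

In practice this means one small request for the 30 header bytes, followed by one request for exactly `start` through `end`, instead of guessing a block several hundred kilobytes wide.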