Is there a way to download parts of the content of a zip file?

Writer Sophia Terry

If there is a big zip file uploaded on a server, and all you need is some of its content, is there a way to open it and choose what you want to download?

8

4 Answers

I wrote a Python script list_remote_zip.py that can list files in a zip file that is accessible over HTTP:

# list_remote_zip.py (Python 3)
import struct
import sys
import urllib.request

def open_remote_zip(url, offset=0):
    # Ask the server for the file starting at the given byte offset.
    request = urllib.request.Request(url, headers={'Range': 'bytes={}-'.format(offset)})
    return urllib.request.urlopen(request)

offset = 0
zipfile = open_remote_zip(sys.argv[1])
header = zipfile.read(30)

# Every local file header starts with the signature PK\x03\x04.
while header[:4] == b'PK\x03\x04':
    compressed_len, uncompressed_len = struct.unpack('<II', header[18:26])
    filename_len, extra_len = struct.unpack('<HH', header[26:30])
    header_len = 30 + filename_len + extra_len
    total_len = header_len + compressed_len
    filename = zipfile.read(filename_len).decode('utf-8', 'replace')
    print('{}\n offset: {}\n length: {}\n header: {}\n payload: {}\n uncompressed length: {}'.format(
        filename, offset, total_len, header_len, compressed_len, uncompressed_len))
    zipfile.close()
    # Skip the header and payload, then reconnect at the next header.
    offset += total_len
    zipfile = open_remote_zip(sys.argv[1], offset)
    header = zipfile.read(30)
zipfile.close()

It does not use the zip file's central directory, which is near the end of the file. Instead, it goes from the start and parses individual local headers and skips over the payload, hoping to land at another header. It sends a new request every time it needs to skip to an offset. This of course only works with servers that support the Range HTTP header.

It only needs to be passed the URL to the zip file as a command line argument. Example usage and output should look something like this:

$ python list_remote_zip.py
Xonotic/Makefile
 offset: 0
 length: 1074
 header: 46
 payload: 1028
 uncompressed length: 5019
Xonotic/source/darkplaces/
 offset: 1074
 length: 56
 header: 56
 payload: 0
 uncompressed length: 0
Xonotic/source/darkplaces/bih.h
 offset: 1130
 length: 1166
 header: 61
 payload: 1105
 uncompressed length: 2508
Xonotic/source/darkplaces/portals.h
 offset: 2296
 length: 334
 header: 65
 payload: 269
 uncompressed length: 648
...
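
As a rough offline illustration of the header walk described above, the sketch below runs the same parsing over an archive built in memory instead of over HTTP. The function name walk_local_headers and the sample file names are made up for this example. It assumes every local header records its compressed size; archives written in streaming mode keep the sizes in a trailing data descriptor instead, and this approach would not work on them.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIG = b'PK\x03\x04'

def walk_local_headers(data):
    """Yield (name, offset, header_len, compressed_len) for each local header."""
    offset = 0
    while data[offset:offset + 4] == LOCAL_HEADER_SIG:
        header = data[offset:offset + 30]
        compressed_len, uncompressed_len = struct.unpack('<II', header[18:26])
        filename_len, extra_len = struct.unpack('<HH', header[26:30])
        header_len = 30 + filename_len + extra_len
        name = data[offset + 30:offset + 30 + filename_len].decode('utf-8')
        yield name, offset, header_len, compressed_len
        # Skip the header and the payload to land on the next header.
        offset += header_len + compressed_len

# Build a small archive in memory to stand in for the remote file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('a.txt', 'hello ' * 100)
    zf.writestr('dir/b.txt', 'world ' * 100)

entries = list(walk_local_headers(buf.getvalue()))
for name, offset, header_len, compressed_len in entries:
    print(name, offset, header_len, compressed_len)
```

The loop stops by itself at the central directory, because its records start with a different signature (PK\x01\x02).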

To download one of the files, I wrote an even uglier get_file_from_remote_zip.sh bash script around it that uses wget:

info=$(python list_remote_zip.py "$1" | grep -m 1 -A 5 "^$2\$" | tail -n +2)
tmpfile=$(mktemp)
wget --start-pos $(echo "$info" | grep offset | grep -o '[[:digit:]]*') -O - "$1" | head -c $(echo "$info" | grep -m 1 length | grep -o '[[:digit:]]*') >"$tmpfile"
printf '\x1f\x8b' # gzip magic
tail -c +9 <"$tmpfile" | head -c 1 # copy compression method
printf '\0\0\0\0\0\0\x03' # some flags and mtime
tail -c "+$(expr 1 + $(echo "$info" | grep header | grep -o '[[:digit:]]*'))" <"$tmpfile"
tail -c +15 <"$tmpfile" | head -c 4 # The CRCs seem to be compatible.
tail -c +23 <"$tmpfile" | head -c 4
rm "$tmpfile"

It takes 2 arguments. The first is the URL of the zip file and the second the file to be extracted. The to-be-extracted file's name has to be complete and exactly as it appears in the output of the previous list_remote_zip.py Python script, which it uses to get some information about the file. It then uses wget to download it at the right offset with the right length. It saves this zip "slice" to a temporary file, which is then used to output a gzip-formatted file, which can then be piped to and decompressed with gzip. The "slice" itself is not a valid zip file because it has no central directory at the end. It could be fixed with zip's -FF option but I decided to instead change the headers a little and convert it to a gzip file. Both (PK)zip and gzip use the same deflate compression algorithm and even the CRC-32 checksums seem to be compatible.

Here is an example of how to download a random file from Xonotic's archive available at , decompress it and save it to a local file:

bash get_file_from_remote_zip.sh Xonotic/source/darkplaces/mprogdefs.h | gzip -d >mprogdefs.h
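
The header rewrite that the bash script performs can also be sketched in Python, which may be easier to follow. This is only an illustration under a few assumptions: the function name zip_slice_to_gzip is made up, the slice starts exactly at a local file header, the entry is deflate-compressed, and the header carries the real CRC and sizes (no data descriptor). The archive is built in memory so the example runs without a server.

```python
import gzip
import io
import struct
import zipfile

def zip_slice_to_gzip(entry):
    """Rewrap one zip entry (local header + deflate payload) as a gzip stream."""
    method = struct.unpack('<H', entry[8:10])[0]
    if method != 8:
        raise ValueError('only deflate-compressed entries can be rewrapped')
    crc32, compressed_len, uncompressed_len = struct.unpack('<III', entry[14:26])
    filename_len, extra_len = struct.unpack('<HH', entry[26:30])
    start = 30 + filename_len + extra_len
    payload = entry[start:start + compressed_len]
    # gzip header: magic, method 8 (deflate), no flags, mtime 0,
    # no extra flags, OS 3 (Unix).
    gzip_header = b'\x1f\x8b\x08\x00' + b'\x00\x00\x00\x00' + b'\x00\x03'
    # gzip trailer: CRC-32 and uncompressed size modulo 2**32, little-endian.
    trailer = struct.pack('<II', crc32, uncompressed_len & 0xFFFFFFFF)
    return gzip_header + payload + trailer

original = b'some text to compress ' * 50
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('notes.txt', original)
with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]

# In the remote case this slice would come from a ranged download.
archive = buf.getvalue()
entry_len = 30 + len('notes.txt') + info.compress_size
restored = gzip.decompress(zip_slice_to_gzip(archive[:entry_len]))
print(restored == original)
```

gzip.decompress checks the CRC-32 and the length in the trailer, so a successful round trip is a reasonable sign that the checksum copied from the zip header is valid for gzip too.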
3

If you are accessing a file server and have WinRAR (or probably other similar applications) installed, you can open the .zip and drag out the files you want.

If you are talking about a web server, I don't think you can.

0

Assuming the server supports resumed downloads it would in theory be possible to write a client that did this--grab a big enough block near the end to get the directory, then use that to figure out what you need to grab to actually get the data--simply start downloading at that position and stop when you have enough data. It's been so long since I was poking around I don't recall if there's a means of finding the start of the directory other than brute force.

I've never heard of such a client and can't imagine why one would be developed--if it's data that reasonably would be downloaded in pieces then why is the webmaster storing it as one big zip file???
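
On the question of finding the start of the directory: the zip format ends with an end-of-central-directory record (signature PK\x05\x06) that stores the directory's offset and size, so a client only has to locate that record in the tail of the file. The record is 22 bytes plus an optional comment of at most 65535 bytes, so the last 65557 bytes always contain it. Below is a minimal sketch, assuming a non-zip64 archive and a comment that does not itself contain the signature; the function name find_central_directory is made up, and the archive is built in memory in place of a ranged download.

```python
import io
import struct
import zipfile

EOCD_SIG = b'PK\x05\x06'
MAX_EOCD_LEN = 22 + 65535  # fixed part plus the longest possible comment

def find_central_directory(tail):
    """Return (offset, size, entry_count) of the central directory.

    `tail` is the last chunk of the archive, e.g. its final 65557 bytes.
    """
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or len(tail) - pos < 22:
        raise ValueError('end-of-central-directory record not found')
    # Fields after the signature: disk numbers (2 x 2 bytes), entry counts
    # (2 x 2 bytes), directory size (4), directory offset (4), comment length (2).
    (_disk, _cd_disk, _disk_entries, total_entries,
     cd_size, cd_offset, _comment_len) = struct.unpack('<HHHHIIH', tail[pos + 4:pos + 22])
    return cd_offset, cd_size, total_entries

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
    for name in ('a.txt', 'b.txt', 'c.txt'):
        zf.writestr(name, name * 20)
archive = buf.getvalue()

cd_offset, cd_size, count = find_central_directory(archive[-MAX_EOCD_LEN:])
print(cd_offset, cd_size, count)
```

With the offset and size known, a second ranged request can fetch the directory itself, which lists every entry's name and local header offset.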

3

Mount the remote ZIP file via an HTTP-backed virtual filesystem and then use the standard unzip command on it. This way the unzip utility's I/O calls are translated to HTTP range GETs, which means only the chunks of the ZIP file that you want get transferred over the network.

Here's an example for Linux using HTTPFS, a very lightweight FUSE-based virtual filesystem. There are similar tools for Windows. Programming languages like Python and Java provide HTTP I/O as well; you just need to combine it with their ZIP-reading logic appropriately.

Get/build httpfs:

$ wget
$ tar -xjf httpfs_1.06.07.10.tar.bz2
$ rm httpfs
$ ./make_httpfs

Mount a remote ZIP file and extract one file from it:

$ mkdir mount_pt
$ sudo ./httpfs mount_pt
$ ls mount_pt
zipfile.zip
$ unzip -p mount_pt/zipfile.zip the_file_I_want.txt > the_file_I_want.txt
$ sudo umount mount_pt

(The need for sudo may vary depending on how FUSE is set up on your system.)

Of course, you can also use tools other than the command-line unzip on the mounted file.
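
For the programming-language route mentioned earlier, here is a rough Python sketch of the same idea: a seekable file-like object that turns reads into range requests, handed to the standard zipfile module. The names RangeFile, http_fetch and memory_fetch are invented for this example, and http_fetch assumes the server answers Range requests with status 206. The demonstration at the bottom uses an in-memory archive in place of a server so it runs offline.

```python
import io
import urllib.request
import zipfile

class RangeFile(io.RawIOBase):
    """Read-only, seekable file whose bytes come from fetch(start, end)."""

    def __init__(self, size, fetch):
        self._size = size
        self._fetch = fetch  # returns bytes start..end, both inclusive
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        elif whence == io.SEEK_END:
            self._pos = self._size + offset
        return self._pos

    def readinto(self, b):
        end = min(self._pos + len(b), self._size)
        if end <= self._pos:
            return 0  # at or past the end of the file
        chunk = self._fetch(self._pos, end - 1)
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)

def http_fetch(url):
    """Build a fetch(start, end) callable that issues HTTP Range requests."""
    def fetch(start, end):
        request = urllib.request.Request(
            url, headers={'Range': 'bytes={}-{}'.format(start, end)})
        with urllib.request.urlopen(request) as response:
            if response.status != 206:
                raise IOError('server ignored the Range header')
            return response.read()
    return fetch

# Offline demonstration: an in-memory archive stands in for the server.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_STORED) as zf:
    zf.writestr('big.bin', b'x' * 200000)
    zf.writestr('small.txt', 'the file I want')
archive = buf.getvalue()

requests_made = []
def memory_fetch(start, end):
    requests_made.append((start, end))
    return archive[start:end + 1]

remote = io.BufferedReader(RangeFile(len(archive), memory_fetch))
with zipfile.ZipFile(remote) as zf:
    wanted = zf.read('small.txt')
fetched = sum(end - start + 1 for start, end in requests_made)
print(wanted, fetched, len(archive))
```

Against a real server, the same class would be used as io.BufferedReader(RangeFile(size, http_fetch(url))), with size taken from the Content-Length of a HEAD response. The BufferedReader wrapper is there only to merge zipfile's many small reads into fewer requests.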
