Skip to content

bpo-21417: Compression level for zipfile#5385

Merged
gpshead merged 13 commits intopython:masterfrom
bbayles:zipfile-compresslevel
Jan 30, 2018
Merged

bpo-21417: Compression level for zipfile#5385
gpshead merged 13 commits intopython:masterfrom
bbayles:zipfile-compresslevel

Conversation

@bbayles
Copy link
Copy Markdown
Contributor

@bbayles bbayles commented Jan 28, 2018

This PR adds a compresslevel parameter to zipfile.ZipFile(), similar to the
one that exists for tarfile.open().

A few things to point out:

  1. The default compression type and level remain unchanged. In the issue someone thought compresslevel=None might make it look like no compression is being done - this is in fact the case with the default ZIP_STORED.
  2. The ZIP_DEFLATED and ZIP_BZIP2 compression types support different levels - 0 through 9 for the former; 1 through 9 for the latter. Specifying mode='w:bz2' in tarfile.open() leads to a ValueError; the same thing happens here.
  3. You can set compresslevel on mode='r', but it doesn't do anything. This matches tarfile.open()'s behavior - there you can do stuff like compresslevel=9000 and compresslevel='kittens' with no error.
  4. The ZIP_LZMA compression type doesn't support "levels." It has a similar concept called a preset, however we can't set that here because we're using custom filters. These are mutually exclusive.
    • Because of this, if you try to set mode='w:xz' and compresslevel=9 (or whatever) in tarfile.open, you get an obscure TypeError. I will file a new bug for that.
    • Because of this, I've made compresslevel for ZIP_LZMA and ZIP_STORED have no effect for zipfile.ZipFile. I could raise an exception instead, but given that tarfile.open() ignores parameters that don't make sense (e.g. mode='r:gz' and compresslevel=9000) I thought this was appropriate.
  5. I've added tests to exercise this new functionality to the test_zipfile module. In my own testing I used this script, which shows things in action.
  6. I did not touch PyZipFile() for simplicity's sake; I can be persuaded to try it out.

I will add a few more notes inline.

https://bugs.python.org/issue21417

Comment thread Doc/library/zipfile.rst

Open a ZIP file, where *file* can be a path to a file (a string), a
file-like object or a :term:`path-like object`.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the style guide, I took the liberty of adding line breaks here; the constructor explanation was already quite long.

Comment thread Lib/zipfile.py Outdated
'filename',
'date_time',
'compress_type',
'_compress_level',
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've prefixed this with an underscore because it's not exposed when reading back. ZipInfo has another "private" attribute, _raw_time below.

Comment thread Doc/library/zipfile.rst Outdated
writing files to the archive. When using :const:`ZIP_STORED` or
:const:`ZIP_LZMA` it has no effect. When using :const:`ZIP_DEFLATED`
integers ``0`` through ``9`` are accepted. When using :const:`ZIP_BZIP2`
integers ``1`` through ``9`` are accepted.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is worth linking this to the compresslevel parameters of the :class:`gzip.GzipFile` and :class:`bz2.Bz2File` constructors, which explain the acceptable values.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I wonder if zlib.compressobj and bz2.BZ2Compressor would be more appropriate? I don't want to give the false impression that GzipFile or Bz2File are actually used at all.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with your suggestion; the link to the zlib.compressobj() function looks weird when rendered.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yeah zlib.compressobj would have been more appropriate (I was confused; the gzip format is not even used in zip files, just the underlying Deflate algorithm). My point was that you should explain what the different values mean, compared to None. How do the different values control the compression level relative to the default of None?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, yeah. I think links to the places where this is already described make sense, as you suggest. (we could add that to tarfile in another PR also; it's missing the explanation)

Current version looks reasonable when rendered:

image

Comment thread Doc/library/zipfile.rst Outdated
overrides the value given for the *compression* parameter to the constructor for
the new entry.
the new entry. Similarly, if given, *compress_level* overrides the *compresslevel*
parameter.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling is too similar yet inconsistent. Wouldn’t it be better to remove the underscore from all parameters, and differentiate them as method parameters versus the constructor parameter?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the example of compression (constructor's argument) to compress_type (method's argument). It's not clear to me why those are different, but if it's OK for compresslevel to be the same throughout that's fine by me?

Comment thread Lib/test/test_zipfile.py Outdated
self.assertEqual(a_info.compress_type, self.compression)
self.assertEqual(a_info._compress_level, 1)

# Compression level is overriden.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overridden

@KNareshseo
Copy link
Copy Markdown

The only way is to recompress the zip file with different levels until you find the one that matches the lengths. You could just recompress one of the entries to find the level, on the assumption that the entire zip file used the same level.

Even that only works if you know the tool, and the version of the tool that was used. E.g. 7z, WinZip, Info-ZIP.

Copy link
Copy Markdown
Member

@gpshead gpshead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for this!

@gpshead gpshead added the type-feature A feature request or enhancement label Jan 30, 2018
@gpshead gpshead merged commit ce237c7 into python:master Jan 30, 2018
@Sworddragon Sworddragon mannequin mentioned this pull request Nov 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type-feature A feature request or enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants