Bug report
description
When gzip.compress() is called with mtime=0 on CPython 3.8 through 3.10, the OS byte, i.e. the 10th byte of the GZIP header, is set to 255 ("unknown"); see also e.g. #83302:
cpython/Lib/gzip.py, line 599 in dc0adb4:
    return struct.pack("<BBBBLBB", 0x1f, 0x8b, 8, 0, int(mtime), xfl, 255)
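The format string in that struct.pack call also describes how to read the fixed 10-byte header back. The following is a minimal sketch for inspecting the OS byte; the GzipHeader type and the parse_gzip_header helper are hypothetical names introduced here for illustration and are not part of the gzip module.

```python
import struct
from typing import NamedTuple


class GzipHeader(NamedTuple):
    """Fixed 10-byte gzip header (hypothetical helper type, not in the stdlib)."""
    id1: int
    id2: int
    method: int
    flags: int
    mtime: int
    xfl: int
    os: int


def parse_gzip_header(blob: bytes) -> GzipHeader:
    """Unpack the first 10 bytes using the same layout as the struct.pack call."""
    if len(blob) < 10:
        raise ValueError("too short for a gzip header")
    header = GzipHeader(*struct.unpack("<BBBBLBB", blob[:10]))
    if (header.id1, header.id2) != (0x1F, 0x8B):
        raise ValueError("missing gzip magic bytes")
    return header


# Build a header the way the quoted line does (mtime=0, xfl=2) and read it back.
packed = struct.pack("<BBBBLBB", 0x1F, 0x8B, 8, 0, 0, 2, 255)
print(parse_gzip_header(packed).os)  # 255
```

Running parse_gzip_header(gzip.compress(b'', mtime=0)).os on different interpreters should show which OS value a given Python version writes.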
However, in CPython 3.11 and 3.12, the OS byte is set to a "known" value instead, e.g. 3 ("Unix") on Ubuntu.
This change is not mentioned in the changelog for Python 3.11.
This may lead to problems in the context of reproducible builds. In our case, hash checking fails after decompressing and re-compressing a gzipped archive.
how to reproduce
Here's an example, where the 10th byte is \xff in Python 3.10 and \x03 in Python 3.11:
~ $ python
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
>>> import gzip
>>> gzip.compress(b'', mtime=0)
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\xff\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'
~ $ pyenv shell 3.11
~ $ python
Python 3.11.6 (main, Nov 23 2023, 17:30:16) [GCC 11.4.0] on linux
>>> import gzip
>>> gzip.compress(b'', mtime=0)
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\x03\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'
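To confirm that the two outputs differ only in the OS byte, and that this alone is enough to break a hash comparison, the byte strings from the sessions above can be compared directly. This is a small sketch; differing_offsets is a hypothetical helper, and SHA-256 is only an assumed stand-in for whatever hash a build pipeline uses.

```python
import gzip
import hashlib

# Outputs copied verbatim from the Python 3.10 and 3.11 sessions.
OUT_310 = b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\xff\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'
OUT_311 = b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\x03\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'


def differing_offsets(a: bytes, b: bytes) -> list:
    """Return the zero-based offsets at which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Offset 9 is the 10th byte, i.e. the OS field.
print(differing_offsets(OUT_310, OUT_311))  # [9]

# Both streams decompress to the same (empty) payload ...
print(gzip.decompress(OUT_310) == gzip.decompress(OUT_311))  # True

# ... but a hash over the compressed bytes no longer matches.
print(hashlib.sha256(OUT_310).digest() == hashlib.sha256(OUT_311).digest())  # False
```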
cause
I guess this is caused by Python 3.11 delegating the gzip.compress() call to zlib when mtime=0, as mentioned in the docs:
Changed in version 3.11: Speed is improved by compressing all data at once instead of in a streamed fashion. Calls with mtime set to 0 are delegated to zlib.compress() for better speed.
and source (cpython/Lib/gzip.py, lines 609 to 612 in 89ddea4):
    if mtime == 0:
        # Use zlib as it creates the header with 0 mtime by default.
        # This is faster and with less overhead.
        return zlib.compress(data, level=compresslevel, wbits=31)
Apparently zlib does set the OS byte.
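As a possible stopgap for reproducible builds, the OS byte can be overwritten after compression. The sketch below is an assumption-laden workaround, not the fix applied in CPython: normalize_os_byte is a hypothetical helper, and it relies on the header not carrying a header CRC (FHCRC flag), since the trailing CRC32 covers only the uncompressed data. It uses zlib.compressobj with wbits=31 to produce a gzip container, which should behave like the one-shot call in the quoted source while also working on older Python versions.

```python
import gzip
import zlib

FHCRC = 0x02  # FLG bit indicating an optional CRC16 over the header


def normalize_os_byte(blob: bytes, os_byte: int = 255) -> bytes:
    """Return a copy of a gzip stream with the OS field (offset 9) overwritten.

    Hypothetical helper: refuses streams that carry a header CRC, because
    changing the header would invalidate that checksum.
    """
    if len(blob) < 10 or blob[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    if blob[3] & FHCRC:
        raise ValueError("header CRC present; patching would invalidate it")
    return blob[:9] + bytes([os_byte]) + blob[10:]


# Let zlib write the gzip header itself; its OS byte depends on the platform.
compressor = zlib.compressobj(level=9, wbits=31)
raw = compressor.compress(b"hello") + compressor.flush()

fixed = normalize_os_byte(raw)
print(fixed[9])                 # 255
print(gzip.decompress(fixed))   # b'hello'
```

Only the first member of a multi-member stream is patched here, which is enough for output of a single gzip.compress() call.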
CPython versions tested on:
3.8, 3.9, 3.10, 3.11, 3.12
Operating systems tested on:
Linux, macOS, Windows
Linked PRs
gzip.compress output change in 3.11 #120480
gzip.compress output change in 3.11 (GH-120480) #120612
gzip.compress output change in 3.11 (GH-120480) #120613
gzip.compress output change in 3.11 (GH-120480) #120614