Author Thomas
Recipients Thomas
Date 2020-11-16.11:07:51
Message-id <[email protected]>
In-reply-to
Content
According to the changelog entry for bpo-14099 (https://docs.python.org/3.5/whatsnew/changelog.html#id108), reading multiple ZipExtFiles should be thread-safe, but it is not.

I created a small example in which two threads read files from the same ZipFile simultaneously. It fails with a Bad CRC-32 error. This is especially surprising because every file in the ZipFile contains only zero bytes and they all have the same CRC.
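A minimal sketch of a reproduction along these lines is shown here. It is not the original example: the helper names, the number of members, and the member size are all assumptions made for illustration. Whether it actually triggers the error depends on the Python version and on thread timing, so it collects any exceptions instead of assuming one occurs.

```python
import threading
import zipfile


def build_zero_zip(path, n_files=50, size=1 << 16):
    """Write a zip whose members all hold the same run of zero bytes."""
    with zipfile.ZipFile(path, "w") as zf:
        for i in range(n_files):
            zf.writestr(f"file_{i}.bin", b"\0" * size)


def read_all(zf, errors):
    """Read every member through the shared ZipFile, recording failures."""
    for name in zf.namelist():
        try:
            with zf.open(name) as member:
                # Read in small chunks so the threads interleave more often.
                while member.read(4096):
                    pass
        except Exception as exc:  # e.g. zipfile.BadZipFile on a CRC mismatch
            errors.append(exc)


def run_shared_read(path, n_threads=2):
    """Read the archive from several threads through ONE ZipFile object."""
    errors = []
    with zipfile.ZipFile(path) as zf:
        threads = [
            threading.Thread(target=read_all, args=(zf, errors))
            for _ in range(n_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return errors
```

Calling build_zero_zip on a scratch path and then run_shared_read on the same path returns the list of exceptions raised in the worker threads. An empty list means the race did not show up in that run, not that the code is thread-safe.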

My use case is a ZipFile with 82000 files. Creating multiple ZipFiles from the same "physical" zip file is not a satisfactory workaround because it takes several seconds each time. Instead, I open it only once and clone it for each thread:

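import zipfile

# Create an empty placeholder archive once; opening it is cheap.
with zipfile.ZipFile("/tmp/dummy.zip", "w") as dummy:
    pass

def clone_zipfile(z):
    # Open the empty placeholder instead of re-parsing the real archive.
    z_cloned = zipfile.ZipFile("/tmp/dummy.zip")
    z_cloned.fp.close()  # release the placeholder's file handle
    # Share the already-parsed member table, but use a separate file handle.
    z_cloned.NameToInfo = z.NameToInfo
    z_cloned.fp = open(z.fp.name, "rb")
    return z_cloned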

This is a much better solution for my use case than locking. I am using multiple threads because I want to finish my task faster, but locking defeats that purpose.
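As a rough sketch of how the per-thread cloning could be wired into a thread pool: the names read_all_parallel and dummy_path and the thread-local bookkeeping are assumptions for illustration, and the clone helper here is a hypothetical variant that takes the placeholder path as a parameter. It relies on the undocumented NameToInfo and fp attributes, so it may not work on every Python version.

```python
import threading
import zipfile
from concurrent.futures import ThreadPoolExecutor


def clone_zipfile(z, dummy_path):
    """Hypothetical variant of the clone helper with a configurable placeholder."""
    z_cloned = zipfile.ZipFile(dummy_path)
    z_cloned.fp.close()                   # drop the placeholder's handle
    z_cloned.NameToInfo = z.NameToInfo    # reuse the parsed member table
    z_cloned.fp = open(z.fp.name, "rb")   # separate handle for this clone
    return z_cloned


def read_all_parallel(zip_path, dummy_path, n_threads=4):
    """Read every member of zip_path, using one clone per worker thread."""
    # Create the empty placeholder archive that each clone is opened from.
    with zipfile.ZipFile(dummy_path, "w"):
        pass

    local = threading.local()
    clones = []  # kept so that every clone can be closed afterwards

    with zipfile.ZipFile(zip_path) as z:
        names = z.namelist()

        def read_member(name):
            # Lazily create one clone per worker thread.
            if not hasattr(local, "clone"):
                local.clone = clone_zipfile(z, dummy_path)
                clones.append(local.clone)
            return local.clone.read(name)

        try:
            with ThreadPoolExecutor(max_workers=n_threads) as pool:
                data = list(pool.map(read_member, names))
        finally:
            for clone in clones:
                clone.close()

    return dict(zip(names, data))
```

The central directory is parsed once, in the main thread. Each worker pays only for opening the empty placeholder and one extra file handle, and no two threads share a file position.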

However, this cloning is somewhat of a dirty hack. It reopens the archive via z.fp.name, so it will break when the ZipFile was created from a file-like object rather than a real file.

Unfortunately, I do not have a solution for the general case.
History
Date                 User    Action  Args
2020-11-16 11:07:51  Thomas  set     recipients: + Thomas
2020-11-16 11:07:51  Thomas  set     messageid: <[email protected]>
2020-11-16 11:07:51  Thomas  link    issue42369 messages
2020-11-16 11:07:51  Thomas  create