Message381090
According to https://docs.python.org/3.5/whatsnew/changelog.html#id108 bpo-14099, reading multiple ZipExtFiles should be thread-safe, but it is not.
I created a small example where two threads try to read files from the same ZipFile simultaneously, and it fails with a "Bad CRC-32" error. This is especially surprising since every file in the ZipFile contains only zero bytes, so they all have the same CRC.
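A reproduction along these lines might look like the following sketch. It is not the original example: the archive layout, file count, sizes, and the helper names (make_zero_zip, read_all, run_readers) are assumptions chosen for illustration. Because the failure is a race, it may or may not trigger on a given run or Python version, so the sketch only collects errors instead of expecting them.

```python
import os
import tempfile
import threading
import zipfile


def make_zero_zip(path, n_files=50, size=1 << 16):
    """Write n_files members that each contain `size` zero bytes."""
    with zipfile.ZipFile(path, "w") as zf:
        for i in range(n_files):
            zf.writestr(f"file{i}.bin", bytes(size))


def read_all(zf, names, errors):
    """Read every member through the shared ZipFile, recording any failure."""
    for name in names:
        try:
            with zf.open(name) as member:
                member.read()
        except Exception as exc:  # e.g. zipfile.BadZipFile("Bad CRC-32 ...")
            errors.append((name, repr(exc)))


def run_readers(n_threads=2):
    """Let n_threads read the same ZipFile object at the same time."""
    errors = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "zeros.zip")
        make_zero_zip(path)
        with zipfile.ZipFile(path) as zf:
            names = zf.namelist()
            threads = [
                threading.Thread(target=read_all, args=(zf, names, errors))
                for _ in range(n_threads)
            ]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
    return len(names), errors


if __name__ == "__main__":
    count, errors = run_readers()
    print(f"{count} members, {len(errors)} failed reads")
```

An empty error list does not prove thread safety; it only means the race did not show up in that run.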
My use case is a ZipFile with 82000 files. Creating multiple ZipFiles from the same "physical" zip file is not a satisfactory workaround because it takes several seconds each time. Instead, I open it only once and clone it for each thread:
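import zipfile

# Create an empty archive once; opening it is cheap because it has no entries.
with zipfile.ZipFile("/tmp/dummy.zip", "w") as dummy:
    pass

def clone_zipfile(z):
    # Open the empty archive instead of re-parsing the real central directory.
    z_cloned = zipfile.ZipFile("/tmp/dummy.zip")
    # Share the already-parsed entries and give the clone its own file handle.
    z_cloned.NameToInfo = z.NameToInfo
    z_cloned.fp = open(z.fp.name, "rb")
    return z_cloned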
This is a much better solution for my use case than locking. I am using multiple threads because I want to finish my task faster, but locking defeats that purpose.
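As a rough sketch of how such a clone could be used, each worker thread below gets its own clone, reads its share of the members, and closes the clone afterwards. The names worker and demo, the dummy_path parameter, and the temporary archive are hypothetical and only there to make the sketch self-contained. Unlike the snippet above, this variant also closes the handle on the empty archive before replacing it. It relies on the same undocumented ZipFile attributes (NameToInfo, fp), so it is an illustration of the hack, not a supported API.

```python
import os
import tempfile
import zipfile
from concurrent.futures import ThreadPoolExecutor


def clone_zipfile(z, dummy_path):
    """Clone z cheaply by opening an empty archive and borrowing z's entries."""
    z_cloned = zipfile.ZipFile(dummy_path)
    z_cloned.fp.close()                  # release the handle on the empty archive
    z_cloned.NameToInfo = z.NameToInfo   # reuse the already-parsed directory
    z_cloned.fp = open(z.fp.name, "rb")  # private file handle for this clone
    return z_cloned


def worker(shared, dummy_path, names):
    """Read the given members through a clone owned by the calling thread."""
    clone = clone_zipfile(shared, dummy_path)
    try:
        return sum(len(clone.read(name)) for name in names)
    finally:
        clone.close()


def demo(n_files=20, size=4096, workers=4):
    """Build a small archive and read it with one clone per worker."""
    with tempfile.TemporaryDirectory() as tmp:
        real = os.path.join(tmp, "real.zip")
        dummy = os.path.join(tmp, "dummy.zip")
        with zipfile.ZipFile(real, "w") as zf:
            for i in range(n_files):
                zf.writestr(f"file{i}.bin", bytes(size))
        with zipfile.ZipFile(dummy, "w"):
            pass  # empty archive, used only as a cheap template
        with zipfile.ZipFile(real) as shared:
            names = shared.namelist()
            chunks = [names[i::workers] for i in range(workers)]
            with ThreadPoolExecutor(max_workers=workers) as pool:
                totals = pool.map(lambda c: worker(shared, dummy, c), chunks)
                return sum(totals)


if __name__ == "__main__":
    print(demo())
```

Since every clone has its own file object, the threads never share a file position, which is why no lock is needed around the reads.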
However, this cloning is somewhat of a dirty hack: it relies on z.fp.name to reopen the archive, so it will break when the ZipFile was created from a file-like object instead of a real file on disk.
Unfortunately, I do not have a solution for the general case.
| Date | User | Action | Args |
| 2020-11-16 11:07:51 | Thomas | set | recipients: + Thomas |
| 2020-11-16 11:07:51 | Thomas | set | messageid: <[email protected]> |
| 2020-11-16 11:07:51 | Thomas | link | issue42369 messages |
| 2020-11-16 11:07:51 | Thomas | create | |