Feature or enhancement
Proposal:
Good afternoon,
The dbm module, and by extension shelve as well, doesn't provide any way to reclaim free space after lots of deletions from the database. This applies to all of the dbm submodules (dbm.dumb, dbm.sqlite, dbm.ndbm, dbm.gnu).
This can lead to hundreds of GB of wasted space when using them to store complex objects, such as when using them as a persistent cache.
Most of the underlying libraries, however, support ways to reclaim space on demand:
- VACUUM in sqlite3
- gdbm_reorganize for gnu
- None for ndbm
- None for dumb (but this is simple to implement and I would be happy to contribute it: copy the used parts of the binary file in place and update the index. The advantage is that this won't use more disk space while vacuuming, but if the program is interrupted during the vacuum, the DB will be corrupted. Note: this is already the case for many dbm.dumb operations.)
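To make the dbm.dumb idea concrete, here is a rough sketch of the in-place approach described above. It is not the actual dbm.dumb code: compact_in_place, the flat data file and the plain dict index mapping key -> (offset, size) are hypothetical stand-ins, and the sketch ignores the block alignment that dbm.dumb uses. Records are visited in ascending offset order, so each one is only ever moved to a position at or before its current one and never overwrites a record that has not been moved yet.

```python
import os
import tempfile


def compact_in_place(data_path, index):
    """Slide live records toward the start of the file, then truncate.

    `index` maps key -> (offset, size). It is a simplified, hypothetical
    stand-in for the dbm.dumb index and is updated in place.
    Returns the new size of the data file in bytes.
    """
    write_pos = 0
    with open(data_path, "r+b") as f:
        # sorted() builds a list first, so updating `index` in the loop is safe.
        for key, (offset, size) in sorted(index.items(), key=lambda kv: kv[1][0]):
            if offset != write_pos:
                f.seek(offset)
                payload = f.read(size)  # read the whole record before writing
                f.seek(write_pos)
                f.write(payload)
            index[key] = (write_pos, size)
            write_pos += size
        # Everything past the last live record is now garbage.
        f.truncate(write_pos)
    return write_pos


def demo():
    """Write three records, drop the middle one from the index, compact."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        records = {b"a": b"alpha", b"b": b"bravo-bravo", b"c": b"charlie"}
        index = {}
        with open(path, "wb") as f:
            for key, value in records.items():
                index[key] = (f.tell(), len(value))
                f.write(value)
        del index[b"b"]  # a deletion only touches the index, not the data
        new_size = compact_in_place(path, index)
        with open(path, "rb") as f:
            blob = f.read()
    return new_size, index, blob
```

As noted above, an interruption between moving a record and persisting the updated index would leave the two out of sync, so this trades crash safety for not needing extra disk space.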
Additionally, I would like to update the documentation to highlight the disadvantages of dbm.dumb. For now they are only comments in the source code and are hidden from developers reading the doc:
- Lack of support for any concurrency
- Slowness linearly proportional to index size
- It never reclaims the space of deleted items (this will hopefully be fixed by the PR, in which case it won't need to be documented)
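The last point is easy to observe. The following is a small sketch (the helper name dat_size_after_deletes and the sizes are my own choices for illustration) that fills a dbm.dumb database, deletes every key, and compares the size of the .dat file before and after. Deleting only updates the index, so the data file is expected to stay the same size.

```python
import dbm.dumb
import os
import tempfile


def dat_size_after_deletes(n_items=100, value_size=1024):
    """Return (.dat size before deletes, .dat size after deletes, keys left)."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "cache")

        # Fill the database with n_items values of value_size bytes each.
        with dbm.dumb.open(base, "c") as db:
            for i in range(n_items):
                db[f"key{i}"] = b"x" * value_size
        size_before = os.path.getsize(base + ".dat")

        # Delete every key; only the index is rewritten.
        with dbm.dumb.open(base, "c") as db:
            for i in range(n_items):
                del db[f"key{i}"]
            remaining = len(db)
        size_after = os.path.getsize(base + ".dat")

    return size_before, size_after, remaining
```

Calling dat_size_after_deletes() should show an empty database whose data file is still as large as when it was full.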
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
https://discuss.python.org/t/dbm-module-add-vacuuming/91507
Linked PRs