gh-134004: Dbm vacuuming#134028
gh-134004: Dbm vacuuming#134028serhiy-storchaka merged 21 commits intopython:mainfrom Andrea-Oliveri:dbm-vacuum-gh134004
Conversation
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
…ce to module links not found
…m.dumb for consistency with dbm.gnu. Also updated documentations and tests to reflect the change
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Very good code.
However, there are few issues.
-
dbm.dumbdoes not support concurrent writing, but it supports limited concurrent access with a single writer and multiple readers. Readers may not see new data, and can read deleted data (because they cache the index), but otherwise they do not conflict with writer. But when writer callsreorganize(), it breaks readers. So using this feature can cause new bugs.I am not sure that we can to do something with this.
-
Other issue is that
reorganize()is inherently not fail-safe. If something happens during the reorganization process, you may lose data.
Do gdbm_reorganize() and Sqlite VACUUM have the same issues or they provide some protection?
|
Good morning @serhiy-storchaka, Easy changesI have pushed changes to address the documentation and dbm.dumb comment remarks. Organize not being fail-safeOn the topic of
Let me know how you think we should proceed. SQLITE and GDBM are designed to be much more robust and use the second technique to ensure failed reorganizations don't corrupt the database. Crucially, however, they handle a single binary file with both index and data, hence either everything is renamed atomically or nothing is. ConcurrencyMy apologies, but I would disagree Examples:
Of course this problem is well-known to any developer who looked at the source code for One thing that was already well explained in the documentation, however:
Please let me know how you would like to proceed on this issue as well. In the meantime, have a great day! 😃 |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Thank you for your response. What is left:
- Add
.. versionadded:: nextfor all new methods. - Add What's New entries.
There was a problem hiding this comment.
LGTM. 👍
Thank you for your contribution, @Andrea-Oliveri. Please ping me next week if I don't merge this PR before then.
|
Thank you @serhiy-storchaka 😄 . Just FYI, since I had merge conflicts due to the whatsnew I have merged with main between you marking as approved and me reading the message. |
|
This is great; thanks for the proposal and implementation, @Andrea-Oliveri; thanks for managing this, @serhiy-storchaka. |
…mb and shelve (pythonGH-134028) They are similar to the same named method in dbm.gnu.
…mb and shelve (pythonGH-134028) They are similar to the same named method in dbm.gnu.
…mb and shelve (pythonGH-134028) They are similar to the same named method in dbm.gnu.
gh-134004:
.reorganize()method todbm.sqliteanddbm.dumbwhich allows to retrieve unused space from database files. Name was chosen for consistency withdbm.gnu.reorganize().dbm.ndbm` does not support any such optimization.dbm.dumb.reorganize()is a new custom implementation rewriting the database file in-place and removing the gaps left by deletions, then truncating the excess..reorganize()runs as expected on all compatibledbmsubmodules.dbmmodule to include contributed methods as well as to highlight some of the limitations of the current implementations, such as lack of concurrency compatibility (fordbm.dumb) and lack of automatic reorganizing.reorganize()method toshelveto expose dbm's submodules methodsshelve.reorganize()Note: Unfortunately I did not feel comfortable changing the C code at this time, so I chose not to add an empty
.reorganize()method todbm.ndbm. This resulted in:dbmsubmodules.hasattr(), whether thedbmobjects have the.reorganize()method and succeed early if that is not the case.