Bug report
I've seen this in the free-threaded build, but I think the problem can theoretically occur in the default build as well.
The problem is that after a fork(), the ThreadHandle of an already dead thread may be deallocated in the child before it's marked as not joinable. When that happens, ThreadHandle_dealloc() can crash in PyThread_detach_thread() (cpython/Modules/_threadmodule.c, lines 66 to 70 in bcccf1f):
    ThreadHandle_dealloc(ThreadHandleObject *self)
    {
        PyObject *tp = (PyObject *) Py_TYPE(self);
        if (self->joinable) {
            int ret = PyThread_detach_thread(self->handle);
The steps leading to the crash are:
- A thread T2 starts and finishes, but is not joined. The ThreadHandle is not immediately deallocated, either because it's part of a larger reference cycle or due to biased reference counting (in the free-threaded build).
- The main thread calls fork().
- In the child process, during PyOS_AfterFork_Child(), the ThreadHandle is deallocated. I've seen this happen in the free-threaded build due to biased reference counting merging the thread states in PyThreadState_Clear(). I believe this can also happen in the default build if, for example, a GC is triggered early on during threading._after_fork() before we get to marking the ThreadHandle as not joinable.
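The steps above can be sketched in Python roughly as follows. This is only a minimal sketch of the precondition (a dead, unjoined thread whose handle is kept alive by a reference cycle at the time of fork()), not a reliable reproducer: whether the handle is actually deallocated early enough in the child to crash depends on the build and on timing. The function name fork_with_unjoined_dead_thread is made up for illustration, and the code assumes a POSIX platform where os.fork() is available.

```python
import gc
import os
import threading
import time


def fork_with_unjoined_dead_thread():
    """Hypothetical helper: return the child's exit code
    (negative if the child was killed by a signal)."""
    gc.disable()  # keep the cycle below uncollected until after fork()
    try:
        # Step 1: T2 starts and finishes, but is never joined.
        t2 = threading.Thread(target=lambda: None)
        t2.start()
        while t2.is_alive():
            time.sleep(0.001)

        # Keep the Thread object (and its handle) alive only through
        # a reference cycle, so it is garbage but not yet deallocated.
        cycle = {"thread": t2}
        cycle["self"] = cycle
        del t2, cycle

        # Step 2: the main thread forks.
        pid = os.fork()
        if pid == 0:
            # Step 3 (child): collect the cycle, deallocating the handle.
            code = 1
            try:
                gc.collect()
                code = 0
            finally:
                os._exit(code)  # never return into the caller in the child

        _, status = os.waitpid(pid, 0)
        return os.waitstatus_to_exitcode(status)
    finally:
        gc.enable()


if __name__ == "__main__":
    print("child exit code:", fork_with_unjoined_dead_thread())
```

On an unaffected build the child should exit with code 0. A negative return value would mean the child died from a signal, which is the kind of crash described here.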
Proposed fix
Early on in PyOS_AfterFork_Child(), we should fix up all ThreadHandle objects from C (without executing Python code) -- we should mark the dead ones as not joinable and update the remaining active thread.
I think it's important to do this without executing Python code. Once we start executing Python code, almost anything can happen, such as GC collections, destructors, etc.
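To illustrate why the ordering matters, here is a toy model in Python. It is not CPython code: ToyHandle, fix_up_after_fork, and the integer thread identifiers are all hypothetical stand-ins, and the RuntimeError stands in for the crash in PyThread_detach_thread(). The model only shows that if the fix-up runs before anything can deallocate a handle, the joinable guard in the deallocator skips the detach for threads that no longer exist in the child.

```python
class ToyHandle:
    """Hypothetical stand-in for a ThreadHandle object."""

    def __init__(self, ident):
        self.ident = ident
        self.joinable = True

    def dealloc(self, live_idents):
        # Models the `if (self->joinable)` guard in ThreadHandle_dealloc().
        if self.joinable:
            if self.ident not in live_idents:
                # Stand-in for the crash: detaching a thread that does
                # not exist in this process.
                raise RuntimeError(f"detach of nonexistent thread {self.ident}")
            return "detached"
        return "skipped"


def fix_up_after_fork(handles, forking_handle, new_ident):
    """Model of the proposed fix-up: runs first, without user code."""
    for handle in handles:
        if handle is forking_handle:
            # The forking thread is the only survivor; update its identity.
            handle.ident = new_ident
        else:
            # Every other thread is gone in the child.
            handle.joinable = False


# Parent state: main thread (ident 1) and a finished, unjoined T2 (ident 2).
main, t2 = ToyHandle(1), ToyHandle(2)

# In the child only the forking thread exists, under a new identifier.
child_live = {10}

# Without the fix-up, deallocating T2's handle in the child "crashes".
try:
    ToyHandle(2).dealloc(child_live)
    crashed = False
except RuntimeError:
    crashed = True

# With the fix-up applied first, the same deallocation is harmless.
fix_up_after_fork([main, t2], forking_handle=main, new_ident=10)
result = t2.dealloc(child_live)
```

In this model, crashed ends up True and result is "skipped". The real fix would have to do the equivalent walk over ThreadHandle objects in C, before any Python code (and therefore any GC or destructor) can run.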
cc @pitrou @gpshead @ericsnowcurrently
Linked PRs