Bytecode parity - exception#7557
Conversation
📝 WalkthroughWalkthroughRefactors exception handling across compiler and VM, moves reraise into cleanup scopes, adds IR optimization passes (tuple unpack/swap/dead-store elimination and uninitialized-load checks), removes explicit exception-passing frame API, and adds two bytecode-dump/compare scripts for CPython vs RustPython. Changes
Sequence DiagramsequenceDiagram
participant Compiler as Codegen
participant ExceptionTable as Exception Table
participant Bytecode as Emitted Bytecode
participant VM as Runtime VM
participant Frame as Frame Stack
Note over Compiler,ExceptionTable: Compile-time: new handler layout
Compiler->>ExceptionTable: register cleanup & handler ranges
Compiler->>Bytecode: emit TRY, cleanup, Reraise{depth:0} inside cleanup
Note over Bytecode,VM: Runtime unwind now preserves extra values
VM->>Frame: exception raised -> consult Exception Table
Frame->>VM: unwind to cleanup handler (preserved values remain)
VM->>Frame: outer exception-table unwinder discards preserved values to target
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly Related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
7e7e111 to
a15185e
Compare
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [ ] test: cpython/Lib/test/test_compile.py (TODO: 27) dependencies: dependent tests: (no tests depend on compile) Legend:
|
876791a to
8b264e5
Compare
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/codegen/src/compile.rs (2)
3958-3987:⚠️ Potential issue | 🔴 Critical
orelseis re-armed forfinallyon the wrong block.
SetupFinallyandpush_fblock_full(FinallyTry, ...)run beforeswitch_to_block(else_block), so they attach to the current block (cleanup_block), which is dead afterReraise { depth: 1 }. Theorelsebytecode then executes without an activefinallysetup, so an exception raised fromorelsecan skipfinalbody.Suggested fix
- if !finalbody.is_empty() { - emit!( - self, - PseudoInstruction::SetupFinally { - delta: finally_block - } - ); - self.push_fblock_full( - FBlockType::FinallyTry, - finally_block, - finally_block, - FBlockDatum::FinallyBody(finalbody.to_vec()), - )?; - } if else_block != continuation_block { self.switch_to_block(else_block); + if !finalbody.is_empty() { + emit!( + self, + PseudoInstruction::SetupFinally { + delta: finally_block + } + ); + self.push_fblock_full( + FBlockType::FinallyTry, + finally_block, + finally_block, + FBlockDatum::FinallyBody(finalbody.to_vec()), + )?; + } self.compile_statements(orelse)?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/codegen/src/compile.rs` around lines 3958 - 3987, The SetupFinally emission and push_fblock_full(FBlockType::FinallyTry, ...) are being emitted before switch_to_block(else_block) so the finally setup attaches to the current (cleanup) block; move the emit!(..., PseudoInstruction::SetupFinally { delta: finally_block }) and the self.push_fblock_full(FBlockType::FinallyTry, finally_block, finally_block, FBlockDatum::FinallyBody(finalbody.to_vec())) call to immediately after self.switch_to_block(else_block) and before self.compile_statements(orelse) so the finally is armed on else_block; keep the existing PopBlock emit and self.pop_fblock(FBlockType::FinallyTry) after compiling orelse to unwind the finally for the else path.
4006-4023:⚠️ Potential issue | 🔴 Critical
except* ... finallystill drops its cleanup handler before rethrowing.This exception-path
finallystill emitsPopBlock/pop_fblock(FinallyEnd)beforeReraise { depth: 0 }. That removes the cleanup range just before the original exception is re-raised, soCOPY 3; POP_EXCEPT; RERAISE 1never gets a chance to restoreexc_info. It reintroduces the same bug you already fixed incompile_try_statement.Suggested fix
- if finally_cleanup_block.is_some() { - emit!(self, PseudoInstruction::PopBlock); - self.pop_fblock(FBlockType::FinallyEnd); - } - emit!(self, Instruction::Reraise { depth: 0 }); + + if finally_cleanup_block.is_some() { + emit!(self, PseudoInstruction::PopBlock); + self.pop_fblock(FBlockType::FinallyEnd); + }
🧹 Nitpick comments (4)
crates/codegen/src/ir.rs (2)
1408-1427: Useinstruction_lineno()helper for consistent line number handling.The
next_swappablefunction extracts line numbers directly frominfo.location.line.get(), ignoring anylineno_overridethat may be set (e.g.,-1for cold blocks). The existinginstruction_linenohelper at line 3171 handles this correctly.This inconsistency could cause incorrect line comparisons for instructions with overridden line numbers.
♻️ Proposed fix
fn next_swappable(instrs: &[InstructionInfo], mut i: usize, lineno: i32) -> Option<usize> { loop { i += 1; if i >= instrs.len() { return None; } let info = &instrs[i]; - let info_lineno = info.location.line.get() as i32; + let info_lineno = instruction_lineno(info); if lineno >= 0 && info_lineno > 0 && info_lineno != lineno { return None; }Note: You'll need to either make
instruction_linenovisible here or inline the logic:let info_lineno = info.lineno_override.unwrap_or_else(|| info.location.line.get() as i32);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/codegen/src/ir.rs` around lines 1408 - 1427, In next_swappable, the code reads the instruction line directly from info.location.line.get(), which ignores any lineno_override; change it to use the existing instruction_lineno helper (or inline its logic) so comparisons account for overrides (e.g., let info_lineno = instruction_lineno(info) or use info.lineno_override.unwrap_or_else(|| info.location.line.get() as i32)); update next_swappable to call instruction_lineno/info.lineno_override to get info_lineno before comparing with lineno.
1978-1980: Consider avoiding unnecessary clone when no changes occur.The instructions are cloned unconditionally at line 1978, but the clone is only needed when
changedbecomes true. For blocks where no LOAD_FAST needs conversion to LOAD_FAST_CHECK, this clone is wasted work.However, since this runs once during compilation and the optimization benefit is marginal, this is fine as-is.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/codegen/src/ir.rs` around lines 1978 - 1980, The code unconditionally clones self.blocks[idx].instructions into old_instructions even when no changes occur; to fix, iterate over references to self.blocks[idx].instructions (or take a slice) and only allocate/clone into new_instructions when you detect a needed change (when encountering a LOAD_FAST that must become LOAD_FAST_CHECK), then replace or assign the modified Vec back into self.blocks[idx].instructions (or use std::mem::replace/take) so blocks that are unchanged avoid the clone; update uses of old_instructions/new_instructions accordingly (refer to old_instructions, new_instructions, changed, and self.blocks[idx].instructions).crates/codegen/src/compile.rs (1)
8509-8544: Please add an EX_CALL regression for optimizedsuper()calls.This introduces a new bytecode shape for
super().method(*args, **kwargs)/super(C, self).method(*args). A focused disassembly or behavior test here would make future parity regressions much easier to catch.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/codegen/src/compile.rs` around lines 8509 - 8544, Add a regression test that exercises the optimized super() call path (when can_optimize_super_call returns Some) for both ZeroArg and TwoArg SuperCallType so we catch the new CALL_FUNCTION_EX shape: construct snippets like super().method(*args, **kwargs) and super(C, self).method(*args), compile them and assert the generated bytecode/assembly contains the EX_CALL/CALL_FUNCTION_EX path (or that uses_ex_call branch was taken) and the sequence produced by emit_load_super_attr/emit_load_zero_super_attr plus the emitted PushNull and codegen_call_helper invocation; place the test with other codegen/compile tests and name it to indicate it’s a regression for optimized super EX_CALLs.crates/vm/src/vm/mod.rs (1)
2067-2081: Consider removing unreachable else branch.With
ExceptionStack::default()now initializing the stack withvec![None], the stack should never be empty. Thedebug_asserton line 2070-2073 confirms this invariant, making theelsebranch (lines 2079-2080) unreachable dead code.♻️ Suggested simplification
pub(crate) fn set_exception(&self, exc: Option<PyBaseExceptionRef>) { // don't be holding the RefCell guard while __del__ is called let mut excs = self.exceptions.borrow_mut(); debug_assert!( !excs.stack.is_empty(), "set_exception called with empty exception stack" ); - if let Some(top) = excs.stack.last_mut() { - let prev = core::mem::replace(top, exc); - drop(excs); - drop(prev); - } else { - excs.stack.push(exc); - drop(excs); - } + let top = excs + .stack + .last_mut() + .expect("set_exception called with empty exception stack"); + let prev = core::mem::replace(top, exc); + drop(excs); + drop(prev); #[cfg(feature = "threading")] thread::update_thread_exception(self.topmost_exception()); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 2067 - 2081, The else branch in set_exception is unreachable because ExceptionStack::default() seeds stack with vec![None] and there's already a debug_assert that the stack is non-empty; remove the unreachable else branch and simplify set_exception to always operate on excs.stack.last_mut() (i.e., replace the top entry and drop the previous value) while preserving the borrow_mut, debug_assert, core::mem::replace usage and drop(prev) semantics so behavior and lifetimes around __del__ remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/codegen/src/ir.rs`:
- Around line 1429-1431: Rename the local variable instrs to a non-spell-checked
name (e.g., instructions) to avoid the spell-check warning: in the loop over
self.blocks where you currently bind let instrs = &mut block.instructions;
change that binding to let instructions = &mut block.instructions; and update
all subsequent uses in this scope (same pattern as in
optimize_build_tuple_unpack) so references to instrs are replaced with
instructions.
- Around line 1514-1516: Rename the local variable named `instrs` to
`instructions` in the loop that iterates `for block in &mut self.blocks` (the
binding currently `let instrs = &mut block.instructions;`), and update all
references within that loop's scope to use `instructions` instead of `instrs` so
the spell checker no longer flags the abbreviation; leave `block.instructions`
unchanged.
- Around line 1345-1352: The local variable name `instrs` (declared as `let
instrs = &mut block.instructions;`) must be renamed to `instructions` to satisfy
the spell checker; update the binding and every use in this scope (e.g.,
`instrs.len()`, `instrs[i]`, `instrs[i + 1]`) to `instructions`, keeping the
same reference to `block.instructions` and preserving the surrounding logic that
checks `Instruction::BuildTuple` and `Instruction::UnpackSequence`.
In `@scripts/compare_bytecode.py`:
- Around line 103-127: The current _finish_one() swallows worker failures by
returning {} on timeout/parse errors and unlinking files_file so finish_dump()
loses those file paths; change _finish_one() to preserve the assigned file
entries on error by reading the files list from files_file before unlinking and
returning a fallback mapping (e.g., each filename -> [] or a special failure
marker) instead of {}, and ensure this path is taken for
subprocess.TimeoutExpired, non-zero proc.returncode, and JSONDecodeError cases
so finish_dump() receives the original file set; use the existing files_file and
proc symbols to locate and implement reading the file list and returning the
appropriate fallback result.
- Around line 59-61: The relative path used for filtering (relpath from
os.path.relpath(fpath, lib_dir)) may contain Windows backslashes and thus fail
fnmatch.fnmatch(pattern) checks; normalize relpath to POSIX separators (e.g. use
pathlib.Path(relpath).as_posix() or replace backslashes with '/') before calling
fnmatch.fnmatch(relpath, pattern) so patterns like "asyncio/*.py" match
cross-platform.
- Line 342: Replace the inline lambda assignments with proper function
definitions: change the assignment "log = lambda *a, **kw: print(*a,
file=sys.stderr, **kw)" to a def-based implementation (e.g., def log(*a, **kw):
print(*a, file=sys.stderr, **kw)) and do the same for the other lambda
assignment mentioned in the comment (the second lambda at line 424) so both
named callables are defined with def instead of lambdas to satisfy E731; keep
the same signature and behavior when converting to the new functions.
In `@scripts/dis_dump.py`:
- Around line 116-117: The _RP_CMP_OPS dict is incorrect and contains an
extraneous key 6; update the mapping for the COMPARE_OP discriminants in the
top-level symbol _RP_CMP_OPS to match RustPython's ComparisonOperator (0→"<",
1→"<=", 2→"==", 3→"!=", 4→">", 5→">="), removing the invalid key 6 so downstream
COMPARE_OP output is labelled correctly.
- Around line 134-141: The branch that treats deref-like opnames (the elif
listing
"LOAD_DEREF","STORE_DEREF","DELETE_DEREF","LOAD_CLOSURE","MAKE_CELL","COPY_FREE_VARS")
incorrectly treats COPY_FREE_VARS as a name/index into localsplus; remove
"COPY_FREE_VARS" from that tuple (or add an explicit case for COPY_FREE_VARS
before this branch) so its operand is always rendered as a numeric count rather
than passed through the deref/name fallback; update any comments/tests that
assume COPY_FREE_VARS is numeric.
---
Outside diff comments:
In `@crates/codegen/src/compile.rs`:
- Around line 3958-3987: The SetupFinally emission and
push_fblock_full(FBlockType::FinallyTry, ...) are being emitted before
switch_to_block(else_block) so the finally setup attaches to the current
(cleanup) block; move the emit!(..., PseudoInstruction::SetupFinally { delta:
finally_block }) and the self.push_fblock_full(FBlockType::FinallyTry,
finally_block, finally_block, FBlockDatum::FinallyBody(finalbody.to_vec())) call
to immediately after self.switch_to_block(else_block) and before
self.compile_statements(orelse) so the finally is armed on else_block; keep the
existing PopBlock emit and self.pop_fblock(FBlockType::FinallyTry) after
compiling orelse to unwind the finally for the else path.
---
Nitpick comments:
In `@crates/codegen/src/compile.rs`:
- Around line 8509-8544: Add a regression test that exercises the optimized
super() call path (when can_optimize_super_call returns Some) for both ZeroArg
and TwoArg SuperCallType so we catch the new CALL_FUNCTION_EX shape: construct
snippets like super().method(*args, **kwargs) and super(C, self).method(*args),
compile them and assert the generated bytecode/assembly contains the
EX_CALL/CALL_FUNCTION_EX path (or that uses_ex_call branch was taken) and the
sequence produced by emit_load_super_attr/emit_load_zero_super_attr plus the
emitted PushNull and codegen_call_helper invocation; place the test with other
codegen/compile tests and name it to indicate it’s a regression for optimized
super EX_CALLs.
In `@crates/codegen/src/ir.rs`:
- Around line 1408-1427: In next_swappable, the code reads the instruction line
directly from info.location.line.get(), which ignores any lineno_override;
change it to use the existing instruction_lineno helper (or inline its logic) so
comparisons account for overrides (e.g., let info_lineno =
instruction_lineno(info) or use info.lineno_override.unwrap_or_else(||
info.location.line.get() as i32)); update next_swappable to call
instruction_lineno/info.lineno_override to get info_lineno before comparing with
lineno.
- Around line 1978-1980: The code unconditionally clones
self.blocks[idx].instructions into old_instructions even when no changes occur;
to fix, iterate over references to self.blocks[idx].instructions (or take a
slice) and only allocate/clone into new_instructions when you detect a needed
change (when encountering a LOAD_FAST that must become LOAD_FAST_CHECK), then
replace or assign the modified Vec back into self.blocks[idx].instructions (or
use std::mem::replace/take) so blocks that are unchanged avoid the clone; update
uses of old_instructions/new_instructions accordingly (refer to
old_instructions, new_instructions, changed, and self.blocks[idx].instructions).
In `@crates/vm/src/vm/mod.rs`:
- Around line 2067-2081: The else branch in set_exception is unreachable because
ExceptionStack::default() seeds stack with vec![None] and there's already a
debug_assert that the stack is non-empty; remove the unreachable else branch and
simplify set_exception to always operate on excs.stack.last_mut() (i.e., replace
the top entry and drop the previous value) while preserving the borrow_mut,
debug_assert, core::mem::replace usage and drop(prev) semantics so behavior and
lifetimes around __del__ remain unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 363a271e-15bf-4199-b435-f764c39d9979
⛔ Files ignored due to path filters (2)
Lib/test/test_peepholer.pyis excluded by!Lib/**crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snapis excluded by!**/*.snap
📒 Files selected for processing (6)
crates/codegen/src/compile.rscrates/codegen/src/ir.rscrates/vm/src/frame.rscrates/vm/src/vm/mod.rsscripts/compare_bytecode.pyscripts/dis_dump.py
| let instrs = &mut block.instructions; | ||
| let len = instrs.len(); | ||
| for i in 0..len.saturating_sub(1) { | ||
| let Some(Instruction::BuildTuple { .. }) = instrs[i].instr.real() else { | ||
| continue; | ||
| }; | ||
| let n = u32::from(instrs[i].arg); | ||
| let Some(Instruction::UnpackSequence { .. }) = instrs[i + 1].instr.real() else { |
There was a problem hiding this comment.
Fix spell check failure: rename instrs to instructions.
The CI pipeline fails due to the spell checker not recognizing instrs. This is a blocking issue.
🔧 Proposed fix
fn optimize_build_tuple_unpack(&mut self) {
for block in &mut self.blocks {
- let instrs = &mut block.instructions;
- let len = instrs.len();
+ let instructions = &mut block.instructions;
+ let len = instructions.len();
for i in 0..len.saturating_sub(1) {
- let Some(Instruction::BuildTuple { .. }) = instrs[i].instr.real() else {
+ let Some(Instruction::BuildTuple { .. }) = instructions[i].instr.real() else {
continue;
};
- let n = u32::from(instrs[i].arg);
- let Some(Instruction::UnpackSequence { .. }) = instrs[i + 1].instr.real() else {
+ let n = u32::from(instructions[i].arg);
+ let Some(Instruction::UnpackSequence { .. }) = instructions[i + 1].instr.real() else {
continue;
};
- if u32::from(instrs[i + 1].arg) != n {
+ if u32::from(instructions[i + 1].arg) != n {
continue;
}
match n {
1 => {
- instrs[i].instr = AnyInstruction::Real(Instruction::Nop);
- instrs[i].arg = OpArg::new(0);
- instrs[i + 1].instr = AnyInstruction::Real(Instruction::Nop);
- instrs[i + 1].arg = OpArg::new(0);
+ instructions[i].instr = AnyInstruction::Real(Instruction::Nop);
+ instructions[i].arg = OpArg::new(0);
+ instructions[i + 1].instr = AnyInstruction::Real(Instruction::Nop);
+ instructions[i + 1].arg = OpArg::new(0);
}
2 | 3 => {
- instrs[i].instr = AnyInstruction::Real(Instruction::Nop);
- instrs[i].arg = OpArg::new(0);
- instrs[i + 1].instr =
+ instructions[i].instr = AnyInstruction::Real(Instruction::Nop);
+ instructions[i].arg = OpArg::new(0);
+ instructions[i + 1].instr =
AnyInstruction::Real(Instruction::Swap { i: Arg::marker() });
- instrs[i + 1].arg = OpArg::new(n);
+ instructions[i + 1].arg = OpArg::new(n);
}
_ => {}
}
}
}
}🧰 Tools
🪛 GitHub Actions: CI
[error] 1345-1352: cspell failed (exit code 1). Unknown word(s) 'instrs' reported at lines 1345:17, 1346:23, 1348:60, 1351:35, 1352:64.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/codegen/src/ir.rs` around lines 1345 - 1352, The local variable name
`instrs` (declared as `let instrs = &mut block.instructions;`) must be renamed
to `instructions` to satisfy the spell checker; update the binding and every use
in this scope (e.g., `instrs.len()`, `instrs[i]`, `instrs[i + 1]`) to
`instructions`, keeping the same reference to `block.instructions` and
preserving the surrounding logic that checks `Instruction::BuildTuple` and
`Instruction::UnpackSequence`.
| for block in &mut self.blocks { | ||
| let instrs = &mut block.instructions; | ||
| let len = instrs.len(); |
There was a problem hiding this comment.
Same spell check issue: rename instrs to instructions.
Same issue as in optimize_build_tuple_unpack—the instrs variable name triggers the spell checker.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/codegen/src/ir.rs` around lines 1429 - 1431, Rename the local variable
instrs to a non-spell-checked name (e.g., instructions) to avoid the spell-check
warning: in the loop over self.blocks where you currently bind let instrs = &mut
block.instructions; change that binding to let instructions = &mut
block.instructions; and update all subsequent uses in this scope (same pattern
as in optimize_build_tuple_unpack) so references to instrs are replaced with
instructions.
| for block in &mut self.blocks { | ||
| let instrs = &mut block.instructions; | ||
| let len = instrs.len(); |
There was a problem hiding this comment.
Same spell check issue: rename instrs to instructions.
The instrs variable at line 1515 also triggers the spell checker failure.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/codegen/src/ir.rs` around lines 1514 - 1516, Rename the local variable
named `instrs` to `instructions` in the loop that iterates `for block in &mut
self.blocks` (the binding currently `let instrs = &mut block.instructions;`),
and update all references within that loop's scope to use `instructions` instead
of `instrs` so the spell checker no longer flags the abbreviation; leave
`block.instructions` unchanged.
| relpath = os.path.relpath(fpath, lib_dir) | ||
| if pattern and not fnmatch.fnmatch(relpath, pattern): | ||
| continue |
There was a problem hiding this comment.
Normalize relpath before applying --filter.
On Windows, os.path.relpath() returns backslashes, so the documented filter syntax like asyncio/*.py will not match. Converting the relative path to POSIX separators before fnmatch keeps the CLI portable.
Suggested fix
fpath = os.path.join(root, fname)
- relpath = os.path.relpath(fpath, lib_dir)
- if pattern and not fnmatch.fnmatch(relpath, pattern):
+ relpath = os.path.relpath(fpath, lib_dir)
+ match_path = relpath.replace(os.sep, "/")
+ if pattern and not fnmatch.fnmatch(match_path, pattern):
continue
targets.append((relpath, fpath))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| relpath = os.path.relpath(fpath, lib_dir) | |
| if pattern and not fnmatch.fnmatch(relpath, pattern): | |
| continue | |
| relpath = os.path.relpath(fpath, lib_dir) | |
| match_path = relpath.replace(os.sep, "/") | |
| if pattern and not fnmatch.fnmatch(match_path, pattern): | |
| continue |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/compare_bytecode.py` around lines 59 - 61, The relative path used for
filtering (relpath from os.path.relpath(fpath, lib_dir)) may contain Windows
backslashes and thus fail fnmatch.fnmatch(pattern) checks; normalize relpath to
POSIX separators (e.g. use pathlib.Path(relpath).as_posix() or replace
backslashes with '/') before calling fnmatch.fnmatch(relpath, pattern) so
patterns like "asyncio/*.py" match cross-platform.
| def _finish_one(proc, files_file): | ||
| """Wait for a single dis_dump.py process and return parsed JSON.""" | ||
| try: | ||
| stdout = proc.communicate(timeout=600)[0] | ||
| except subprocess.TimeoutExpired: | ||
| proc.kill() | ||
| proc.communicate() | ||
| print(" Timeout (600s)", file=sys.stderr) | ||
| os.unlink(files_file) | ||
| return {} | ||
| finally: | ||
| if os.path.exists(files_file): | ||
| os.unlink(files_file) | ||
|
|
||
| if proc.returncode != 0: | ||
| print(" Warning: exited with code %d" % proc.returncode, file=sys.stderr) | ||
|
|
||
| content = stdout.decode(errors="replace").strip() | ||
| if not content: | ||
| return {} | ||
| try: | ||
| return json.loads(content) | ||
| except json.JSONDecodeError as e: | ||
| print(" JSON parse error: %s" % e, file=sys.stderr) | ||
| return {} |
There was a problem hiding this comment.
Don't let a failed worker disappear from the final result set.
A timeout or JSON parse failure currently makes _finish_one() return {}, and finish_dump() then drops every file assigned to that worker. Those paths never make it into all_files, so the report can overstate coverage, and the current exit check can still return success after RustPython skipped files.
Suggested direction
-def _start_one(interpreter, targets, base_dir):
+def _start_one(interpreter, targets, base_dir):
@@
- return proc, files_file.name
+ return proc, files_file.name, [relpath for relpath, _ in targets]
-def start_dump(interpreter, targets, base_dir, num_workers=1):
+def start_dump(interpreter, targets, base_dir, num_workers=1):
@@
- return [(proc, ff)]
+ return [(proc, ff, [relpath for relpath, _ in targets])]
@@
-def finish_dump(procs):
+def finish_dump(procs):
"""Wait for all dis_dump.py processes and merge results."""
merged = {}
- for proc, files_file in procs:
- merged.update(_finish_one(proc, files_file))
+ for proc, files_file, relpaths in procs:
+ data = _finish_one(proc, files_file)
+ if data:
+ merged.update(data)
+ else:
+ for relpath in relpaths:
+ merged.setdefault(
+ relpath, {"status": "error", "error": "bytecode dump failed"}
+ )
return merged
@@
- sys.exit(0 if differ == 0 and rp_err == 0 else 1)
+ sys.exit(0 if differ == 0 and rp_err == 0 and rp_miss == 0 else 1)Also applies to: 130-148, 361-417, 535-536
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/compare_bytecode.py` around lines 103 - 127, The current
_finish_one() swallows worker failures by returning {} on timeout/parse errors
and unlinking files_file so finish_dump() loses those file paths; change
_finish_one() to preserve the assigned file entries on error by reading the
files list from files_file before unlinking and returning a fallback mapping
(e.g., each filename -> [] or a special failure marker) instead of {}, and
ensure this path is taken for subprocess.TimeoutExpired, non-zero
proc.returncode, and JSONDecodeError cases so finish_dump() receives the
original file set; use the existing files_file and proc symbols to locate and
implement reading the file list and returning the appropriate fallback result.
| sys.exit(1) | ||
|
|
||
| report_path = args.output | ||
| log = lambda *a, **kw: print(*a, file=sys.stderr, **kw) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate the file
find . -name "compare_bytecode.py" -type fRepository: RustPython/RustPython
Length of output: 94
🏁 Script executed:
# Check if the file exists and get its size
wc -l scripts/compare_bytecode.py 2>/dev/null || echo "File not found in expected location"Repository: RustPython/RustPython
Length of output: 96
🏁 Script executed:
# Look for the lambda assignments at the specified lines
sed -n '340,345p' scripts/compare_bytecode.pyRepository: RustPython/RustPython
Length of output: 269
🏁 Script executed:
# Check line 424
sed -n '422,426p' scripts/compare_bytecode.pyRepository: RustPython/RustPython
Length of output: 261
🏁 Script executed:
# Check for ruff configuration files
find . -maxdepth 2 -type f \( -name "pyproject.toml" -o -name "ruff.toml" -o -name ".ruff.toml" -o -name "setup.cfg" -o -name ".flake8" \)Repository: RustPython/RustPython
Length of output: 86
🏁 Script executed:
# Search for ruff configuration in common files
grep -r "ruff\|E731\|flake8" pyproject.toml setup.cfg 2>/dev/null | head -20Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Check ruff.toml
cat ruff.tomlRepository: RustPython/RustPython
Length of output: 1169
🏁 Script executed:
# Check .flake8
cat .flake8Repository: RustPython/RustPython
Length of output: 116
Replace the lambda assignments to comply with E731.
Both lambda assignments at lines 342 and 424 violate the E731 rule in Ruff. Replace them with proper def statements:
Suggested fix
- log = lambda *a, **kw: print(*a, file=sys.stderr, **kw)
+ def log(*a, **kw):
+ print(*a, file=sys.stderr, **kw)
@@
- p = lambda *a: print(*a, file=out)
+ def p(*a):
+ print(*a, file=out)🧰 Tools
🪛 Flake8 (7.3.0)
[error] 342-342: do not assign a lambda expression, use a def
(E731)
🪛 Ruff (0.15.9)
[error] 342-342: Do not assign a lambda expression, use a def
Rewrite log as a def
(E731)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/compare_bytecode.py` at line 342, Replace the inline lambda
assignments with proper function definitions: change the assignment "log =
lambda *a, **kw: print(*a, file=sys.stderr, **kw)" to a def-based implementation
(e.g., def log(*a, **kw): print(*a, file=sys.stderr, **kw)) and do the same for
the other lambda assignment mentioned in the comment (the second lambda at line
424) so both named callables are defined with def instead of lambdas to satisfy
E731; keep the same signature and behavior when converting to the new functions.
- Align except* bytecode chaining - Fix exception state model and finally handler cleanup - Fix RERAISE to only pop exception, preserve values below
- BUILD_TUPLE n + UNPACK_SEQUENCE n elimination - Dead store elimination within basic blocks - apply_static_swaps for SWAP reduction
- compare_bytecode.py: compare CPython vs RustPython bytecode output - dis_dump.py: extract disassembly in normalized JSON format
e5aeba7 to
6824360
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/codegen/src/compile.rs (1)
3958-3980:⚠️ Potential issue | 🟠 MajorMove the re-pushed
SetupFinallyintoelse_block.These instructions are emitted before
switch_to_block(else_block), so they end up incleanup_block, not on thetry-success/elsepath. At runtime,orelsethen executesPopBlockfor aSETUP_FINALLYit never entered, and exceptions raised fromorelsewill bypassfinally.Proposed fix
- if !finalbody.is_empty() { - emit!( - self, - PseudoInstruction::SetupFinally { - delta: finally_block - } - ); - self.push_fblock_full( - FBlockType::FinallyTry, - finally_block, - finally_block, - FBlockDatum::FinallyBody(finalbody.to_vec()), - )?; - } if else_block != continuation_block { self.switch_to_block(else_block); + if !finalbody.is_empty() { + emit!( + self, + PseudoInstruction::SetupFinally { + delta: finally_block + } + ); + self.push_fblock_full( + FBlockType::FinallyTry, + finally_block, + finally_block, + FBlockDatum::FinallyBody(finalbody.to_vec()), + )?; + } self.compile_statements(orelse)?; if !finalbody.is_empty() { // Pop the FinallyTry fblock we just pushed for the else path🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/codegen/src/compile.rs` around lines 3958 - 3980, The SetupFinally emit and push_fblock_full for the finally path are emitted before switch_to_block(else_block), so they land in cleanup_block instead of the else path; move the emit!(..., PseudoInstruction::SetupFinally { delta: finally_block }) and the subsequent self.push_fblock_full(FBlockType::FinallyTry, finally_block, finally_block, FBlockDatum::FinallyBody(finalbody.to_vec())) so they occur after switch_to_block(else_block) (i.e., inside the else branch before compile_statements(orelse)) so the SetupFinally is actually entered on the else/try-success path and the matching emit!(..., PseudoInstruction::PopBlock) and self.pop_fblock(FBlockType::FinallyTry) correspond to an entered finally block.
♻️ Duplicate comments (3)
scripts/compare_bytecode.py (3)
66-98:⚠️ Potential issue | 🟠 MajorDon't let a failed worker drop its entire file set.
When
_finish_one()hits a timeout or JSON error, it returns{}.finish_dump()then loses every file assigned to that worker, so those paths never reachall_files; the report can undercount coverage and still exit successfully after RustPython skipped work. Carry the worker'srelpaths alongside(proc, files_file)and synthesize per-file error entries on timeout, non-zero exit, or JSON parse failure instead of returning an empty mapping.Also applies to: 103-148
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/compare_bytecode.py` around lines 66 - 98, The worker currently returns an empty mapping on timeout/JSON errors causing finish_dump to drop all files for that worker; modify _start_one to return the relpath list alongside (proc, files_file) (e.g., return (proc, files_file, relpaths)) so callers have the assigned paths, and update _finish_one/finish_dump to, on timeout, non-zero exit code, or JSON parse failure, synthesize per-file error entries for each relpath (marking failure/reason) instead of returning {} and ensure those synthesized entries are merged into all_files so no files are lost from the final report; update any code unpacking the tuple to expect the extra relpaths element.
342-342:⚠️ Potential issue | 🟡 MinorReplace the named lambdas with local
defs.Both assignments still trip Ruff/Flake8 E731. Plain local functions keep the same behavior without failing lint.
Suggested fix
- log = lambda *a, **kw: print(*a, file=sys.stderr, **kw) + def log(*a, **kw): + print(*a, file=sys.stderr, **kw) @@ - p = lambda *a: print(*a, file=out) + def p(*a): + print(*a, file=out)As per coding guidelines, "Use ruff for linting Python code".
Also applies to: 424-424
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/compare_bytecode.py` at line 342, The code assigns a named lambda to a variable (e.g., the lambda assigned to "log" and a similar named-lambda later), which triggers Ruff/Flake8 E731; replace each named lambda with an equivalent local def (for example, define def log(*a, **kw): print(*a, file=sys.stderr, **kw)) so behavior remains identical but lint passes, and update any callers to use the new local function name.
59-60:⚠️ Potential issue | 🟡 MinorNormalize the filter path before calling
fnmatch().Line 59 produces OS-native separators, so on Windows a documented filter like
asyncio/*.pywill not match. Compare against a POSIX-normalized copy and keeprelpathunchanged for reporting.Suggested fix
fpath = os.path.join(root, fname) relpath = os.path.relpath(fpath, lib_dir) - if pattern and not fnmatch.fnmatch(relpath, pattern): + match_path = relpath.replace(os.sep, "/") + if pattern and not fnmatch.fnmatch(match_path, pattern): continue targets.append((relpath, fpath))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/compare_bytecode.py` around lines 59 - 60, The fnmatch check uses OS-native relpath which on Windows uses backslashes, so normalize a POSIX-style copy for matching and keep relpath unchanged for reporting: create a POSIX-normalized variant (e.g., via pathlib.PurePosixPath(relpath).as_posix() or relpath.replace(os.sep, "/")) and call fnmatch.fnmatch(posix_relpath, pattern) instead of fnmatch.fnmatch(relpath, pattern) in the block that contains relpath and pattern.
🧹 Nitpick comments (1)
crates/codegen/src/ir.rs (1)
1543-1543: Useallocor crate-level collections for consistency.This function uses
std::collections::HashMapwhile the rest of the file usesalloc::collections(line 1 importsVecDequefromalloc). This inconsistency could cause issues if the crate needsno_stdcompatibility.Consider using a local
Vecor the crate'sIndexMapinstead, which would also maintain insertion order (though not strictly needed here).♻️ Suggested alternative using a simple Vec scan
if run_end - run_start >= 2 { - // Pass 1: find the LAST occurrence of each variable - let mut last_occurrence = std::collections::HashMap::new(); - for (j, instr) in instrs[run_start..run_end].iter().enumerate() { - last_occurrence.insert(u32::from(instr.arg), j); - } - // Pass 2: non-last stores to the same variable are dead - for (j, instr) in instrs[run_start..run_end].iter_mut().enumerate() { - let idx = u32::from(instr.arg); - if last_occurrence[&idx] != j { + // Pass 1: find which positions are NOT the last occurrence + let run_len = run_end - run_start; + let mut is_dead = vec![false; run_len]; + for j in 0..run_len { + let idx = u32::from(instrs[run_start + j].arg); + // Mark earlier occurrences as dead + for k in 0..j { + if u32::from(instrs[run_start + k].arg) == idx { + is_dead[k] = true; + } + } + } + // Pass 2: replace dead stores with POP_TOP + for (j, &dead) in is_dead.iter().enumerate() { + if dead { + let instr = &mut instrs[run_start + j]; instr.instr = AnyInstruction::Real(Instruction::PopTop); instr.arg = OpArg::new(0); } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/codegen/src/ir.rs` at line 1543, Replace the std::collections::HashMap usage for last_occurrence with an alloc-friendly structure: either use a simple Vec<(KeyType, usize)> named last_occurrence and do a small linear search/update where you currently insert/lookup, or switch to a crate-level map like indexmap::IndexMap (or hashbrown::HashMap) if you need faster lookups; update all uses of last_occurrence (creation and lookup/insert sites) in the surrounding function so they operate on the chosen structure to preserve no_std/alloc compatibility and maintain correct semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/compare_bytecode.py`:
- Around line 30-45: find_rustpython() fails on Windows because it only checks
for "rustpython" not "rustpython.exe" and the debug-build guard message is
misleading; change the function to compute executable_name = "rustpython.exe"
when sys.platform == "win32" else "rustpython", use that executable_name when
building debug_fragment (os.path.join("target","debug", executable_name)) for
the RUSTPYTHON debug check and when constructing the release path
(os.path.join(PROJECT_ROOT,"target","release", executable_name)), and update the
ValueError text to reference the platform-specific
target/debug/<executable_name> so both autodiscovery and the debug-build guard
work on Windows.
In `@scripts/dis_dump.py`:
- Around line 178-189: The current _extract_instructions function swallows
exceptions from dis.get_instructions() and returns a synthetic [["ERROR", ...]]
instruction list; change it to let the exception propagate (remove the
try/except or re-raise the caught exception) so process_file's existing
error-handling path can mark the file with status: "error"; apply the same
change to the second identical try/except block referenced around lines 313-323
so all disassembly failures bubble up to process_file instead of producing
synthetic bytecode.
---
Outside diff comments:
In `@crates/codegen/src/compile.rs`:
- Around line 3958-3980: The SetupFinally emit and push_fblock_full for the
finally path are emitted before switch_to_block(else_block), so they land in
cleanup_block instead of the else path; move the emit!(...,
PseudoInstruction::SetupFinally { delta: finally_block }) and the subsequent
self.push_fblock_full(FBlockType::FinallyTry, finally_block, finally_block,
FBlockDatum::FinallyBody(finalbody.to_vec())) so they occur after
switch_to_block(else_block) (i.e., inside the else branch before
compile_statements(orelse)) so the SetupFinally is actually entered on the
else/try-success path and the matching emit!(..., PseudoInstruction::PopBlock)
and self.pop_fblock(FBlockType::FinallyTry) correspond to an entered finally
block.
---
Duplicate comments:
In `@scripts/compare_bytecode.py`:
- Around line 66-98: The worker currently returns an empty mapping on
timeout/JSON errors causing finish_dump to drop all files for that worker;
modify _start_one to return the relpath list alongside (proc, files_file) (e.g.,
return (proc, files_file, relpaths)) so callers have the assigned paths, and
update _finish_one/finish_dump to, on timeout, non-zero exit code, or JSON parse
failure, synthesize per-file error entries for each relpath (marking
failure/reason) instead of returning {} and ensure those synthesized entries are
merged into all_files so no files are lost from the final report; update any
code unpacking the tuple to expect the extra relpaths element.
- Line 342: The code assigns a named lambda to a variable (e.g., the lambda
assigned to "log" and a similar named-lambda later), which triggers Ruff/Flake8
E731; replace each named lambda with an equivalent local def (for example,
define def log(*a, **kw): print(*a, file=sys.stderr, **kw)) so behavior remains
identical but lint passes, and update any callers to use the new local function
name.
- Around line 59-60: The fnmatch check uses OS-native relpath which on Windows
uses backslashes, so normalize a POSIX-style copy for matching and keep relpath
unchanged for reporting: create a POSIX-normalized variant (e.g., via
pathlib.PurePosixPath(relpath).as_posix() or relpath.replace(os.sep, "/")) and
call fnmatch.fnmatch(posix_relpath, pattern) instead of fnmatch.fnmatch(relpath,
pattern) in the block that contains relpath and pattern.
---
Nitpick comments:
In `@crates/codegen/src/ir.rs`:
- Line 1543: Replace the std::collections::HashMap usage for last_occurrence
with an alloc-friendly structure: either use a simple Vec<(KeyType, usize)>
named last_occurrence and do a small linear search/update where you currently
insert/lookup, or switch to a crate-level map like indexmap::IndexMap (or
hashbrown::HashMap) if you need faster lookups; update all uses of
last_occurrence (creation and lookup/insert sites) in the surrounding function
so they operate on the chosen structure to preserve no_std/alloc compatibility
and maintain correct semantics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 37dcc6ca-ae98-4afc-9d45-5142e130ea84
⛔ Files ignored due to path filters (2)
Lib/test/test_peepholer.pyis excluded by!Lib/**crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snapis excluded by!**/*.snap
📒 Files selected for processing (6)
crates/codegen/src/compile.rscrates/codegen/src/ir.rscrates/vm/src/frame.rscrates/vm/src/vm/mod.rsscripts/compare_bytecode.pyscripts/dis_dump.py
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/vm/src/vm/mod.rs
| def find_rustpython(): | ||
| """Locate the RustPython binary, allowing release builds only.""" | ||
| if "RUSTPYTHON" in os.environ: | ||
| path = os.environ["RUSTPYTHON"] | ||
| normalized = os.path.normpath(path) | ||
| debug_fragment = os.path.join("target", "debug", "rustpython") | ||
| if normalized.endswith(debug_fragment): | ||
| raise ValueError( | ||
| "RUSTPYTHON must point to a release binary, not target/debug/rustpython" | ||
| ) | ||
| return path | ||
|
|
||
| path = os.path.join(PROJECT_ROOT, "target", "release", "rustpython") | ||
| if os.path.isfile(path) and os.access(path, os.X_OK): | ||
| return path | ||
| return None |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, verify the actual code at the specified lines
head -50 scripts/compare_bytecode.py | tail -25Repository: RustPython/RustPython
Length of output: 967
🏁 Script executed:
# Search for Windows platform handling patterns in scripts
rg -i "windows|\.exe|os\.name|sys\.platform" scripts/ -A 2 -B 2Repository: RustPython/RustPython
Length of output: 4746
🏁 Script executed:
# Check if there are other executable discovery patterns in the project
rg "target.*release|target.*debug" . -g "*.py" -g "*.rs" -A 2 -B 2Repository: RustPython/RustPython
Length of output: 7437
Handle the Windows executable name in find_rustpython().
Line 44 only probes target/release/rustpython, and line 37 only rejects target/debug/rustpython. On Windows that misses rustpython.exe, so autodiscovery fails and the debug-build guard also stops working for RUSTPYTHON=...\target\debug\rustpython.exe. Use the platform-specific executable name for both paths, and update the error message accordingly.
For consistency with scripts/update_lib/cmd_auto_mark.py, use sys.platform == "win32" instead of os.name == "nt".
Suggested fix
def find_rustpython():
"""Locate the RustPython binary, allowing release builds only."""
+ exe_name = "rustpython.exe" if sys.platform == "win32" else "rustpython"
if "RUSTPYTHON" in os.environ:
path = os.environ["RUSTPYTHON"]
normalized = os.path.normpath(path)
- debug_fragment = os.path.join("target", "debug", "rustpython")
+ debug_fragment = os.path.join("target", "debug", exe_name)
if normalized.endswith(debug_fragment):
raise ValueError(
- "RUSTPYTHON must point to a release binary, not target/debug/rustpython"
+ f"RUSTPYTHON must point to a release binary, not target/debug/{exe_name}"
)
return path
- path = os.path.join(PROJECT_ROOT, "target", "release", "rustpython")
+ path = os.path.join(PROJECT_ROOT, "target", "release", exe_name)
if os.path.isfile(path) and os.access(path, os.X_OK):
return path
return None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/compare_bytecode.py` around lines 30 - 45, find_rustpython() fails on
Windows because it only checks for "rustpython" not "rustpython.exe" and the
debug-build guard message is misleading; change the function to compute
executable_name = "rustpython.exe" when sys.platform == "win32" else
"rustpython", use that executable_name when building debug_fragment
(os.path.join("target","debug", executable_name)) for the RUSTPYTHON debug check
and when constructing the release path
(os.path.join(PROJECT_ROOT,"target","release", executable_name)), and update the
ValueError text to reference the platform-specific
target/debug/<executable_name> so both autodiscovery and the debug-build guard
work on Windows.
| def _extract_instructions(code): | ||
| """Extract normalized instruction list from a code object. | ||
|
|
||
| - Filters out CACHE/PRECALL instructions | ||
| - Converts jump targets from byte offsets to instruction indices | ||
| - Resolves argument names via fallback when argrepr is missing | ||
| - Normalizes argument representations | ||
| """ | ||
| try: | ||
| raw = list(dis.get_instructions(code)) | ||
| except Exception as e: | ||
| return [["ERROR", str(e)]] |
There was a problem hiding this comment.
Bubble disassembly failures into status: "error" instead of synthetic bytecode.
If dis.get_instructions() fails here, the file still comes back from process_file() as "status": "ok" with [["ERROR", ...]] in its instruction stream. scripts/compare_bytecode.py compares those lists verbatim, so extractor failures get reported as bytecode diffs instead of compile/dump errors. Let _extract_instructions() raise and reuse the existing process_file() error path.
Suggested fix
def _extract_instructions(code):
@@
try:
raw = list(dis.get_instructions(code))
except Exception as e:
- return [["ERROR", str(e)]]
+ raise RuntimeError(
+ "failed to disassemble %s" % (getattr(code, "co_qualname", None) or code.co_name)
+ ) from eAlso applies to: 313-323
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 188-188: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/dis_dump.py` around lines 178 - 189, The current
_extract_instructions function swallows exceptions from dis.get_instructions()
and returns a synthetic [["ERROR", ...]] instruction list; change it to let the
exception propagate (remove the try/except or re-raise the caught exception) so
process_file's existing error-handling path can mark the file with status:
"error"; apply the same change to the second identical try/except block
referenced around lines 313-323 so all disassembly failures bubble up to
process_file instead of producing synthetic bytecode.
Summary by CodeRabbit
Bug Fixes
Performance
Tooling
Refactor