Macro for defining opcode & instruction enums#7573
Macro for defining opcode & instruction enums#7573youknowone merged 23 commits intoRustPython:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaces handwritten Changes
Sequence Diagram(s)(Skipped — changes are internal enum/macro generation without multi-component sequential runtime flow.) Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| } | ||
|
|
||
| #[derive(Clone, Copy, Debug)] | ||
| #[repr($typ)] |
There was a problem hiding this comment.
I wanted to remove this #[repr(...)], but when doing so I get the following error:
thread 'main' (3429) panicked at crates/vm/src/frame.rs:3790:33: index out of bounds: the len is 5 but the index is 22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
RustPython/crates/vm/src/frame.rs
Lines 3787 to 3791 in d5a90e5
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/compiler-core/src/bytecode/instruction.rs (1)
32-35: Consider derivingPartialEqandEqfor the opcode enum.The
OpcodeandPseudoOpcodeenums only deriveClone,Copy, andDebug. AddingPartialEqandEqwould enable equality comparisons and pattern matching inmatchguards, which may be useful for consumers of these types.💡 Suggested change
- #[derive(Clone, Copy, Debug)] + #[derive(Clone, Copy, Debug, PartialEq, Eq)] pub enum $opcode_name { $($op_name),* }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode/instruction.rs` around lines 32 - 35, The opcode enums only derive Clone, Copy, and Debug; add PartialEq and Eq to the derive list so consumers can compare values and use match guards. Update the macro-generated enum declarations (the $opcode_name expansion and the concrete Opcode and PseudoOpcode enums) to include #[derive(Clone, Copy, Debug, PartialEq, Eq)] so equality comparisons are available for those types.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/compiler-core/src/bytecode/instruction.rs`:
- Around line 1303-1308: The StoreFastMaybeNull instruction currently declares
its oparg var_num as NameIdx but should use the local variable index type;
change the oparg type for StoreFastMaybeNull from NameIdx to oparg::VarNum so it
matches the local-store semantics (same as StoreFast) and uses the correct
symbol names var_num and oparg::VarNum in the instruction definition.
---
Nitpick comments:
In `@crates/compiler-core/src/bytecode/instruction.rs`:
- Around line 32-35: The opcode enums only derive Clone, Copy, and Debug; add
PartialEq and Eq to the derive list so consumers can compare values and use
match guards. Update the macro-generated enum declarations (the $opcode_name
expansion and the concrete Opcode and PseudoOpcode enums) to include
#[derive(Clone, Copy, Debug, PartialEq, Eq)] so equality comparisons are
available for those types.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 490f37ff-9d9d-4813-9ac5-a6ddc00d643f
📒 Files selected for processing (2)
crates/compiler-core/src/bytecode.rscrates/compiler-core/src/bytecode/instruction.rs
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/compiler-core/src/bytecode/instruction.rs (1)
32-35: Consider addingPartialEq,Eq, andHashderives to opcode enums.The generated opcode enums (
Opcode,PseudoOpcode) only deriveClone, Copy, Debug. AddingPartialEq,Eq, and optionallyHashwould enable their use in comparisons and as map/set keys, which could be useful for opcode analysis or caching scenarios.♻️ Suggested derive expansion
- #[derive(Clone, Copy, Debug)] + #[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)] pub enum $opcode_name {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode/instruction.rs` around lines 32 - 35, The generated opcode enums (the template enum named by $opcode_name, e.g., Opcode and PseudoOpcode) only derive Clone, Copy, Debug; update the derive attribute to also include PartialEq, Eq, and Hash so these enums can be compared and used as map/set keys; modify the derive line in the enum generation template (the block containing #[derive(Clone, Copy, Debug)] and pub enum $opcode_name { ... }) to add PartialEq, Eq, Hash, and ensure both generated enums are adjusted accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/compiler-core/src/bytecode/instruction.rs`:
- Around line 32-35: The generated opcode enums (the template enum named by
$opcode_name, e.g., Opcode and PseudoOpcode) only derive Clone, Copy, Debug;
update the derive attribute to also include PartialEq, Eq, and Hash so these
enums can be compared and used as map/set keys; modify the derive line in the
enum generation template (the block containing #[derive(Clone, Copy, Debug)] and
pub enum $opcode_name { ... }) to add PartialEq, Eq, Hash, and ensure both
generated enums are adjusted accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 2966abcb-4020-4bbc-ba78-d346734c5fff
📒 Files selected for processing (1)
crates/compiler-core/src/bytecode/instruction.rs
|
ping @youknowone |
|
This PR is splitting Instruction and Opcode, right? I wasn't sure what's the benefit based on this changes. Could you tell me more? |
yes, exactly. If this approach is good in your opinion I'll make the rest of the code to use the |
|
@ShaharNaveh The change itself is negative by increasing complexity. It will be justfied if the follow-up changes are beneficial enough. Could you tell me how the seperated type makes the codebase better? That will help me to understand this change better. |
Sure! The main benefit would be the less code repetition for things like RustPython/crates/codegen/src/ir.rs Lines 452 to 471 in 2b1b0ba Where we would right it as: let opcode = op.opcode();
let new_opcode = match opcode {
Opcode::JumpForward if target_offset <= current_offset => Opcode::JumpBackward,
Opcode::JumpBackward if target_offset > current_offset => Opcode::JumpForward,
Opcode::JumpBackwardNoInterrupt if target_offset > current_offset => Opcode::JumpForward,
_ => opcode
};
op = new_opcode.into();And we have lots of places where an enum like: enum AnyOpcode {
Real(Opcode),
Pseudo(PseudoOpcode)
}Can reduce lots of boilerplate examples: RustPython/crates/codegen/src/ir.rs Lines 2740 to 2749 in 2b1b0ba RustPython/crates/codegen/src/ir.rs Lines 2766 to 2780 in 2b1b0ba RustPython/crates/codegen/src/ir.rs Lines 2854 to 2862 in 2b1b0ba By having a separate |
youknowone
left a comment
There was a problem hiding this comment.
Thank you for your patient. I agree about benefits. Let's try the macro don't look like an alien too much. Then it will be perfect
| define_opcodes!( | ||
| opcode_enum: Opcode, | ||
| instruction_enum: Instruction, | ||
| typ: u8, |
There was a problem hiding this comment.
for better text searchability, I prefer to keep macros as much as original syntax.
| typ: u8, | |
| define_opcodes!( | |
| #[repr(u8)] | |
| enum Opcode; | |
| enum Instruction; | |
| ) |
This will help rg 'enum Opcode' can work as normal enum.
If possible, keeping the original enum syntax as much as possible will be even better:
define_opcodes!(
#[repr(u8)]
enum Opcode;
enum Instruction {
Cache = 0,
...,
BuildInterpolation { format: Arg<u32> } = 45,
...,
}
)There was a problem hiding this comment.
How would I handle the display part with this syntax? I'd rather to not use a 3rd party crate like "paste" to do the conversion to: "BUILD_INTERPOLATION" for example. maybe we can drop that and have an impl Display for Opcode manually?
There was a problem hiding this comment.
ah, forgot about it. It doesn't necessarily need to be real code. But making it close enough to real code helps to search and read code. I don't remember details, but macro could handle doc-comment. Then it will be most close to real code. Otherwise adding a string end of the line will be just fine. Let's just try enough to make this easier to read for rust programmers
d657b74 to
93c38d8
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
scripts/generate_opcode_metadata.py (2)
29-31: Minor naming consideration: function produces UPPER_SNAKE_CASE.The function name
to_snake_casetypically implieslower_case_with_underscores, but this producesUPPER_CASE_WITH_UNDERSCORES. Consider renaming toto_screaming_snake_caseorto_upper_snake_casefor clarity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/generate_opcode_metadata.py` around lines 29 - 31, The function to_snake_case actually returns UPPER_SNAKE_CASE (screaming snake case); rename it to something clear like to_screaming_snake_case or to_upper_snake_case and update all call sites accordingly (search for to_snake_case references) and adjust any docstrings/comments to reflect screaming/upper snake case behavior to avoid confusion.
52-60: Consider adding defensive checks for regex matches.Lines 53-54 call
.group()and.groups()directly on regex results. If an entry doesn't match the expected pattern (e.g., due to format changes ininstruction.rs), this will raise anAttributeError: 'NoneType' object has no attribute 'group', which can be confusing to debug.💡 Suggested improvement for clearer error messages
have_oparg = "Arg<" in entry # Hacky but works - rust_name = re.match(r"(\w+)", entry).group(1) - id_num, cpython_name = re.search(r'\((\d+),\s*"([^"]+)"\)', entry).groups() + name_match = re.match(r"(\w+)", entry) + if not name_match: + raise ValueError(f"Could not parse Rust name from entry: {entry!r}") + rust_name = name_match.group(1) + + id_match = re.search(r'\((\d+),\s*"([^"]+)"\)', entry) + if not id_match: + raise ValueError(f"Could not parse (id, cpython_name) from entry: {entry!r}") + id_num, cpython_name = id_match.groups() yield cls(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/generate_opcode_metadata.py` around lines 52 - 60, Add defensive checks around the regex matches for rust_name and (id_num, cpython_name) so a malformed entry does not cause a cryptic AttributeError: verify re.match(r"(\w+)", entry) and re.search(r'\((\d+),\s*"([^"]+)"\)', entry) returned non-None before calling .group()/.groups(), and if either is None raise a clear ValueError that includes the offending entry string (and optional context like the index or source filename) so callers of the generator (the class constructor via cls(...), and variables have_oparg / rust_name) get a descriptive error instead of NoneType attribute errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/generate_opcode_metadata.py`:
- Around line 69-85: The extract_enum_body function currently returns None when
the enum isn't found or when no closing brace is located; change this to raise a
clear exception (e.g., ValueError or RuntimeError) that includes the enum name
so callers (like where the result is joined into a string) never receive None.
Specifically, in extract_enum_body replace the early "return None" when
start_match is falsy with a raised exception specifying the missing enum name,
and after the loop add a raised exception for an unclosed enum block; keep the
return of the inner content when a matching closing brace is found and ensure
the function signature and docstring reflect that it either returns str or
raises.
---
Nitpick comments:
In `@scripts/generate_opcode_metadata.py`:
- Around line 29-31: The function to_snake_case actually returns
UPPER_SNAKE_CASE (screaming snake case); rename it to something clear like
to_screaming_snake_case or to_upper_snake_case and update all call sites
accordingly (search for to_snake_case references) and adjust any
docstrings/comments to reflect screaming/upper snake case behavior to avoid
confusion.
- Around line 52-60: Add defensive checks around the regex matches for rust_name
and (id_num, cpython_name) so a malformed entry does not cause a cryptic
AttributeError: verify re.match(r"(\w+)", entry) and
re.search(r'\((\d+),\s*"([^"]+)"\)', entry) returned non-None before calling
.group()/.groups(), and if either is None raise a clear ValueError that includes
the offending entry string (and optional context like the index or source
filename) so callers of the generator (the class constructor via cls(...), and
variables have_oparg / rust_name) get a descriptive error instead of NoneType
attribute errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: e118bf35-54e6-417d-9509-47c0fe076d7d
📒 Files selected for processing (1)
scripts/generate_opcode_metadata.py
| def extract_enum_body(text: str, name: str) -> str: | ||
| # Find the start of the enum block | ||
| start_match = re.search(rf"enum\s+{name}\s*\{{", text) | ||
| if not start_match: | ||
| return None | ||
|
|
||
| return "\n".join( | ||
| line.split("//")[0].strip() # Remove any comment. i.e. "foo // some comment" | ||
| for line in res.group(1).splitlines() | ||
| if not line.strip().startswith("//") # Ignore comment lines | ||
| ) | ||
| # Manually track brace depth from that point | ||
| depth = 0 | ||
| start = start_match.end() - 1 # position of opening '{' | ||
| for i, ch in enumerate(text[start:], start): | ||
| if ch == "{": | ||
| depth += 1 | ||
| elif ch == "}": | ||
| depth -= 1 | ||
| if depth == 0: | ||
| # Return only the inner content (excluding outer braces) | ||
| return text[start + 1 : i] |
There was a problem hiding this comment.
Function can return None but callers don't handle it.
The function returns None explicitly on line 73 (enum not found) and implicitly if no closing brace is found (loop completes without return). However, the caller at lines 138-141 uses this in "\n".join(...), which will raise a TypeError with an unhelpful message if None is returned.
💡 Suggested improvement: raise explicit error instead of returning None
def extract_enum_body(text: str, name: str) -> str:
# Find the start of the enum block
start_match = re.search(rf"enum\s+{name}\s*\{{", text)
if not start_match:
- return None
+ raise ValueError(f"Could not find 'enum {name}' in source file")
# Manually track brace depth from that point
depth = 0
start = start_match.end() - 1 # position of opening '{'
for i, ch in enumerate(text[start:], start):
if ch == "{":
depth += 1
elif ch == "}":
depth -= 1
if depth == 0:
# Return only the inner content (excluding outer braces)
return text[start + 1 : i]
+
+ raise ValueError(f"Could not find matching closing brace for 'enum {name}'")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/generate_opcode_metadata.py` around lines 69 - 85, The
extract_enum_body function currently returns None when the enum isn't found or
when no closing brace is located; change this to raise a clear exception (e.g.,
ValueError or RuntimeError) that includes the enum name so callers (like where
the result is joined into a string) never receive None. Specifically, in
extract_enum_body replace the early "return None" when start_match is falsy with
a raised exception specifying the missing enum name, and after the loop add a
raised exception for an unclosed enum block; keep the return of the inner
content when a matching closing brace is found and ensure the function signature
and docstring reflect that it either returns str or raises.
There was a problem hiding this comment.
it's okay, we want to crash in case of returned "None" here
There was a problem hiding this comment.
@ShaharNaveh, understood! That's a reasonable approach for a utility script — if the enum isn't found, a crash is the desired behavior and the TypeError from "\n".join(None, ...) will surface it. Thanks for the clarification!
✏️ Learnings added
Learnt from: ShaharNaveh
Repo: RustPython/RustPython PR: 7573
File: scripts/generate_opcode_metadata.py:69-85
Timestamp: 2026-04-15T08:58:35.342Z
Learning: In `scripts/generate_opcode_metadata.py` (RustPython repo), `extract_enum_body` intentionally returns `None` when the enum is not found or the closing brace is missing. The author wants the caller to crash with a `TypeError` in that case, rather than raising an explicit exception. Do not suggest converting `return None` to `raise ValueError(...)` in this function.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
youknowone
left a comment
There was a problem hiding this comment.
👍 Looks great! Thank you so much!
This macro will generate an
Instructionand aOpcodeenum, all within rust, no external python scripts with custom config in toml or something.I'd love to hear what you guys think:)
Summary by CodeRabbit
Refactor
New Features