bpo-35664: Optimize operator.itemgetter#11435
Conversation
|
Timing with patch:
Baseline timing:
|
|
All the itemgetter() cases in the standard library use non-negative integer indexes into tuples. |
| return NULL; | ||
| if (nitems == 1) | ||
| assert(PyTuple_CheckExact(args)); | ||
| if (kw == NULL && PyTuple_GET_SIZE(args) == 1) { |
There was a problem hiding this comment.
_PyArg_NoKeywords is a macro, no need to "optimize" it manually.
There was a problem hiding this comment.
I prefer this as-is. There is some tiny duplication which gets be compiled away, but to my eyes the kw==NULL check makes it clearer what the common path is and that no external calls are made. Also, I don't trust that _PyArg_NoKeywords will remain a macro.
There was a problem hiding this comment.
_PyArg_NoKeywords() makes clear the purpose of code.
And please add comments why these special cases is checked.
| if (PyLong_CheckExact(item)) { | ||
| index = PyLong_AsSsize_t(item); | ||
| if (index < 0) { | ||
| // Only optimize non-negative values that fit in a Py_ssize_t |
There was a problem hiding this comment.
Maybe add assert(PyErr_ExceptionMatches(PyExc_OverflowError)) just for a case?
There was a problem hiding this comment.
That doesn't make sense to me. The OverflowError is only happening when index == -1; otherwise, a negative index just means a negative input.
There was a problem hiding this comment.
Oh, right. The correct assertion should be assert(!PyErr_Occurred() || PyErr_ExceptionMatches(PyExc_OverflowError)). This will serve documenting purposes, showing that the only possible exception is OverflowError.
| if (nitems == 1) { | ||
| if (ig->index >= 0 | ||
| && PyTuple_CheckExact(obj) | ||
| && ig->index < PyTuple_GET_SIZE(obj)) { |
There was a problem hiding this comment.
It is hard to read such sequence of conditions and statements. PEP 7 recommends to move { to a new line in case of multiline conditions.
There was a problem hiding this comment.
Okay, I'll move the curly brace to appease PEP 7 even though it looks weird to me.
| if (index < 0) { | ||
| // Only optimize non-negative values that fit in a Py_ssize_t | ||
| PyErr_Clear(); | ||
| } else { |
|
serhiy-storchaka
left a comment
There was a problem hiding this comment.
I afraid there is possible an integer overflow for large index (like 2**61 and -2**61 on 64-bit platforms, 2**30 and -2**30 on 32-bit platforms). Please fix it and add tests.
| if (PyLong_CheckExact(item)) { | ||
| index = PyLong_AsSsize_t(item); | ||
| if (index < 0) { | ||
| // Only optimize non-negative values that fit in a Py_ssize_t |
There was a problem hiding this comment.
Oh, right. The correct assertion should be assert(!PyErr_Occurred() || PyErr_ExceptionMatches(PyExc_OverflowError)). This will serve documenting purposes, showing that the only possible exception is OverflowError.
| return NULL; | ||
| if (nitems == 1) | ||
| assert(PyTuple_CheckExact(args)); | ||
| if (kw == NULL && PyTuple_GET_SIZE(args) == 1) { |
There was a problem hiding this comment.
_PyArg_NoKeywords() makes clear the purpose of code.
And please add comments why these special cases is checked.
|
When you're done making the requested changes, leave the comment: |
|
Microbenchmark results show significant speed up even for cases that should not be affected by this change. Do you use same build options? Try to use Victor's perf module for more reproducible results. |
|
There are two optimizations, one for argument handling and another that special cases non-negative integer indexing into tuples. The former speeds up all cases (as shown in the benchmarks). The latter speeds-up on the tuple integer index case. It is an all around win. |
|
None of the suggested assertions make sense to me. |
|
The optimizations for argument handling can not be the cause of the difference in above results, because it is applied to the code that is executed only at setup. Could you perform benchmarking of the |
That is incorrect. These lines in I think what you're missing is that args is always a tuple and the above fragment always gets executed unless there is an error. It is the obj variable that might or might not be a tuple (though it typically is). |
|
Ah, you are right. Then this optimization LGTM. Just fix an integer overflow. Take also a look at bpo-35582. That issue adds a private helper |
Where? I don't see it. The range check is that same as that used for |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
You are right again. Sorry for disturbs.
https://bugs.python.org/issue35664