Skip to content

bpo-35664: Optimize operator.itemgetter#11435

Merged
rhettinger merged 6 commits intopython:masterfrom
rhettinger:optimize_itemgetter
Jan 7, 2019
Merged

bpo-35664: Optimize operator.itemgetter#11435
rhettinger merged 6 commits intopython:masterfrom
rhettinger:optimize_itemgetter

Conversation

@rhettinger
Copy link
Copy Markdown
Contributor

@rhettinger rhettinger commented Jan 5, 2019

@rhettinger rhettinger added the performance Performance or resource usage label Jan 5, 2019
@rhettinger
Copy link
Copy Markdown
Contributor Author

Timing with patch:

$ ./python.exe -m timeit -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 5: 44.5 nsec per loop

Baseline timing:

$ ./python.exe -m timeit -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 5: 66.2 nsec per loop

@rhettinger
Copy link
Copy Markdown
Contributor Author

All the itemgetter() cases in the standard library use non-negative integer indexes into tuples.

Tools/stringbench/stringbench.py
1436:                                      operator.itemgetter(0)):

Tools/scripts/analyze_dxp.py
92:    result.sort(key=operator.itemgetter(2), reverse=True)
112:    result.sort(key=operator.itemgetter(2), reverse=True)

Lib/lib2to3/tests/test_fixers.py
6:from operator import itemgetter
1841:            a = "import %s" % ", ".join(map(itemgetter(0), changes))

Lib/collections/__init__.py
21:from operator import itemgetter as _itemgetter, eq as _eq
317:    _tuplegetter = lambda index, doc: property(_itemgetter(index), doc=doc)
580:            return sorted(self.items(), key=_itemgetter(1), reverse=True)
581:        return _heapq.nlargest(n, self.items(), key=_itemgetter(1))

Lib/email/_header_value_parser.py
74:from operator import itemgetter
734:            parts = sorted(parts, key=itemgetter(0))

Comment thread Modules/_operator.c
return NULL;
if (nitems == 1)
assert(PyTuple_CheckExact(args));
if (kw == NULL && PyTuple_GET_SIZE(args) == 1) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PyArg_NoKeywords is a macro, no need to "optimize" it manually.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer this as-is. There is some tiny duplication which gets be compiled away, but to my eyes the kw==NULL check makes it clearer what the common path is and that no external calls are made. Also, I don't trust that _PyArg_NoKeywords will remain a macro.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PyArg_NoKeywords() makes clear the purpose of code.

And please add comments why these special cases is checked.

Comment thread Modules/_operator.c Outdated
if (PyLong_CheckExact(item)) {
index = PyLong_AsSsize_t(item);
if (index < 0) {
// Only optimize non-negative values that fit in a Py_ssize_t
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add assert(PyErr_ExceptionMatches(PyExc_OverflowError)) just for a case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't make sense to me. The OverflowError is only happening when index == -1; otherwise, a negative index just means a negative input.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. The correct assertion should be assert(!PyErr_Occurred() || PyErr_ExceptionMatches(PyExc_OverflowError)). This will serve documenting purposes, showing that the only possible exception is OverflowError.

Comment thread Modules/_operator.c Outdated
if (nitems == 1) {
if (ig->index >= 0
&& PyTuple_CheckExact(obj)
&& ig->index < PyTuple_GET_SIZE(obj)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard to read such sequence of conditions and statements. PEP 7 recommends to move { to a new line in case of multiline conditions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll move the curly brace to appease PEP 7 even though it looks weird to me.

Comment thread Modules/_operator.c Outdated
if (index < 0) {
// Only optimize non-negative values that fit in a Py_ssize_t
PyErr_Clear();
} else {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For PEP 7:

}
else {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

@rhettinger
Copy link
Copy Markdown
Contributor Author

---------------- Baseline -------------------------------------
$ git status | head -2
On branch master
Your branch is up to date with 'upstream/master'.
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 11: 65.5 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(-1)' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 11: 70.8 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(slice(0,2))' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 11: 92.1 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 't = ["tom", "sue"]' 'g0(t)'
5000000 loops, best of 11: 66.5 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 'class T(tuple): pass'   -s 't =T(["tom", "sue"])' 'g0(t)'
5000000 loops, best of 11: 65.6 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(2**61)' -s 't=range(2**62)' 'g0(t)'
2000000 loops, best of 11: 181 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(-2**61)' -s 't=range(2**62)' 'g0(t)'
1000000 loops, best of 11: 214 nsec per loop
--------------- Patched ------------------------------------------
$ git status | head -1
On branch optimize_itemgetter
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 11: 44.5 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(-1)' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 11: 61.1 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(slice(0,2))' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 11: 84.9 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 't = ["tom", "sue"]' 'g0(t)'
5000000 loops, best of 11: 55.3 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 'class T(tuple): pass'   -s 't =T(["tom", "sue"])' 'g0(t)'
5000000 loops, best of 11: 56.6 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(2**61)' -s 't=range(2**62)' 'g0(t)'
2000000 loops, best of 11: 171 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(-2**61)' -s 't=range(2**62)' 'g0(t)'
1000000 loops, best of 11: 205 nsec per loop

Copy link
Copy Markdown
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I afraid there is possible an integer overflow for large index (like 2**61 and -2**61 on 64-bit platforms, 2**30 and -2**30 on 32-bit platforms). Please fix it and add tests.

Comment thread Modules/_operator.c Outdated
if (PyLong_CheckExact(item)) {
index = PyLong_AsSsize_t(item);
if (index < 0) {
// Only optimize non-negative values that fit in a Py_ssize_t
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. The correct assertion should be assert(!PyErr_Occurred() || PyErr_ExceptionMatches(PyExc_OverflowError)). This will serve documenting purposes, showing that the only possible exception is OverflowError.

Comment thread Modules/_operator.c
return NULL;
if (nitems == 1)
assert(PyTuple_CheckExact(args));
if (kw == NULL && PyTuple_GET_SIZE(args) == 1) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PyArg_NoKeywords() makes clear the purpose of code.

And please add comments why these special cases is checked.

@bedevere-bot
Copy link
Copy Markdown

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@serhiy-storchaka
Copy link
Copy Markdown
Member

Microbenchmark results show significant speed up even for cases that should not be affected by this change. Do you use same build options? Try to use Victor's perf module for more reproducible results.

@rhettinger
Copy link
Copy Markdown
Contributor Author

There are two optimizations, one for argument handling and another that special cases non-negative integer indexing into tuples. The former speeds up all cases (as shown in the benchmarks). The latter speeds-up on the tuple integer index case. It is an all around win.

@rhettinger
Copy link
Copy Markdown
Contributor Author

None of the suggested assertions make sense to me. PyLong_AsSsize_t returns -1 for the error case and sets an error or it returns the converted integer. That is its API -- it doesn't make sense for itemgetter to assert anything about the internals for that function. We're only interested in the case with a non-negative integer that successfully converted. I've updated the comment but don't see a point in doing anything more.

@serhiy-storchaka
Copy link
Copy Markdown
Member

The optimizations for argument handling can not be the cause of the difference in above results, because it is applied to the code that is executed only at setup.

Could you perform benchmarking of the itemgetter() constructor to show that this optimization have any effect?

@rhettinger
Copy link
Copy Markdown
Contributor Author

The optimizations for argument handling can not be the cause of the difference in above results, because it is applied to the code that is executed only at setup.

That is incorrect. These lines in itemgetter_call() are responsible for the speed-up for all cases:

    if (kw == NULL && PyTuple_GET_SIZE(args) == 1) {
        obj = PyTuple_GET_ITEM(args, 0);
    }

I think what you're missing is that args is always a tuple and the above fragment always gets executed unless there is an error. It is the obj variable that might or might not be a tuple (though it typically is).

@serhiy-storchaka
Copy link
Copy Markdown
Member

serhiy-storchaka commented Jan 6, 2019

Ah, you are right. Then this optimization LGTM. Just fix an integer overflow.

Take also a look at bpo-35582. That issue adds a private helper _PyArg_CheckPositional which could be used instead of PyArg_UnpackTuple for keeping the code simple and efficient after converting it to a macro with fast path, like _PyArg_NoKeywords.

@rhettinger
Copy link
Copy Markdown
Contributor Author

Just fix an integer overflow.

Where? I don't see it. The range check is that same as that used for PyTuple_GetItem() and tuple sizes are already checked for overflow in PyTuple_New.

Copy link
Copy Markdown
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right again. Sorry for disturbs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Performance or resource usage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants