bpo-35664: Optimize operator.itemgetter by rhettinger · Pull Request #11435 · python/cpython

rhettinger · 2019-01-05T05:20:33Z

https://bugs.python.org/issue35664

rhettinger · 2019-01-05T05:47:02Z

Timing with patch:

$ ./python.exe -m timeit -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 5: 44.5 nsec per loop

Baseline timing:

$ ./python.exe -m timeit -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 5: 66.2 nsec per loop

rhettinger · 2019-01-05T06:04:28Z

All the itemgetter() cases in the standard library use non-negative integer indexes into tuples.

Tools/stringbench/stringbench.py
1436:                                      operator.itemgetter(0)):

Tools/scripts/analyze_dxp.py
92:    result.sort(key=operator.itemgetter(2), reverse=True)
112:    result.sort(key=operator.itemgetter(2), reverse=True)

Lib/lib2to3/tests/test_fixers.py
6:from operator import itemgetter
1841:            a = "import %s" % ", ".join(map(itemgetter(0), changes))

Lib/collections/__init__.py
21:from operator import itemgetter as _itemgetter, eq as _eq
317:    _tuplegetter = lambda index, doc: property(_itemgetter(index), doc=doc)
580:            return sorted(self.items(), key=_itemgetter(1), reverse=True)
581:        return _heapq.nlargest(n, self.items(), key=_itemgetter(1))

Lib/email/_header_value_parser.py
74:from operator import itemgetter
734:            parts = sorted(parts, key=itemgetter(0))

serhiy-storchaka · 2019-01-05T07:40:56Z

-        return NULL;
-    if (nitems == 1)
+    assert(PyTuple_CheckExact(args));
+    if (kw == NULL && PyTuple_GET_SIZE(args) == 1) {


_PyArg_NoKeywords is a macro, no need to "optimize" it manually.

I prefer this as-is. There is some tiny duplication which gets be compiled away, but to my eyes the kw==NULL check makes it clearer what the common path is and that no external calls are made. Also, I don't trust that _PyArg_NoKeywords will remain a macro.

_PyArg_NoKeywords() makes clear the purpose of code.

And please add comments why these special cases is checked.

serhiy-storchaka · 2019-01-05T07:44:01Z

+    if (PyLong_CheckExact(item)) {
+        index = PyLong_AsSsize_t(item);
+        if (index < 0) {
+            // Only optimize non-negative values that fit in a Py_ssize_t


Maybe add assert(PyErr_ExceptionMatches(PyExc_OverflowError)) just for a case?

That doesn't make sense to me. The OverflowError is only happening when index == -1; otherwise, a negative index just means a negative input.

Oh, right. The correct assertion should be assert(!PyErr_Occurred() || PyErr_ExceptionMatches(PyExc_OverflowError)). This will serve documenting purposes, showing that the only possible exception is OverflowError.

serhiy-storchaka · 2019-01-05T07:46:13Z

+    if (nitems == 1) {
+        if (ig->index >= 0
+            && PyTuple_CheckExact(obj)
+            && ig->index < PyTuple_GET_SIZE(obj)) {


It is hard to read such sequence of conditions and statements. PEP 7 recommends to move { to a new line in case of multiline conditions.

Okay, I'll move the curly brace to appease PEP 7 even though it looks weird to me.

serhiy-storchaka · 2019-01-05T07:47:13Z

+        if (index < 0) {
+            // Only optimize non-negative values that fit in a Py_ssize_t
+            PyErr_Clear();
+        } else {


For PEP 7:

} else {

rhettinger · 2019-01-05T17:06:48Z

---------------- Baseline -------------------------------------
$ git status | head -2
On branch master
Your branch is up to date with 'upstream/master'.
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 11: 65.5 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(-1)' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 11: 70.8 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(slice(0,2))' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 11: 92.1 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 't = ["tom", "sue"]' 'g0(t)'
5000000 loops, best of 11: 66.5 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 'class T(tuple): pass'   -s 't =T(["tom", "sue"])' 'g0(t)'
5000000 loops, best of 11: 65.6 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(2**61)' -s 't=range(2**62)' 'g0(t)'
2000000 loops, best of 11: 181 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(-2**61)' -s 't=range(2**62)' 'g0(t)'
1000000 loops, best of 11: 214 nsec per loop
--------------- Patched ------------------------------------------
$ git status | head -1
On branch optimize_itemgetter
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 11: 44.5 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(-1)' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 11: 61.1 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(slice(0,2))' -s 't = ("tom", "sue")' 'g0(t)'
5000000 loops, best of 11: 84.9 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 't = ["tom", "sue"]' 'g0(t)'
5000000 loops, best of 11: 55.3 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(0)' -s 'class T(tuple): pass'   -s 't =T(["tom", "sue"])' 'g0(t)'
5000000 loops, best of 11: 56.6 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(2**61)' -s 't=range(2**62)' 'g0(t)'
2000000 loops, best of 11: 171 nsec per loop
$ ./python.exe -m timeit -r11 -s 'from operator import itemgetter' -s 'g0 = itemgetter(-2**61)' -s 't=range(2**62)' 'g0(t)'
1000000 loops, best of 11: 205 nsec per loop

serhiy-storchaka

I afraid there is possible an integer overflow for large index (like 2**61 and -2**61 on 64-bit platforms, 2**30 and -2**30 on 32-bit platforms). Please fix it and add tests.

serhiy-storchaka · 2019-01-06T06:56:14Z

+    if (PyLong_CheckExact(item)) {
+        index = PyLong_AsSsize_t(item);
+        if (index < 0) {
+            // Only optimize non-negative values that fit in a Py_ssize_t


Oh, right. The correct assertion should be assert(!PyErr_Occurred() || PyErr_ExceptionMatches(PyExc_OverflowError)). This will serve documenting purposes, showing that the only possible exception is OverflowError.

serhiy-storchaka · 2019-01-06T06:58:36Z

-        return NULL;
-    if (nitems == 1)
+    assert(PyTuple_CheckExact(args));
+    if (kw == NULL && PyTuple_GET_SIZE(args) == 1) {


_PyArg_NoKeywords() makes clear the purpose of code.

And please add comments why these special cases is checked.

bedevere-bot · 2019-01-06T07:02:14Z

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

serhiy-storchaka · 2019-01-06T07:05:48Z

Microbenchmark results show significant speed up even for cases that should not be affected by this change. Do you use same build options? Try to use Victor's perf module for more reproducible results.

rhettinger · 2019-01-06T07:12:50Z

There are two optimizations, one for argument handling and another that special cases non-negative integer indexing into tuples. The former speeds up all cases (as shown in the benchmarks). The latter speeds-up on the tuple integer index case. It is an all around win.

rhettinger · 2019-01-06T07:44:16Z

None of the suggested assertions make sense to me. PyLong_AsSsize_t returns -1 for the error case and sets an error or it returns the converted integer. That is its API -- it doesn't make sense for itemgetter to assert anything about the internals for that function. We're only interested in the case with a non-negative integer that successfully converted. I've updated the comment but don't see a point in doing anything more.

serhiy-storchaka · 2019-01-06T09:18:06Z

The optimizations for argument handling can not be the cause of the difference in above results, because it is applied to the code that is executed only at setup.

Could you perform benchmarking of the itemgetter() constructor to show that this optimization have any effect?

rhettinger · 2019-01-06T17:40:07Z

The optimizations for argument handling can not be the cause of the difference in above results, because it is applied to the code that is executed only at setup.

That is incorrect. These lines in itemgetter_call() are responsible for the speed-up for all cases:

    if (kw == NULL && PyTuple_GET_SIZE(args) == 1) {
        obj = PyTuple_GET_ITEM(args, 0);
    }

I think what you're missing is that args is always a tuple and the above fragment always gets executed unless there is an error. It is the obj variable that might or might not be a tuple (though it typically is).

serhiy-storchaka · 2019-01-06T18:28:52Z

Ah, you are right. Then this optimization LGTM. Just fix an integer overflow.

Take also a look at bpo-35582. That issue adds a private helper _PyArg_CheckPositional which could be used instead of PyArg_UnpackTuple for keeping the code simple and efficient after converting it to a macro with fast path, like _PyArg_NoKeywords.

rhettinger · 2019-01-06T22:49:25Z

Just fix an integer overflow.

Where? I don't see it. The range check is that same as that used for PyTuple_GetItem() and tuple sizes are already checked for overflow in PyTuple_New.

serhiy-storchaka

You are right again. Sorry for disturbs.

rhettinger added 3 commits January 4, 2019 13:44

Operator argument parsing for the normal case

9f1e266

Fast path for tuples with non-negative integer indicies

c014a05

Add blurb

d00fd55

rhettinger added the performance Performance or resource usage label Jan 5, 2019

the-knights-who-say-ni added the CLA signed label Jan 5, 2019

bedevere-bot added the awaiting merge label Jan 5, 2019

rhettinger requested a review from serhiy-storchaka January 5, 2019 07:22

serhiy-storchaka reviewed Jan 5, 2019

View reviewed changes

PEP 7 nits

e9e2fa9

serhiy-storchaka requested changes Jan 6, 2019

View reviewed changes

bedevere-bot removed the awaiting merge label Jan 6, 2019

bedevere-bot added the awaiting changes label Jan 6, 2019

Add more tests

e7c33f1

Expand comment since the review didn't understand previous comment

d70baef

serhiy-storchaka approved these changes Jan 7, 2019

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting changes labels Jan 7, 2019

rhettinger merged commit 2d53bed into python:master Jan 7, 2019

bedevere-bot removed the awaiting merge label Jan 7, 2019

serhiy-storchaka mentioned this pull request Jan 11, 2019

bpo-35582: Inline arguments tuple unpacking in handwritten code. #11524

Merged

Uh oh!

Conversation

rhettinger commented Jan 5, 2019 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhettinger commented Jan 5, 2019

Uh oh!

rhettinger commented Jan 5, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rhettinger commented Jan 5, 2019

Uh oh!

serhiy-storchaka left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bedevere-bot commented Jan 6, 2019

Uh oh!

serhiy-storchaka commented Jan 6, 2019

Uh oh!

rhettinger commented Jan 6, 2019

Uh oh!

rhettinger commented Jan 6, 2019

Uh oh!

serhiy-storchaka commented Jan 6, 2019

Uh oh!

rhettinger commented Jan 6, 2019

Uh oh!

serhiy-storchaka commented Jan 6, 2019 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhettinger commented Jan 6, 2019

Uh oh!

serhiy-storchaka left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

rhettinger commented Jan 5, 2019 •

edited by bedevere-bot

Loading

serhiy-storchaka commented Jan 6, 2019 •

edited by bedevere-bot

Loading