Right now the implementation of Message.__contains__ (cpython/Lib/email/message.py, lines 450 to 451 in 2f2fa03) looks like this:

    def __contains__(self, name):
        return name.lower() in [k.lower() for k, v in self._headers]

There are several problems here:
- We build an intermediate structure (a list in this case)
- We run the membership test (in) against a list, which is slow
A faster way to check whether we actually have this header is a plain loop with an early return:

    def __contains__(self, name):
        name_lower = name.lower()
        for k, v in self._headers:
            if name_lower == k.lower():
                return True
        return False

We do not create any intermediate lists or sets, and we stop iterating as soon as a match is found.
This change makes the in check up to twice as fast.
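To make the comparison concrete, here is a minimal standalone sketch of both variants operating on a plain list of (name, value) tuples. The function names contains_with_list and contains_with_loop and the sample headers are made up for illustration; they are not part of the email package. The sketch only checks that the two variants agree on hits, misses, and case differences.

```python
# Hypothetical helpers mirroring the two __contains__ variants discussed above.
# `headers` stands in for Message._headers: a list of (name, value) tuples.

def contains_with_list(headers, name):
    # Builds a full list of lowercased names, then scans it.
    return name.lower() in [k.lower() for k, v in headers]


def contains_with_loop(headers, name):
    # Lowercases the needle once and returns on the first match.
    name_lower = name.lower()
    for k, v in headers:
        if name_lower == k.lower():
            return True
    return False


# Illustrative data, not taken from any real message.
headers = [("From", "a@example.com"), ("To", "b@example.com"), ("Subject", "hi")]

for name in ("from", "FROM", "subject", "missing"):
    assert contains_with_list(headers, name) == contains_with_loop(headers, name)
```

Both variants are case-insensitive; the loop version simply avoids the temporary list and can stop early when the header is near the front.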
Microbenchmark
Before
» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"from" in m'
.....................
Mean +- std dev: 1.40 us +- 0.14 us
» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"missing" in m'
.....................
Mean +- std dev: 1.42 us +- 0.06 us
After
» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"missing" in m'
.....................
Mean +- std dev: 904 ns +- 55 ns
» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"from" in m'
.....................
Mean +- std dev: 715 ns +- 24 ns
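The second case (a header that is present) is now about twice as fast (1.40 us to 715 ns). The missing-header case improves by roughly 1.6x (1.42 us to 904 ns).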
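For readers without pyperf or a CPython checkout, a rough comparison can be sketched with the standard library alone. This is only an approximation of the benchmark above: the subclasses ListContains and LoopContains, the bench helper, and the sample headers are hypothetical, the message is built in memory rather than parsed from msg_01.txt, and absolute timings will differ by machine and Python version.

```python
import timeit
from email.message import Message


class ListContains(Message):
    # Hypothetical subclass using the list-based check.
    def __contains__(self, name):
        return name.lower() in [k.lower() for k, v in self._headers]


class LoopContains(Message):
    # Hypothetical subclass using the early-return loop.
    def __contains__(self, name):
        name_lower = name.lower()
        for k, v in self._headers:
            if name_lower == k.lower():
                return True
        return False


def build(cls):
    # Small in-memory message; the headers are illustrative only.
    msg = cls()
    msg["From"] = "a@example.com"
    msg["To"] = "b@example.com"
    msg["Subject"] = "hi"
    return msg


def bench(cls, name, number=100_000):
    # Returns total seconds for `number` membership checks.
    msg = build(cls)
    return timeit.timeit(lambda: name in msg, number=number)


if __name__ == "__main__":
    for name in ("from", "missing"):
        old = bench(ListContains, name)
        new = bench(LoopContains, name)
        print(f"{name!r}: list={old:.4f}s loop={new:.4f}s")
```

The lambda adds call overhead to both measurements, so the ratio between the two matters more than the absolute values.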
It probably also consumes less memory now, but I don't think it is very significant.
Importance
Since EmailMessage (a subclass of Message) is quite widely used by users and 3rd party libs, I think this change is worth including.
And since the patch is quite simple and pure-python, I think the risks are very low.
Linked PRs
email.message.Message.__contains__ twice as fast #100793