bpo-37348: optimize PyUnicode_FromString #14273
methane wants to merge 2 commits into python:master
Conversation
vstinner
left a comment
I don't understand why this change is needed (why it makes the code faster): unicode_decode_utf8() already starts by decoding from ASCII. Is there a performance issue in ascii_decode()? Does _PyUnicodeWriter add a small overhead that is significant here?
cc @serhiy-storchaka, who knows such hardcore micro-optimizations well ;-)
```c
while (u[len] != '\0') {
    if (u[len] > 127) {
        is_ascii = 0;
```
Can't you somehow reuse the fast _Py_bytes_isascii() here? It works on unsigned long words (64 bits) rather than on individual bytes (8 bits): it should be about 8x faster.
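A minimal standalone sketch of the word-at-a-time scan being suggested (the real _Py_bytes_isascii() in CPython is more elaborate; `is_ascii_fast` and `ASCII_MASK` here are illustrative names, not the actual internals):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Every ASCII byte has its high bit clear, so OR-ing 8 bytes into one
   64-bit word and testing the high bit of each byte checks 8 bytes per
   iteration instead of one. */
#define ASCII_MASK 0x8080808080808080ULL

static int
is_ascii_fast(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    /* Handle leading bytes until the pointer is 8-byte aligned. */
    while (i < len && ((uintptr_t)(p + i) & 7) != 0) {
        if (p[i] & 0x80)
            return 0;
        i++;
    }
    /* Main loop: test 8 bytes at a time. */
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, p + i, 8);   /* aligned load via memcpy, no UB */
        if (w & ASCII_MASK)
            return 0;
    }
    /* Trailing bytes, one at a time. */
    for (; i < len; i++) {
        if (p[i] & 0x80)
            return 0;
    }
    return 1;
}
```

The speedup is bounded by memory bandwidth and the alignment prologue, so "8x" is a best case for long strings; for very short strings the per-byte loop dominates.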
```c
if (is_ascii) {
    return _PyUnicode_FromASCII(u, (Py_ssize_t)len);
}
return PyUnicode_DecodeUTF8Stateful(u, (Py_ssize_t)len, NULL, NULL);
```
Why not modify PyUnicode_DecodeUTF8Stateful() to detect whether the input string is ASCII, or is your heuristic faster? Is it because it would make non-ASCII string decoding slower?
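The trade-off under discussion can be modeled in isolation: the patch fuses the strlen() pass with an ASCII check, then dispatches to the cheap ASCII constructor or the full UTF-8 decoder. This sketch only models the dispatch decision; `decode_path` and the `path` enum are hypothetical stand-ins, not CPython API:

```c
#include <assert.h>
#include <stddef.h>

enum path { PATH_ASCII, PATH_UTF8 };

/* One pass over the NUL-terminated input computes the length and the
   ASCII-ness together, so no extra scan is added for non-ASCII input.
   In the real patch, PATH_ASCII maps to _PyUnicode_FromASCII() and
   PATH_UTF8 to PyUnicode_DecodeUTF8Stateful(). */
static enum path
decode_path(const char *u, size_t *len_out)
{
    size_t len = 0;
    int is_ascii = 1;

    while (u[len] != '\0') {
        if ((unsigned char)u[len] > 127)
            is_ascii = 0;   /* keep scanning: the length is still needed */
        len++;
    }
    *len_out = len;
    return is_ascii ? PATH_ASCII : PATH_UTF8;
}
```

Because the length must be computed anyway for a NUL-terminated string, the ASCII check rides along on that pass; the open question in the review is whether this pre-scan costs non-ASCII inputs a second pass that PyUnicode_DecodeUTF8Stateful() would otherwise avoid.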
https://bugs.python.org/issue37348