bpo-37348: optimize PyUnicode_FromString #14273
methane wants to merge 2 commits into python:master
Conversation
vstinner
left a comment
I don't understand why this change is needed (why it makes the code faster): unicode_decode_utf8() already starts by decoding from ASCII. Is there a performance issue in ascii_decode()? Does _PyUnicodeWriter add a small overhead that is significant here?
cc @serhiy-storchaka, who knows such hardcore micro-optimizations well ;-)
```c
while (u[len] != '\0') {
    if (u[len] > 127) {
        is_ascii = 0;
```
Can't you somehow reuse the fast _Py_bytes_isascii() here? It works on unsigned long words (64 bits) rather than on individual bytes (8 bits): it should be about 8x faster.
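A minimal standalone sketch of the word-at-a-time scan being suggested (the real _Py_bytes_isascii() in CPython is more elaborate; `is_ascii_fast` and `ASCII_MASK` here are illustrative names, not the actual internals):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Every ASCII byte has its high bit clear, so OR-ing 8 bytes into one
   64-bit word and testing the high bit of each byte checks 8 bytes per
   iteration instead of one. */
#define ASCII_MASK 0x8080808080808080ULL

static int
is_ascii_fast(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    /* Handle leading bytes until the pointer is 8-byte aligned. */
    while (i < len && ((uintptr_t)(p + i) & 7) != 0) {
        if (p[i] & 0x80)
            return 0;
        i++;
    }
    /* Main loop: test 8 bytes at a time. */
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, p + i, 8);   /* aligned load via memcpy, no UB */
        if (w & ASCII_MASK)
            return 0;
    }
    /* Trailing bytes, one at a time. */
    for (; i < len; i++) {
        if (p[i] & 0x80)
            return 0;
    }
    return 1;
}
```

The speedup is bounded by memory bandwidth and the alignment prologue, so "8x" is a best case for long strings; for very short strings the per-byte loop dominates.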
```c
if (is_ascii) {
    return _PyUnicode_FromASCII(u, (Py_ssize_t)len);
}
return PyUnicode_DecodeUTF8Stateful(u, (Py_ssize_t)len, NULL, NULL);
```
Why not modify PyUnicode_DecodeUTF8Stateful() to detect whether the input string is ASCII, or is your heuristic faster? Is it because it would make non-ASCII string decoding slower?
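The trade-off under discussion can be modeled in isolation: the patch fuses the strlen() pass with an ASCII check, then dispatches to the cheap ASCII constructor or the full UTF-8 decoder. This sketch only models the dispatch decision; `decode_path` and the `path` enum are hypothetical stand-ins, not CPython API:

```c
#include <assert.h>
#include <stddef.h>

enum path { PATH_ASCII, PATH_UTF8 };

/* One pass over the NUL-terminated input computes the length and the
   ASCII-ness together, so no extra scan is added for non-ASCII input.
   In the real patch, PATH_ASCII maps to _PyUnicode_FromASCII() and
   PATH_UTF8 to PyUnicode_DecodeUTF8Stateful(). */
static enum path
decode_path(const char *u, size_t *len_out)
{
    size_t len = 0;
    int is_ascii = 1;

    while (u[len] != '\0') {
        if ((unsigned char)u[len] > 127)
            is_ascii = 0;   /* keep scanning: the length is still needed */
        len++;
    }
    *len_out = len;
    return is_ascii ? PATH_ASCII : PATH_UTF8;
}
```

Because the length must be computed anyway for a NUL-terminated string, the ASCII check rides along on that pass; the open question in the review is whether this pre-scan costs non-ASCII inputs a second pass that PyUnicode_DecodeUTF8Stateful() would otherwise avoid.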
https://bugs.python.org/issue37348