bpo-37348: optimize PyUnicode_FromString #14273

Closed
methane wants to merge 2 commits into python:master from methane:ascii-fromstring
Conversation

methane commented Jun 20, 2019

vstinner left a comment

I don't understand why this change is needed (why it makes the code faster): unicode_decode_utf8() already starts by decoding from ASCII. Is it a performance issue in ascii_decode()? Does _PyUnicodeWriter add a small overhead that is significant here?

cc @serhiy-storchaka, who knows such hardcore micro-optimizations well ;-)

Comment thread Objects/unicodeobject.c

    while (u[len] != '\0') {
        if (u[len] > 127) {
            is_ascii = 0;
Can't you reuse the fast _Py_bytes_isascii() somehow here? It works on unsigned long words (64 bits) rather than on single bytes (8 bits), so it should be 8x faster.
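To illustrate the word-at-a-time idea raised in this comment, here is a minimal standalone sketch. It is not CPython's `_Py_bytes_isascii()`; the name `is_ascii_wordwise` and its signature are hypothetical, and the sketch assumes the byte length is already known.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not CPython's implementation): returns 1 if all
   `len` bytes starting at `p` are below 0x80, and 0 otherwise. */
static int
is_ascii_wordwise(const char *p, size_t len)
{
    /* A byte is non-ASCII exactly when its top bit is set. */
    const uint64_t high_bits = UINT64_C(0x8080808080808080);
    size_t i = 0;

    /* Test 8 bytes per iteration. memcpy() into a local word sidesteps
       alignment requirements on the input pointer. */
    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, p + i, sizeof(word));
        if (word & high_bits) {
            return 0;
        }
    }

    /* Test the remaining tail (fewer than 8 bytes) one byte at a time. */
    for (; i < len; i++) {
        if ((unsigned char)p[i] & 0x80) {
            return 0;
        }
    }
    return 1;
}
```

Because this kind of check needs the length up front, using it on a NUL-terminated string would presumably require a separate strlen() pass first, whereas the loop under review finds the terminator and checks for non-ASCII bytes in the same pass.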

Comment thread Objects/unicodeobject.c
    if (is_ascii) {
        return _PyUnicode_FromASCII(u, (Py_ssize_t)len);
    }
    return PyUnicode_DecodeUTF8Stateful(u, (Py_ssize_t)len, NULL, NULL);
Why not modify PyUnicode_DecodeUTF8Stateful() to detect whether the input string is ASCII? Or is your heuristic faster? Is it because it would make non-ASCII string decoding slower?
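For readers without the full diff, the two hunks quoted in this thread suggest a single pass that finds the terminating NUL and records whether any byte is non-ASCII, followed by a dispatch on that flag. A rough standalone sketch of the scanning half follows. The helper name `scan_cstring` is hypothetical, and what the patch does right after `is_ascii = 0;` is not shown in the excerpt, so this version simply keeps scanning to the end.

```c
#include <stddef.h>

/* Hypothetical helper: walks a NUL-terminated string once, stores its
   length in *out_len, and returns 1 if every byte is below 0x80. */
static int
scan_cstring(const char *s, size_t *out_len)
{
    /* Read through unsigned char: with a signed `char`, a comparison such
       as `c > 127` could never be true. */
    const unsigned char *u = (const unsigned char *)s;
    size_t len = 0;
    int is_ascii = 1;

    while (u[len] != '\0') {
        if (u[len] > 127) {
            is_ascii = 0;   /* assumption: keep scanning to get the length */
        }
        len++;
    }
    *out_len = len;
    return is_ascii;
}
```

A caller would then take the ASCII-only path when the function returns 1 and fall back to the general UTF-8 decoder otherwise, as the `if (is_ascii)` hunk does.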

@methane methane closed this Jun 21, 2019
@methane methane deleted the ascii-fromstring branch June 21, 2019 05:53