bpo-32677: Add .isascii() to str, bytes and bytearray#5342
bpo-32677: Add .isascii() to str, bytes and bytearray#5342methane merged 12 commits intopython:masterfrom
Conversation
| unicode_isascii_impl(PyObject *self) | ||
| /*[clinic end generated code: output=c5910d64b5a8003f input=73b30d38f16965cd]*/ | ||
| { | ||
| if (PyUnicode_READY(self) == -1) |
| { | ||
| if (PyUnicode_READY(self) == -1) | ||
| return NULL; | ||
| if (PyUnicode_IS_ASCII(self)) { |
There was a problem hiding this comment.
Can be simplified to PyBool_FromLong(PyUnicode_IS_ASCII(self));
| /*[clinic input] | ||
| str.isascii as unicode_isascii | ||
|
|
||
| Return True if the string is an ascii string, False otherwise. |
There was a problem hiding this comment.
Style nit: You've used all uppercase version (ASCII) in the documentation and NEWS file. I'd suggest using the same style (whether ASCII or ascii) everywhere.
| .. method:: str.isascii() | ||
|
|
||
| Return true if all characters in the string are ASCII, false otherwise. | ||
| ASCII characters are characters which :func:`ord` returns less than 128. |
There was a problem hiding this comment.
Just a suggestion: We way reuse curses.ascii.isascii() documentation to describe what we meant by an ASCII character: https://docs.python.org/3/library/curses.ascii.html#curses.ascii.isascii
There was a problem hiding this comment.
I suggest: "ASCII characters have code points in the range U+0000-U+007F."
| self.assertTrue("".isascii()) | ||
| self.assertTrue("\0".isascii()) | ||
| self.assertTrue("\x7f".isascii()) | ||
| self.assertFalse("\x80".isascii()) |
There was a problem hiding this comment.
I suggest to test larger code points as well:
self.assertFalse("\xe9".isascii())
self.assertFalse("\u20ac".isascii())
self.assertFalse("\U0010ffff".isascii())
My 3 favorite code points to test Latin1, BMP and non-BMP :-)
(only these tests, i don't think that it's useful to check for variant with spaces before/after)
| .. method:: str.isascii() | ||
|
|
||
| Return true if all characters in the string are ASCII, false otherwise. | ||
| ASCII characters have code points in the range U+0000-U+007F. |
There was a problem hiding this comment.
Maybe you can simplify the description as:
Return true if all characters have code points in the range U+0000-U+007F or if the string is empty, false otherwise.
vstinner
left a comment
There was a problem hiding this comment.
LGTM, I just have a minor comment on str.isascii() docstring.
| /*[clinic input] | ||
| str.isascii as unicode_isascii | ||
|
|
||
| Return True if all characters in the string are ASCII, False otherwise. |
There was a problem hiding this comment.
nitpick, maybe copy from the doc: "Return true if the string is empty or all characters in the string are ASCII," rather than "Empty string is ASCII too." below.
There was a problem hiding this comment.
"Return true if the string is empty or all characters in the string are ASCII, False otherwise." overs 80 columns.
And clinic show error when I wrap the line.
All other docstrings in unicodeobject has short (<80) summaries.
There was a problem hiding this comment.
Oh wow, that's a nasty issue. Ignore my comment and leave the docstring as it is ;-)
| if (*p >= 128) { | ||
| Py_RETURN_FALSE; | ||
| } | ||
| } |
There was a problem hiding this comment.
If you want to optimize this function, I suggest you to look at ascii_decode() of Objects/unicodeobject.c which is heavily optimized to scan ASCII characters in a uint8_t* string. It works on "unsigned long" words rather than working on bytes.
But it should be done in a second PR. Right now, I would prefer to push this PR before 3.7b1 (monday).
These were all discovered by running the `stubtest_stdlib` test without the `--ignore-missing-stub` option. `ChainMap.copy()` and `ChainMap.fromkeys()` appear to have been there since at least Python 3.6 (cpython source code for the class here: https://github.com/python/cpython/blob/2c56c97f015a7ea81719615ddcf3c745fba5b4f3/Lib/collections/__init__.py#L853). `Counter.total()` was added in 3.10: https://docs.python.org/3/library/collections.html#collections.Counter.total. `UserString.isascii()` appears to have been added in Python 3.7; it doesn't appear in the ChangeLog, but you can see it was added in this commit here: python/cpython#5342.
Voting on Python-ideas ML for now.
https://bugs.python.org/issue32677