Skip to content

bpo-32677: Add .isascii() to str, bytes and bytearray#5342

Merged
methane merged 12 commits intopython:masterfrom
methane:str-isascii
Jan 27, 2018
Merged

bpo-32677: Add .isascii() to str, bytes and bytearray#5342
methane merged 12 commits intopython:masterfrom
methane:str-isascii

Conversation

@methane
Copy link
Copy Markdown
Member

@methane methane commented Jan 26, 2018

Voting on Python-ideas ML for now.

https://bugs.python.org/issue32677

Comment thread Objects/unicodeobject.c Outdated
unicode_isascii_impl(PyObject *self)
/*[clinic end generated code: output=c5910d64b5a8003f input=73b30d38f16965cd]*/
{
if (PyUnicode_READY(self) == -1)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 7: please add { ... }

Comment thread Objects/unicodeobject.c Outdated
{
if (PyUnicode_READY(self) == -1)
return NULL;
if (PyUnicode_IS_ASCII(self)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be simplified to PyBool_FromLong(PyUnicode_IS_ASCII(self));

Comment thread Objects/unicodeobject.c Outdated
/*[clinic input]
str.isascii as unicode_isascii

Return True if the string is an ascii string, False otherwise.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: You've used all uppercase version (ASCII) in the documentation and NEWS file. I'd suggest using the same style (whether ASCII or ascii) everywhere.

@methane methane requested a review from rhettinger as a code owner January 26, 2018 12:05
Comment thread Doc/library/stdtypes.rst Outdated
.. method:: str.isascii()

Return true if all characters in the string are ASCII, false otherwise.
ASCII characters are characters which :func:`ord` returns less than 128.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion: We way reuse curses.ascii.isascii() documentation to describe what we meant by an ASCII character: https://docs.python.org/3/library/curses.ascii.html#curses.ascii.isascii

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest: "ASCII characters have code points in the range U+0000-U+007F."

Comment thread Lib/test/test_unicode.py Outdated
self.assertTrue("".isascii())
self.assertTrue("\0".isascii())
self.assertTrue("\x7f".isascii())
self.assertFalse("\x80".isascii())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to test larger code points as well:

self.assertFalse("\xe9".isascii())
self.assertFalse("\u20ac".isascii())
self.assertFalse("\U0010ffff".isascii())

My 3 favorite code points to test Latin1, BMP and non-BMP :-)

(only these tests, i don't think that it's useful to check for variant with spaces before/after)

Comment thread Doc/library/stdtypes.rst
.. method:: str.isascii()

Return true if all characters in the string are ASCII, false otherwise.
ASCII characters have code points in the range U+0000-U+007F.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can simplify the description as:

Return true if all characters have code points in the range U+0000-U+007F or if the string is empty, false otherwise.

@methane methane changed the title [DO NOT MERGE] bpo-32677: Add str.isascii() bpo-32677: Add str.isascii() Jan 26, 2018
@methane methane changed the title bpo-32677: Add str.isascii() bpo-32677: Add .isascii() to str, bytes and bytearray Jan 26, 2018
Copy link
Copy Markdown
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I just have a minor comment on str.isascii() docstring.

Comment thread Objects/unicodeobject.c
/*[clinic input]
str.isascii as unicode_isascii

Return True if all characters in the string are ASCII, False otherwise.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, maybe copy from the doc: "Return true if the string is empty or all characters in the string are ASCII," rather than "Empty string is ASCII too." below.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Return true if the string is empty or all characters in the string are ASCII, False otherwise." overs 80 columns.
And clinic show error when I wrap the line.

All other docstrings in unicodeobject has short (<80) summaries.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, that's a nasty issue. Ignore my comment and leave the docstring as it is ;-)

Comment thread Objects/bytes_methods.c
if (*p >= 128) {
Py_RETURN_FALSE;
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to optimize this function, I suggest you to look at ascii_decode() of Objects/unicodeobject.c which is heavily optimized to scan ASCII characters in a uint8_t* string. It works on "unsigned long" words rather than working on bytes.

But it should be done in a second PR. Right now, I would prefer to push this PR before 3.7b1 (monday).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree.

@methane methane merged commit a49ac99 into python:master Jan 27, 2018
@methane methane deleted the str-isascii branch January 27, 2018 05:06
AlexWaygood added a commit to AlexWaygood/typeshed that referenced this pull request Nov 26, 2021
These were all discovered by running the `stubtest_stdlib` test without the `--ignore-missing-stub` option.

`ChainMap.copy()` and `ChainMap.fromkeys()` appear to have been there since at least Python 3.6 (cpython source code for the class here: https://github.com/python/cpython/blob/2c56c97f015a7ea81719615ddcf3c745fba5b4f3/Lib/collections/__init__.py#L853).

`Counter.total()` was added in 3.10: https://docs.python.org/3/library/collections.html#collections.Counter.total.

`UserString.isascii()` appears to have been added in Python 3.7; it doesn't appear in the ChangeLog, but you can see it was added in this commit here: python/cpython#5342.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants