bpo-18236: Adjust str.isspace to use Unicode's White_Space property.#16254
gnprice wants to merge 1 commit into python:main
When Unicode support was first added to Python, there was no Unicode property identifying whitespace, so we approximated it by putting together a couple of other properties. Now there is a White_Space property, so let's use it. Happily, the difference from our original approximation is only in the four rare control characters 001C..001F. As a bonus, `isspace` now joins all similar methods in giving exactly matching results for ASCII characters represented as `str` or as `bytes`. Add a test for that nice property.
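As a rough illustration of the difference described above, the sketch below rebuilds the old approximation from the `unicodedata` module and compares it with `bytes.isspace` over the ASCII range. The helper name `approx_is_space` is hypothetical, and the snippet deliberately avoids calling `str.isspace` itself, because its result for these characters depends on whether this change is present in the interpreter.

```python
import unicodedata

def approx_is_space(ch):
    """Old approximation: bidi class WS, B, or S, or general category Zs."""
    return (unicodedata.bidirectional(ch) in ('WS', 'B', 'S')
            or unicodedata.category(ch) == 'Zs')

# ASCII code points where the approximation and bytes.isspace disagree.
ascii_diff = [
    cp for cp in range(128)
    if approx_is_space(chr(cp)) != bytes([cp]).isspace()
]
print([hex(cp) for cp in ascii_diff])  # ['0x1c', '0x1d', '0x1e', '0x1f']
```

The only disagreements are the four control characters 001C..001F, which is the set this PR excludes.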
Numerlor left a comment:
I ran into this when I noticed that int's whitespace stripping differs from str's, so it's nice to see that there's already a PR with a fix.
The changes look good to me, apart from the outdated version notices, though I'm not entirely familiar with the codebase.
I've also noticed that the docstrings added to test_unicode use single quotes while the others use double quotes, so changing them to be consistent would be nice.
    ((bidirectional in ('WS', 'B', 'S')
      or category == 'Zs')
     and codepoint not in range(0x1c, 0x20)))
Is this guaranteed to hold in future Unicode versions? That is, should it be tested on the string method instead, given that this only tests the Unicode database parsing and generation?
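One way to probe that question is to check the expression against the White_Space property over the whole code space. The sketch below is only illustrative: `WHITE_SPACE` is a hand-copied list of the White_Space code points from the Unicode `PropList.txt` (an assumption that should be checked against the UCD version in use), and `derived_is_space` is a hypothetical name for the expression in the diff above. Python's `unicodedata` module does not expose White_Space directly, which is why the set is hard-coded here.

```python
import sys
import unicodedata

# Assumption: White_Space code points hand-copied from PropList.txt.
WHITE_SPACE = (
    set(range(0x09, 0x0E))            # TAB, LF, VT, FF, CR
    | {0x20, 0x85, 0xA0, 0x1680}
    | set(range(0x2000, 0x200B))      # EN QUAD .. HAIR SPACE
    | {0x2028, 0x2029, 0x202F, 0x205F, 0x3000}
)

def derived_is_space(cp):
    """The expression from the diff, evaluated via unicodedata."""
    ch = chr(cp)
    return ((unicodedata.bidirectional(ch) in ('WS', 'B', 'S')
             or unicodedata.category(ch) == 'Zs')
            and cp not in range(0x1c, 0x20))

# Code points where the derived expression and White_Space disagree.
mismatches = [cp for cp in range(sys.maxunicode + 1)
              if derived_is_space(cp) != (cp in WHITE_SPACE)]
print(mismatches)
```

An empty list means the two definitions agree for the Unicode data bundled with the running interpreter. It says nothing about future Unicode versions, so a check like this would need to be rerun whenever the database is updated.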
This PR is stale because it has been open for 30 days with no activity.
https://bugs.python.org/issue18236