bpo-31324: Optimize support._match_test()#4421
bpo-31324: Optimize support._match_test()#4421vstinner merged 5 commits intopython:masterfrom vstinner:set_match_tests
Conversation
|
Only Lib/test/libregrtest/ was using support.match_tests, and my PR updates Lib/test/libregrtest/. I don't think that support.mtach_tests is used outside CPython code base. test.support is mostly used internally by the CPython test suite. |
There was a problem hiding this comment.
Fixed ("global" removed)
There was a problem hiding this comment.
I think the class is not needed. Just use a function which returns a function.
There was a problem hiding this comment.
Just return set(patterns).__contains__.
There was a problem hiding this comment.
What about '?' and '['?
There was a problem hiding this comment.
Could be all(map(self._is_full_match, patterns)).
* Rename support._match_test() to support.match_test(): make it public * Remove support.match_tests global variable. It is replaced with a new support.set_match_tests() function, so match_test() doesn't have to check each time if patterns were modified. * Rewrite match_test(): use different code paths depending on the kind of patterns for best performances. Co-Authored-By: Serhiy Storchaka <[email protected]>
|
@serhiy-storchaka: I addressed your comments, I added unit tests, I chose to make support.match_test() public since it's used by regrtest, and I rebased my change. |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Added few suggestions.
And please add tests with a pattern containing '.' and metacharacters and with multiple patterns.
| # | ||
| # Reject patterns which contain fnmatch patterns: '*', '?', '[...]' | ||
| # or '[!...]'. For example, reject 'test_access*'. | ||
| return ('.' in pattern) and all(char not in '?*[]' for char in pattern) |
There was a problem hiding this comment.
I'm wondering whether using a regular expression would be faster.
return ('.' in pattern) and not re.search(r'[?*\[\]]', pattern)There was a problem hiding this comment.
I think that (char not in '?*[]') is more readable, and this function doesn't seem to be a performance bottleneck, there is no need to optimize it. Moreover, I just added a cache to not recompile patterns if set_match_tests() is called twice with the same patterns.
It seems like the compilation to a giant regex is slow, but I don't think that we can optimize this much, and again, I don't think that it's worth it. My main usage of many patterns is when I use test.bisect, but test.bisect only uses full test identifiers and so should use set().contains.
|
|
||
| if not patterns: | ||
| _match_test_func = None | ||
| elif all(_is_full_match_test(pattern) for pattern in patterns): |
There was a problem hiding this comment.
Could be all(map(_is_full_match_test, patterns)).
There was a problem hiding this comment.
Right, but I prefer list comprehensions. IMHO it's more readable.
There was a problem hiding this comment.
map() is already used twice few lines below.
There was a problem hiding this comment.
Oh right, that's the copied I copied from your PR :-D You won, I updated my PR to address your latest review.
| test_access = Test('test.test_os.FileTests.test_access') | ||
| test_chdir = Test('test.test_os.Win32ErrorTests.test_chdir') | ||
|
|
||
| old_match = support._match_test_func |
There was a problem hiding this comment.
Could use
with support.swap_attr(support, '_match_test_func', None):Use also support.swap_attr().
Done |
| # Reject patterns which contain fnmatch patterns: '*', '?', '[...]' | ||
| # or '[!...]'. For example, reject 'test_access*'. | ||
| return ('.' in pattern) and all(char not in '?*[]' for char in pattern) | ||
| return ('.' in pattern) and (not re.search(r'[?*\[\]]', pattern)) |
There was a problem hiding this comment.
Are parenthesis around not ... add readability?
|
|
||
| func = match_test_regex | ||
|
|
||
| _match_test_patterns = patterns |
There was a problem hiding this comment.
Shouldn't we make a copy? Or just ignore the case of modifying patterns after calling set_match_tests()? This is a private API at end.
|
Thank you for the reviews @serhiy-storchaka! |
* bpo-31324: Optimize support._match_test() (#4421) * Rename support._match_test() to support.match_test(): make it public * Remove support.match_tests global variable. It is replaced with a new support.set_match_tests() function, so match_test() doesn't have to check each time if patterns were modified. * Rewrite match_test(): use different code paths depending on the kind of patterns for best performances. Co-Authored-By: Serhiy Storchaka <[email protected]> (cherry picked from commit 803ddd8) * bpo-31324: Fix test.support.set_match_tests(None) (#4505) (cherry picked from commit bb11c3c)
* bpo-31324: Optimize support._match_test() (#4421) * Rename support._match_test() to support.match_test(): make it public * Remove support.match_tests global variable. It is replaced with a new support.set_match_tests() function, so match_test() doesn't have to check each time if patterns were modified. * Rewrite match_test(): use different code paths depending on the kind of patterns for best performances. Co-Authored-By: Serhiy Storchaka <[email protected]> (cherry picked from commit 803ddd8) * bpo-31324: Fix test.support.set_match_tests(None) (#4505) (cherry picked from commit bb11c3c) (cherry picked from commit 70b2f87)
new support.set_match_tests() function, so _match_test() doesn't
have to check each time if patterns were modified.
different code paths depending on the kind of patterns.
Co-Authored-By: Serhiy Storchaka [email protected]
https://bugs.python.org/issue31324