Skip to content

bpo-31324: Optimize support._match_test().#4420

Closed
serhiy-storchaka wants to merge 2 commits intopython:masterfrom
serhiy-storchaka:faster-match_test
Closed

bpo-31324: Optimize support._match_test().#4420
serhiy-storchaka wants to merge 2 commits intopython:masterfrom
serhiy-storchaka:faster-match_test

Conversation

@serhiy-storchaka
Copy link
Copy Markdown
Member

@serhiy-storchaka serhiy-storchaka commented Nov 16, 2017

Copy link
Copy Markdown
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart the wrong re.I flag, LGTM.

I tested manually your PR: it works as expected.
https://bugs.python.org/issue31324#msg306362

Comment thread Lib/test/support/__init__.py Outdated
return True
if match_tests != _match_tests:
match_tests_re = '|'.join(map(fnmatch.translate, match_tests))
_match_test1 = re.compile(match_tests_re, re.I).match
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum no, re.I must no be used: the match is expected to be case sensitive, not case insensitve.

return False
if _match_test1(test_id):
return True
return any(map(_match_test1, test_id.split(".")))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all match_tests patterns contain a dot ".", we could make this code even simpler:

return match_test(test_id)

But I'm unable to see a major performance difference when running:

./python -m test.bisect --fail-env-changed -o bisect test_asyncio -v

I'm checking how much time it takes before running the first test: so the time to load and filter tests.

@vstinner
Copy link
Copy Markdown
Member

support._match_test() has at least a complexity of O(n) where n is the number of patterns in support.match_tests, just to check if support.match_tests was modified :-(

I proposed PR #4421 which replaces support.match_tests global variable with a new support.set_match_tests() function to cache the "matcher" object and avoid the need of checking if patterns changed. Moreover, I implemented a very simple matcher using set() for the test.bisect use case: all patterns are "full identifiers".

Using "./python -m test.bisect --fail-env-changed -o bisect test_asyncio -v" as a "benchmark": I see the first test running faster with my PR than using this PR.

I guess that the problem bottleneck of this PR is the minimum O(n) cost because by mutable support.mtach_tests, whereas my benchmark uses 768 patterns.

@vstinner
Copy link
Copy Markdown
Member

I merged PR #4421 which is based on this PR but goes further in term of optimization and adds news tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting merge skip news tests Tests in the Lib/test dir type-feature A feature request or enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants