Skip to content

bpo-31324: Optimize support._match_test()#4421

Merged
vstinner merged 5 commits intopython:masterfrom
vstinner:set_match_tests
Nov 21, 2017
Merged

bpo-31324: Optimize support._match_test()#4421
vstinner merged 5 commits intopython:masterfrom
vstinner:set_match_tests

Conversation

@vstinner
Copy link
Copy Markdown
Member

@vstinner vstinner commented Nov 16, 2017

  • Remove support.match_tests global variable. It is replaced with a
    new support.set_match_tests() function, so _match_test() doesn't
    have to check each time if patterns were modified.
  • Rewrite _match_test() with a new _MatchTests object which uses
    different code paths depending on the kind of patterns.

Co-Authored-By: Serhiy Storchaka [email protected]

https://bugs.python.org/issue31324

@vstinner
Copy link
Copy Markdown
Member Author

vstinner commented Nov 16, 2017

Lib/test/support/__init__.py:

-match_tests = None

Only Lib/test/libregrtest/ was using support.match_tests, and my PR updates Lib/test/libregrtest/. I don't think that support.mtach_tests is used outside CPython code base. test.support is mostly used internally by the CPython test suite.

Comment thread Lib/test/support/__init__.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed ("global" removed)

Comment thread Lib/test/support/__init__.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the class is not needed. Just use a function which returns a function.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done.

Comment thread Lib/test/support/__init__.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return set(patterns).__contains__.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread Lib/test/support/__init__.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about '?' and '['?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread Lib/test/support/__init__.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be all(map(self._is_full_match, patterns)).

* Rename support._match_test() to support.match_test(): make it
  public
* Remove support.match_tests global variable. It is replaced with a
  new support.set_match_tests() function, so match_test() doesn't
  have to check each time if patterns were modified.
* Rewrite match_test(): use different code paths depending on the
  kind of patterns for best performances.

Co-Authored-By: Serhiy Storchaka <[email protected]>
@vstinner
Copy link
Copy Markdown
Member Author

@serhiy-storchaka: I addressed your comments, I added unit tests, I chose to make support.match_test() public since it's used by regrtest, and I rebased my change.

Copy link
Copy Markdown
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added few suggestions.

And please add tests with a pattern containing '.' and metacharacters and with multiple patterns.

Comment thread Lib/test/support/__init__.py Outdated
#
# Reject patterns which contain fnmatch patterns: '*', '?', '[...]'
# or '[!...]'. For example, reject 'test_access*'.
return ('.' in pattern) and all(char not in '?*[]' for char in pattern)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether using a regular expression would be faster.

return ('.' in pattern) and not re.search(r'[?*\[\]]', pattern)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that (char not in '?*[]') is more readable, and this function doesn't seem to be a performance bottleneck, there is no need to optimize it. Moreover, I just added a cache to not recompile patterns if set_match_tests() is called twice with the same patterns.

It seems like the compilation to a giant regex is slow, but I don't think that we can optimize this much, and again, I don't think that it's worth it. My main usage of many patterns is when I use test.bisect, but test.bisect only uses full test identifiers and so should use set().contains.

Comment thread Lib/test/support/__init__.py Outdated

if not patterns:
_match_test_func = None
elif all(_is_full_match_test(pattern) for pattern in patterns):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be all(map(_is_full_match_test, patterns)).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but I prefer list comprehensions. IMHO it's more readable.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map() is already used twice few lines below.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, that's the copied I copied from your PR :-D You won, I updated my PR to address your latest review.

Comment thread Lib/test/test_support.py Outdated
test_access = Test('test.test_os.FileTests.test_access')
test_chdir = Test('test.test_os.Win32ErrorTests.test_chdir')

old_match = support._match_test_func
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use

with support.swap_attr(support, '_match_test_func', None):

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@vstinner
Copy link
Copy Markdown
Member Author

And please add tests with a pattern containing '.' and metacharacters and with multiple patterns.

Done

# Reject patterns which contain fnmatch patterns: '*', '?', '[...]'
# or '[!...]'. For example, reject 'test_access*'.
return ('.' in pattern) and all(char not in '?*[]' for char in pattern)
return ('.' in pattern) and (not re.search(r'[?*\[\]]', pattern))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are parenthesis around not ... add readability?

Comment thread Lib/test/support/__init__.py Outdated

func = match_test_regex

_match_test_patterns = patterns
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we make a copy? Or just ignore the case of modifying patterns after calling set_match_tests()? This is a private API at end.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, fixed.

@vstinner vstinner merged commit 803ddd8 into python:master Nov 21, 2017
@vstinner vstinner deleted the set_match_tests branch November 21, 2017 23:34
@vstinner
Copy link
Copy Markdown
Member Author

Thank you for the reviews @serhiy-storchaka!

vstinner added a commit that referenced this pull request Nov 23, 2017
* bpo-31324: Optimize support._match_test() (#4421)

* Rename support._match_test() to support.match_test(): make it
  public
* Remove support.match_tests global variable. It is replaced with a
  new support.set_match_tests() function, so match_test() doesn't
  have to check each time if patterns were modified.
* Rewrite match_test(): use different code paths depending on the
  kind of patterns for best performances.

Co-Authored-By: Serhiy Storchaka <[email protected]>
(cherry picked from commit 803ddd8)

* bpo-31324: Fix test.support.set_match_tests(None) (#4505)

(cherry picked from commit bb11c3c)
vstinner added a commit that referenced this pull request Nov 23, 2017
* bpo-31324: Optimize support._match_test() (#4421)

* Rename support._match_test() to support.match_test(): make it
  public
* Remove support.match_tests global variable. It is replaced with a
  new support.set_match_tests() function, so match_test() doesn't
  have to check each time if patterns were modified.
* Rewrite match_test(): use different code paths depending on the
  kind of patterns for best performances.

Co-Authored-By: Serhiy Storchaka <[email protected]>
(cherry picked from commit 803ddd8)

* bpo-31324: Fix test.support.set_match_tests(None) (#4505)

(cherry picked from commit bb11c3c)
(cherry picked from commit 70b2f87)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants