From 8aaae83eab31b6e8fa2b4d8d7e39de29596d69e1 Mon Sep 17 00:00:00 2001
From: "Bernhard M. Wiedemann"
Date: Tue, 23 May 2017 15:33:54 +0200
Subject: [PATCH] bpo-30461: glob: sort the resulting list because POSIX readdir does not guarantee any order

glob often gave unexpectedly random results.
This change makes it behave similar to POSIX glob(3).

Some background:
for openSUSE Linux we build packages in the Open Build Service (OBS)
which tracks dependencies, so when e.g. a new glibc is submitted,
all packages depending on glibc are rebuilt
and if those depending binaries changed,
the new version is pushed to the mirrors.

Many python modules build their .so files from a glob.glob("*.cpp")
The old glob behaviour would often lead to the linker
randomly ordering functions in resulting object files,
thus we were not able to auto-detect
that the package did not actually change
which wastes bandwidth of distribution mirrors and users.

See also https://reproducible-builds.org/ on that topic.

This change should not break existing software
because there were no guarantees on ordering of glob results.

Measurements with 'perf' show the new code to be 4ms / 1.07x slower
(for /usr/*/* with 9854 files)

The alternative would be to patch each package individually
but that would be quite some effort
and not be as nice to use as can be seen in
https://www.riverbankcomputing.com/pipermail/pyqt/2017-May/039214.html
---
 Doc/library/glob.rst  | 4 ++--
 Lib/glob.py           | 2 +-
 Lib/test/test_glob.py | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Doc/library/glob.rst b/Doc/library/glob.rst
index a8a5a500cbcfbf..2a18ac617b8d5c 100644
--- a/Doc/library/glob.rst
+++ b/Doc/library/glob.rst
@@ -11,8 +11,8 @@
 --------------
 
 The :mod:`glob` module finds all the pathnames matching a specified pattern
-according to the rules used by the Unix shell, although results are returned in
-arbitrary order.  No tilde expansion is done, but ``*``, ``?``, and character
+according to the rules used by the Unix shell, and results are returned in
+sorted order.  No tilde expansion is done, but ``*``, ``?``, and character
 ranges expressed with ``[]`` will be correctly matched.  This is done by using
 the :func:`os.scandir` and :func:`fnmatch.fnmatch` functions in concert, and
 not by actually invoking a subshell.  Note that unlike :func:`fnmatch.fnmatch`,
diff --git a/Lib/glob.py b/Lib/glob.py
index 002cd920190da7..c43b455ccbba44 100644
--- a/Lib/glob.py
+++ b/Lib/glob.py
@@ -17,7 +17,7 @@ def glob(pathname, *, recursive=False):
     If recursive is true, the pattern '**' will match any files and
     zero or more directories and subdirectories.
     """
-    return list(iglob(pathname, recursive=recursive))
+    return sorted(iglob(pathname, recursive=recursive))
 
 def iglob(pathname, *, recursive=False):
     """Return an iterator which yields the paths matching a pathname pattern.
diff --git a/Lib/test/test_glob.py b/Lib/test/test_glob.py
index dce64f9fcb1a28..f49318ba34daf4 100644
--- a/Lib/test/test_glob.py
+++ b/Lib/test/test_glob.py
@@ -49,10 +49,10 @@ def glob(self, *parts, **kwargs):
             pattern = os.path.join(*parts)
         p = os.path.join(self.tempdir, pattern)
         res = glob.glob(p, **kwargs)
-        self.assertEqual(list(glob.iglob(p, **kwargs)), res)
+        self.assertEqual(sorted(glob.iglob(p, **kwargs)), res)
         bres = [os.fsencode(x) for x in res]
         self.assertEqual(glob.glob(os.fsencode(p), **kwargs), bres)
-        self.assertEqual(list(glob.iglob(os.fsencode(p), **kwargs)), bres)
+        self.assertEqual(sorted(glob.iglob(os.fsencode(p), **kwargs)), bres)
         return res
 
     def assertSequencesEqual_noorder(self, l1, l2):