Skip to content

bpo-28564: Use os.scandir() in shutil.rmtree() on Unix.#4085

Merged
serhiy-storchaka merged 7 commits intopython:masterfrom
serhiy-storchaka:shutil-rmtree-scandir
Nov 4, 2017
Merged

bpo-28564: Use os.scandir() in shutil.rmtree() on Unix.#4085
serhiy-storchaka merged 7 commits intopython:masterfrom
serhiy-storchaka:shutil-rmtree-scandir

Conversation

@serhiy-storchaka
Copy link
Copy Markdown
Member

@serhiy-storchaka serhiy-storchaka commented Oct 23, 2017

This speeds up it to 20%.

https://bugs.python.org/issue28564

@pitrou
Copy link
Copy Markdown
Member

pitrou commented Oct 24, 2017

What about the unsafe variant? Can't it benefit from a similar optimization?

@serhiy-storchaka
Copy link
Copy Markdown
Member Author

Good question! Using os.scandir() in _rmtree_unsafe() speeds up it to 25% on Windows and >40% on Linux (if disable _rmtree_safe_fd()).

Comment thread Lib/shutil.py
try:
orig_st = os.stat(name, dir_fd=topfd, follow_symlinks=False)
mode = orig_st.st_mode
is_dir = entry.is_dir(follow_symlinks=False)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why you are doing this is_dir call before re-stat()ing below?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If is_dir is false, we can avoid calling stat(). On Posix scandir() usually fills the is_dir bit, but stat() needs a syscall.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then why is stat() needed at all? Isn't is_dir() sufficient?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I hadn't noticed the use of orig_st below. My bad.

Comment thread Lib/shutil.py
# os.scandir above.
raise OSError("Cannot call rmtree on a symbolic link")
except OSError:
onerror(os.path.islink, fullname, sys.exc_info())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you add a return just after this?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. But rather continue.

Copy link
Copy Markdown
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a nice improvement!

@serhiy-storchaka serhiy-storchaka merged commit d4d79bc into python:master Nov 4, 2017
@serhiy-storchaka serhiy-storchaka deleted the shutil-rmtree-scandir branch November 4, 2017 12:16
embray pushed a commit to embray/cpython that referenced this pull request Nov 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type-feature A feature request or enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants