Skip to content

bpo-14094: Use _getfinalpathname to implement realpath#11248

Closed
vladima wants to merge 23 commits intopython:masterfrom
vladima:realpath
Closed

bpo-14094: Use _getfinalpathname to implement realpath#11248
vladima wants to merge 23 commits intopython:masterfrom
vladima:realpath

Conversation

@vladima
Copy link
Copy Markdown
Contributor

@vladima vladima commented Dec 19, 2018

Per discussion realpath should be using GetFinalPathNameByHandle and it is already exposed via _getfinalpathname.

https://bugs.python.org/issue14094

Copy link
Copy Markdown
Member

@zooba zooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't get to bpo right now to check the discussion, but the change looks fine.

Comment thread Lib/ntpath.py Outdated
@vladima
Copy link
Copy Markdown
Contributor Author

vladima commented Dec 20, 2018

A missing bit - handling of UNC paths with \\?\ prefix - such paths will be prefixed with \\?\UNC\

Comment thread Misc/NEWS.d/next/Library/2018-12-19-15-26-35.bpo-14094.8Hotek.rst Outdated
Comment thread Lib/ntpath.py Outdated
Comment thread Lib/ntpath.py Outdated
@vladima
Copy link
Copy Markdown
Contributor Author

vladima commented Dec 21, 2018

I can update this PR with a suggested approach of people don't mind.

Comment thread Lib/ntpath.py Outdated
Comment thread Lib/ntpath.py Outdated
resolved = join(*reversed(resolved_parts))
# try to convert extended path to normal if
# initial path did not use \\?\ prefix and result uses it
if not is_extended_path and resolved.startswith(extended_length_prefix):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch to also check for the extended-path prefix here since all of the _getfinalpathname calls may have failed.

Comment thread Lib/ntpath.py Outdated
Comment thread Lib/ntpath.py Outdated
Comment thread Lib/ntpath.py Outdated
@vladima
Copy link
Copy Markdown
Contributor Author

vladima commented Jan 4, 2019

is there anything else that should be done for this PR?

@vladima
Copy link
Copy Markdown
Contributor Author

vladima commented Jan 9, 2019

@vstinner, @zooba, @eryksun - can you please tell if there is anything missing in this PR?

Comment thread Lib/ntpath.py Outdated
Comment thread Misc/NEWS.d/next/Library/2018-12-19-15-26-35.bpo-14094.8Hotek.rst Outdated
Comment thread Lib/ntpath.py Outdated
def realpath(filename):
filename = os.fspath(filename)
is_str = isinstance(filename, str)
filename = os.fsdecode(filename)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid this call if it's not required:

is_str = isinstance(filename, str)
if not is_str:
    filename = os.fsdecode(filename)

@zooba, is this okay?

Comment thread Lib/ntpath.py Outdated
Comment thread Lib/ntpath.py
Comment thread Lib/ntpath.py Outdated
Comment thread Lib/ntpath.py Outdated
is_extended_path = filename.startswith(extended_path_prefix)
return filename if is_extended_path else abspath(filename)

def _getfinalpathname(path):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this fallback was ever defined for non-Windows platforms. Maybe it was a mistake or left over from code that's long gone. I think we don't need this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping it is easy however at glance ntpath module is intended to be working even on non-Windows platform - there are a few places that try to gracefully handle missing native implementations:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_getfullpathname doesn't get defined. _getvolumepathname is set to None to avoid duplicating the definition of ismount for non-Windows platforms, but it should really be a boolean _has_getvolumepathname. (If Python had inline functions I'd write the DRY common part as an inline function. Supporting platforms dynamically at runtime is suboptimal.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I've misunderstood you - assumed you are talking about both fallbacks defined in case if importing _getfinalpathname fails.

Comment thread Lib/ntpath.py Outdated
Comment thread Lib/ntpath.py Outdated
Comment thread Lib/ntpath.py Outdated
Comment thread Lib/ntpath.py Outdated
Comment thread Lib/ntpath.py Outdated
@vladima
Copy link
Copy Markdown
Contributor Author

vladima commented Jan 14, 2019

I think all comments are addressed, is there anything else?

@eryksun
Copy link
Copy Markdown
Contributor

eryksun commented Jan 14, 2019

I think all comments are addressed, is there anything else?

Feedback from core devs would be appreciated (@vstinner, @serhiy-storchaka, @zooba, @ZackerySpytz, @pfmoore, @tjguk, @zware).

The short-name test needs to be hardened to get the short name of the directory via FindFirstFileW. Here's a function to get the short name of a file (alternatively, for the entire path, call GetShortPathNameW):

import ctypes
from ctypes import wintypes
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value
kernel32.FindFirstFileW.restype = wintypes.HANDLE
kernel32.FindClose.argtypes = (wintypes.HANDLE,)

def get_short_name(path):
    info = wintypes.WIN32_FIND_DATAW()
    h = kernel32.FindFirstFileW(path, ctypes.byref(info))
    if h == INVALID_HANDLE_VALUE:
        raise ctypes.WinError(ctypes.get_last_error())
    kernel32.FindClose(h)
    return info.cAlternateFileName

Skip the test if get_short_name returns an empty string. In this case, the file system of the temp directory may not support short names, or automatic creation of short names may be disabled.

I think more cases need to be tested, including adapting tests from posixpath, if any are relevant.

  • An input extended path should always be returned as an extended path.
  • Normal paths replace forward slash with backslash, but extended paths do not. Join the junction path with non-existent "spam/eggs". Resolve it as a normal path, and verify that the junction target is resolved and the unresolved part is normalized to "spam\eggs". Do the same but with an extended path, and verify that the unresolved part is left as "spam/eggs".
  • Normal paths resolve "." and ".." components in user mode, before making a system call. Join the junction path with ".\spam\..". Resolve it using a normal path, and verify that it's the junction target. Resolve it using an extended path, and verify that the result is an extended path for the junction target that's joined with the unresolved part ".\spam\..". (Windows diverges from Unix on this point, and it shows up especially when handling symlinks and junction mountpoints. It's also why we can lead with an abspath call in realpath, which minimizes the risk of a working-directory race-condition for a relative path and immediately resolves reserved DOS device names. File systems may reserve "." and ".." as names, but these names have no other significance in the kernel. Using an extended path, FAT32 will even let us create directories or files named "." and "..". They will then appear twice in the directory listing, except only once in the root directory.)
  • Normal paths strip trailing spaces and dots from the final component. Use an extended path to create a file named "spam. . .". Resolve it using a normal path, and verify that it returns non-existing "spam" as the name. Create a symlink to the extended-path target. The symlink privilege may be required; skip the rest if this fails. Resolve the link using a normal path, and verify that it returns an extended path for "spam. . ." . (We can't target a directory named "spam. . ." using a junction, since _winapi.CreateJunction mistakenly calls GetFullPathNameW on an extended-path target. If this gets fixed, we can use a junction instead.)
  • Normal paths reserve DOS device names in the final path component of logical-drive paths. Use an extended path to create a file named "nul: .txt"; make sure this is a logical-drive path. Resolve it using a normal path, and verify that it returns the device path "\\.\nul". Create a symlink to the extended-path target. Skip the rest if this fails. Resolve the link using a normal path, and verify that it returns an extended path for "nul: .txt".

Copy link
Copy Markdown
Member

@zooba zooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also like to see more tests. We're covering a lot of edge-cases here now, but not checking them all.

Comment thread Lib/ntpath.py
is_str = isinstance(filename, str)
if not is_str:
filename = os.fsdecode(filename)
extended_path_prefix = '\\\\?\\'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer module-level _PRIVATE_CONSTANTS

Comment thread Lib/ntpath.py
genericpath._check_arg_types('commonpath', *paths)
raise

MAX_PATH = 260
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_INTERNAL_CONSTANT or else document it (as a legacy value that can largely be ignored)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My later suggestion may remove the need for this constant entirely.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's useful to document as a public constant, but maybe change it to MAX_PATH_DOS or MAX_PATH_LEGACY? Python still has an inter-operation concern even if our own process supports long paths (in addition to our manifest setting, this requires enabling at the OS level , which last I checked isn't the default setting). Along the same lines, CreateProcess doesn't allow the current directory (explicit or inherited) to be a long path.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've never had it in the past, and it's becoming less relevant. I'd rather not add it now, and continue the work we do to handle it as transparently as possible.

Comment thread Lib/ntpath.py
# return path as-is
return path

if (len(normal_path) < MAX_PATH and
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not check the length here and just handle the OSError that _getfullpathname raises if the path is too long (which may be 32k instead of 260 on more recent systems)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be convenient, but GetFullPathNameW only has the NT limit for a null-terminated string, UNICODE_STRING_MAX_CHARS - 1 (32766). Someone made a couple of mistakes in the docs. Windows 10 long-path support has no bearing on GetFullPathNameW, and we've never had to use the "\\?\" prefix to support up to 32,766 characters with this function. The only change I recall is that prior to Windows 7 the behavior wasn't well-defined if we passed it a path that exceeded 32,766 characters. I think it would type cast the length in bytes to USHORT. Since Windows 7, it calls the internal function RtlGetFullPathName_UEx, which returns an NTSTATUS code. If the input path length exceeds 32,766, RtlGetFullPathName_UEx fails with STATUS_NAME_TOO_LONG (i.e. ERROR_FILENAME_EXCED_RANGE). If the result would exceed 32,766 characters, it fails with STATUS_OBJECT_NAME_INVALID (i.e. ERROR_INVALID_NAME).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be convenient, but …

I don't understand why you say "but"? It sounds like (and I confirmed from source) all our supported platforms can handle normal and excessively long paths here just fine, so there's no need for the check, correct?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're probably talking past each other here. IMO, we shouldn't remove the "\\?\" prefix for long paths prior to Windows 10, and in Windows 10 we shouldn't remove it if long-path support isn't enabled by the "LongPathsEnabled" value in "HKLM\SYSTEM\CurrentControlSet\Control\FileSystem". Maybe you mean that we can remove the prefix because none of our code will crash on long paths. My basis for suggesting this check was that realpath should return a usable result if possible, not a result that requires converting back to an extended path (including the special case for UNC paths) if long paths aren't enabled.

We can skip the MAX_PATH check if we have a way to check whether long paths are enabled. The only way to directly check this is to call the undocumented function RtlAreLongPathsEnabled, if it's defined in ntdll.dll. We could also check the Windows version number and "LongPathsEnabled" registry value at startup, but it's slightly race-prone since the value may have changed after the system read and cached it at startup. I suppose we could check it indirectly by trying to open a random 255-character name in the temp folder at startup, if that's at least MAX_PATH characters. If it fails with ERROR_FILE_NOT_FOUND then set a flag that long paths are enabled. If long paths aren't enabled I expect the error to be ERROR_PATH_NOT_FOUND.

I was also concerned about sharing paths via IPC and configuration settings with programs that don't support long paths. But now that I think about it more, I don't think extended paths help this case much in general since many programs don't support them. In this case we can use some type of junction, which could be a drive-letter junction via win32file.DefineDosDevice (or equivalently subst.exe) or a filesystem junction via _winapi.CreateJunction (or equivalently CMD's mklink /j). These extend the limit to about 4k characters. It's not the full 32k, but I have no idea who would really need paths that long.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you're thinking now.

I'd rather move the path length condition out of this function then, so that:

  • _extended_to_normal always returns a normal path, regardless of length
  • realpath returns a normal path if < 260 chars or an extended path if > 260 chars

The first point is the main part that was bothering me.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And _extended_to_normal should also have the prefix check, being a no-op if the path doesn't have it.

@csabella
Copy link
Copy Markdown
Contributor

@vladima, please address @zooba's last code review comments. Thanks!

@zooba
Copy link
Copy Markdown
Member

zooba commented Sep 10, 2019

This no longer applies, but thanks for all the effort you put into the change (and dealing with our feedback)!

@zooba zooba closed this Sep 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants