bpo-14094: Use _getfinalpathname to implement realpath#11248
bpo-14094: Use _getfinalpathname to implement realpath#11248vladima wants to merge 23 commits intopython:masterfrom
Conversation
zooba
left a comment
There was a problem hiding this comment.
I can't get to bpo right now to check the discussion, but the change looks fine.
|
A missing bit - handling of UNC paths with |
|
I can update this PR with a suggested approach of people don't mind. |
Co-Authored-By: vladima <[email protected]>
| resolved = join(*reversed(resolved_parts)) | ||
| # try to convert extended path to normal if | ||
| # initial path did not use \\?\ prefix and result uses it | ||
| if not is_extended_path and resolved.startswith(extended_length_prefix): |
There was a problem hiding this comment.
Good catch to also check for the extended-path prefix here since all of the _getfinalpathname calls may have failed.
|
is there anything else that should be done for this PR? |
| def realpath(filename): | ||
| filename = os.fspath(filename) | ||
| is_str = isinstance(filename, str) | ||
| filename = os.fsdecode(filename) |
There was a problem hiding this comment.
Let's avoid this call if it's not required:
is_str = isinstance(filename, str)
if not is_str:
filename = os.fsdecode(filename)
@zooba, is this okay?
| is_extended_path = filename.startswith(extended_path_prefix) | ||
| return filename if is_extended_path else abspath(filename) | ||
|
|
||
| def _getfinalpathname(path): |
There was a problem hiding this comment.
I wonder why this fallback was ever defined for non-Windows platforms. Maybe it was a mistake or left over from code that's long gone. I think we don't need this.
There was a problem hiding this comment.
Dropping it is easy however at glance ntpath module is intended to be working even on non-Windows platform - there are a few places that try to gracefully handle missing native implementations:
There was a problem hiding this comment.
_getfullpathname doesn't get defined. _getvolumepathname is set to None to avoid duplicating the definition of ismount for non-Windows platforms, but it should really be a boolean _has_getvolumepathname. (If Python had inline functions I'd write the DRY common part as an inline function. Supporting platforms dynamically at runtime is suboptimal.)
There was a problem hiding this comment.
Ah, I've misunderstood you - assumed you are talking about both fallbacks defined in case if importing _getfinalpathname fails.
|
I think all comments are addressed, is there anything else? |
Feedback from core devs would be appreciated (@vstinner, @serhiy-storchaka, @zooba, @ZackerySpytz, @pfmoore, @tjguk, @zware). The short-name test needs to be hardened to get the short name of the directory via Skip the test if I think more cases need to be tested, including adapting tests from posixpath, if any are relevant.
|
zooba
left a comment
There was a problem hiding this comment.
I'd also like to see more tests. We're covering a lot of edge-cases here now, but not checking them all.
| is_str = isinstance(filename, str) | ||
| if not is_str: | ||
| filename = os.fsdecode(filename) | ||
| extended_path_prefix = '\\\\?\\' |
There was a problem hiding this comment.
Prefer module-level _PRIVATE_CONSTANTS
| genericpath._check_arg_types('commonpath', *paths) | ||
| raise | ||
|
|
||
| MAX_PATH = 260 |
There was a problem hiding this comment.
_INTERNAL_CONSTANT or else document it (as a legacy value that can largely be ignored)
There was a problem hiding this comment.
My later suggestion may remove the need for this constant entirely.
There was a problem hiding this comment.
I think it's useful to document as a public constant, but maybe change it to MAX_PATH_DOS or MAX_PATH_LEGACY? Python still has an inter-operation concern even if our own process supports long paths (in addition to our manifest setting, this requires enabling at the OS level , which last I checked isn't the default setting). Along the same lines, CreateProcess doesn't allow the current directory (explicit or inherited) to be a long path.
There was a problem hiding this comment.
We've never had it in the past, and it's becoming less relevant. I'd rather not add it now, and continue the work we do to handle it as transparently as possible.
| # return path as-is | ||
| return path | ||
|
|
||
| if (len(normal_path) < MAX_PATH and |
There was a problem hiding this comment.
I'd rather not check the length here and just handle the OSError that _getfullpathname raises if the path is too long (which may be 32k instead of 260 on more recent systems)
There was a problem hiding this comment.
That would be convenient, but GetFullPathNameW only has the NT limit for a null-terminated string, UNICODE_STRING_MAX_CHARS - 1 (32766). Someone made a couple of mistakes in the docs. Windows 10 long-path support has no bearing on GetFullPathNameW, and we've never had to use the "\\?\" prefix to support up to 32,766 characters with this function. The only change I recall is that prior to Windows 7 the behavior wasn't well-defined if we passed it a path that exceeded 32,766 characters. I think it would type cast the length in bytes to USHORT. Since Windows 7, it calls the internal function RtlGetFullPathName_UEx, which returns an NTSTATUS code. If the input path length exceeds 32,766, RtlGetFullPathName_UEx fails with STATUS_NAME_TOO_LONG (i.e. ERROR_FILENAME_EXCED_RANGE). If the result would exceed 32,766 characters, it fails with STATUS_OBJECT_NAME_INVALID (i.e. ERROR_INVALID_NAME).
There was a problem hiding this comment.
That would be convenient, but …
I don't understand why you say "but"? It sounds like (and I confirmed from source) all our supported platforms can handle normal and excessively long paths here just fine, so there's no need for the check, correct?
There was a problem hiding this comment.
We're probably talking past each other here. IMO, we shouldn't remove the "\\?\" prefix for long paths prior to Windows 10, and in Windows 10 we shouldn't remove it if long-path support isn't enabled by the "LongPathsEnabled" value in "HKLM\SYSTEM\CurrentControlSet\Control\FileSystem". Maybe you mean that we can remove the prefix because none of our code will crash on long paths. My basis for suggesting this check was that realpath should return a usable result if possible, not a result that requires converting back to an extended path (including the special case for UNC paths) if long paths aren't enabled.
We can skip the MAX_PATH check if we have a way to check whether long paths are enabled. The only way to directly check this is to call the undocumented function RtlAreLongPathsEnabled, if it's defined in ntdll.dll. We could also check the Windows version number and "LongPathsEnabled" registry value at startup, but it's slightly race-prone since the value may have changed after the system read and cached it at startup. I suppose we could check it indirectly by trying to open a random 255-character name in the temp folder at startup, if that's at least MAX_PATH characters. If it fails with ERROR_FILE_NOT_FOUND then set a flag that long paths are enabled. If long paths aren't enabled I expect the error to be ERROR_PATH_NOT_FOUND.
I was also concerned about sharing paths via IPC and configuration settings with programs that don't support long paths. But now that I think about it more, I don't think extended paths help this case much in general since many programs don't support them. In this case we can use some type of junction, which could be a drive-letter junction via win32file.DefineDosDevice (or equivalently subst.exe) or a filesystem junction via _winapi.CreateJunction (or equivalently CMD's mklink /j). These extend the limit to about 4k characters. It's not the full 32k, but I have no idea who would really need paths that long.
There was a problem hiding this comment.
Ah, I see what you're thinking now.
I'd rather move the path length condition out of this function then, so that:
_extended_to_normalalways returns a normal path, regardless of lengthrealpathreturns a normal path if < 260 chars or an extended path if > 260 chars
The first point is the main part that was bothering me.
There was a problem hiding this comment.
And _extended_to_normal should also have the prefix check, being a no-op if the path doesn't have it.
|
This no longer applies, but thanks for all the effort you put into the change (and dealing with our feedback)! |
Per discussion
realpathshould be usingGetFinalPathNameByHandleand it is already exposed via_getfinalpathname.https://bugs.python.org/issue14094