Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct the description of hash-size impact on Windows paths #1052

Merged
merged 1 commit into from
Nov 1, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions docs/intro/windows.rst
Original file line number Diff line number Diff line change
Expand Up @@ -32,9 +32,9 @@ Windows also has insufficient support for :term:`symlink`\ing and locking files
In addition, Windows has a (default) `maximum path length limitation of only 260 characters <https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation#enable-long-paths-in-windows-10-version-1607-and-later>`_.
However, DataLad (or rather, :term:`git-annex`) relies on `file content hashing <https://en.wikipedia.org/wiki/Hash_function>`_ to ensure file integrity.
Usually, the *longer* the `hash` that is created, the more fail-safe it is.
For a general idea about the length of hashes, consider that many tools including :term:`git-annex` use ``SHA256`` (a 256 characters long hash) as their default.
As git-annex represents files with their content hash as a name, though, a secure 256 character file name is too long for Windows.
Datasets thus adjust this default to a 128 character hash [#f2]_, but still, if you place a DataLad dataset into a deeply nested directory location, you may run into issues due to hitting the path length limit [#f3]_.
For a general idea about the length of hashes, consider that many tools including :term:`git-annex` use ``SHA256`` (a 64 characters long hash) as their default.
As git-annex represents files with their content hash as their name, and places them into a directory of the same name, half of the total path length is already used up with a ``SHA256`` hash.
Datasets thus adjust this default to a 32 character hash [#f2]_, but still, if you place a DataLad dataset into a deeply nested directory location, you may run into issues due to hitting the path length limit [#f3]_.
You *can* enable long paths in recent builds of Windows 10, `but it requires some tweaking <https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation#enable-long-paths-in-windows-10-version-1607-and-later>`_.

Windows also doesn't really come with a decent :term:`terminal`.
Expand Down Expand Up @@ -165,7 +165,7 @@ Take a look at user forums such as `forums.linuxmint.com <https://forums.linuxmi

.. [#f1] If you are thinking, "Well, why would you use :term:`Singularity`, :term:`Docker` is available on Windows!": True, and ``datalad-container`` can indeed use Docker. But Docker can only be installed on Windows Pro or Enterprise, but not on Windows Home. Eh. :(

.. [#f2] The path length limitation on Windows is the reason that DataLad datasets always use hashes based on `MD5 <https://en.wikipedia.org/wiki/MD5>`_, a hash function that produces a 128 character hash value. This wouldn't be necessary on Unix-based operating systems, but is required to ensure portability of datasets to Windows computers.
.. [#f2] The path length limitation on Windows is the reason that DataLad datasets always use hashes based on `MD5 <https://en.wikipedia.org/wiki/MD5>`_, a hash function that produces a 32 character hash digest value. This wouldn't be necessary on Unix-based operating systems, but is required to ensure portability of datasets to Windows computers.

.. [#f3] The path length limitation certainly isn't only a problem for DataLad and its underlying tools. Many users run into a Path length related problems at least once, by accident. Downloading or copying files with long names into a folder that itself has a long name, for example, can become an unexpected issue (especially if you are not aware of the limit). Imagine transferring pictures from your friends camera into ``C:\Users\"Bob McBobface"\Desktop\Pictures\"Vacation Pictures"\2020\Saint-Remy-en-Bouzemont-Saint-Genest-et-Isson\"From Alice and Sasha"\Camera\`` -- those file names shouldn't be too long to fit in the limit. Likewise, when ``git clone``\ing a :term:`Git` repository that was created on a Unix computer and contains very long file names could fail.

Expand Down