Commit

Merge pull request #1052 from datalad-handbook/bf-977
Correct the description of hash-size impact on Windows paths
adswa authored Nov 1, 2023
2 parents 9a3a953 + c74c8d0 commit fa26995
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions docs/intro/windows.rst
@@ -32,9 +32,9 @@ Windows also has insufficient support for :term:`symlink`\ing and locking files
In addition, Windows has a (default) `maximum path length limitation of only 260 characters <https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation#enable-long-paths-in-windows-10-version-1607-and-later>`_.
However, DataLad (or rather, :term:`git-annex`) relies on `file content hashing <https://en.wikipedia.org/wiki/Hash_function>`_ to ensure file integrity.
Usually, the *longer* the `hash` that is created, the more fail-safe it is.
-For a general idea about the length of hashes, consider that many tools including :term:`git-annex` use ``SHA256`` (a 256 characters long hash) as their default.
-As git-annex represents files with their content hash as a name, though, a secure 256 character file name is too long for Windows.
-Datasets thus adjust this default to a 128 character hash [#f2]_, but still, if you place a DataLad dataset into a deeply nested directory location, you may run into issues due to hitting the path length limit [#f3]_.
+For a general idea about the length of hashes, consider that many tools including :term:`git-annex` use ``SHA256`` (a 64 characters long hash) as their default.
+As git-annex represents files with their content hash as their name, and places them into a directory of the same name, half of the total path length is already used up with a ``SHA256`` hash.
+Datasets thus adjust this default to a 32 character hash [#f2]_, but still, if you place a DataLad dataset into a deeply nested directory location, you may run into issues due to hitting the path length limit [#f3]_.
You *can* enable long paths in recent builds of Windows 10, `but it requires some tweaking <https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation#enable-long-paths-in-windows-10-version-1607-and-later>`_.

Windows also doesn't really come with a decent :term:`terminal`.
@@ -165,7 +165,7 @@ Take a look at user forums such as `forums.linuxmint.com <https://forums.linuxmi

.. [#f1] If you are thinking, "Well, why would you use :term:`Singularity`, :term:`Docker` is available on Windows!": True, and ``datalad-container`` can indeed use Docker. But Docker can only be installed on Windows Pro or Enterprise, but not on Windows Home. Eh. :(
-.. [#f2] The path length limitation on Windows is the reason that DataLad datasets always use hashes based on `MD5 <https://en.wikipedia.org/wiki/MD5>`_, a hash function that produces a 128 character hash value. This wouldn't be necessary on Unix-based operating systems, but is required to ensure portability of datasets to Windows computers.
+.. [#f2] The path length limitation on Windows is the reason that DataLad datasets always use hashes based on `MD5 <https://en.wikipedia.org/wiki/MD5>`_, a hash function that produces a 32 character hash digest value. This wouldn't be necessary on Unix-based operating systems, but is required to ensure portability of datasets to Windows computers.
.. [#f3] The path length limitation certainly isn't only a problem for DataLad and its underlying tools. Many users run into path-length-related problems at least once, by accident. Downloading or copying files with long names into a folder that itself has a long name, for example, can become an unexpected issue (especially if you are not aware of the limit). Imagine transferring pictures from your friend's camera into ``C:\Users\"Bob McBobface"\Desktop\Pictures\"Vacation Pictures"\2020\Saint-Remy-en-Bouzemont-Saint-Genest-et-Isson\"From Alice and Sasha"\Camera\`` -- those file names had better be short to fit within the limit. Likewise, ``git clone``\ing a :term:`Git` repository that was created on a Unix computer and contains very long file names could fail.
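
Note on the corrected figures: the 64 and 32 character counts refer to hexadecimal digests. SHA256 produces 256 bits and MD5 produces 128 bits, and each hexadecimal character encodes 4 bits. The following is a minimal Python sketch, using only the standard library's ``hashlib``, that checks these lengths and roughly estimates how much of the default 260 character limit the digests consume. The helper names and the "digest appears twice per path" simplification (once as directory name, once as file name, as the changed text describes) are assumptions made for illustration. Real git-annex keys carry additional characters, such as a backend prefix and the file size, which the sketch ignores.

```python
import hashlib

# Default Windows path length limit cited in the changed text.
WINDOWS_MAX_PATH = 260


def hex_digest_length(algorithm: str) -> int:
    """Return the number of hexadecimal characters in a digest (hypothetical helper)."""
    return len(hashlib.new(algorithm, b"example content").hexdigest())


def digest_chars_in_path(algorithm: str, copies: int = 2) -> int:
    """Characters used by the digest alone, assuming it appears `copies` times
    in a path (directory name plus file name). This is a simplification."""
    return copies * hex_digest_length(algorithm)


sha256_len = hex_digest_length("sha256")  # 64 hex characters (256 bits / 4)
md5_len = hex_digest_length("md5")        # 32 hex characters (128 bits / 4)

for name in ("sha256", "md5"):
    used = digest_chars_in_path(name)
    print(f"{name}: {used} of {WINDOWS_MAX_PATH} characters used by digests, "
          f"{WINDOWS_MAX_PATH - used} left for everything else")
```

Under these assumptions, a ``SHA256`` digest alone accounts for 128 of the 260 characters, which matches the "half of the total path length" wording, while ``MD5`` accounts for 64.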