Commit

Correct the description of hash-size impact on Windows paths
The description is changed to state the correct sizes in characters
(because those are immediately relevant, not the bitsize, even if
directly convertible).

Thanks to @NemoAndrea for reporting!

Closes #977
mih committed Nov 1, 2023
1 parent 9540353 commit c74c8d0
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions docs/intro/windows.rst
@@ -32,9 +32,9 @@ Windows also has insufficient support for :term:`symlink`\ing and locking files
 In addition, Windows has a (default) `maximum path length limitation of only 260 characters <https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation#enable-long-paths-in-windows-10-version-1607-and-later>`_.
 However, DataLad (or rather, :term:`git-annex`) relies on `file content hashing <https://en.wikipedia.org/wiki/Hash_function>`_ to ensure file integrity.
 Usually, the *longer* the `hash` that is created, the more fail-safe it is.
-For a general idea about the length of hashes, consider that many tools including :term:`git-annex` use ``SHA256`` (a 256 characters long hash) as their default.
-As git-annex represents files with their content hash as a name, though, a secure 256 character file name is too long for Windows.
-Datasets thus adjust this default to a 128 character hash [#f2]_, but still, if you place a DataLad dataset into a deeply nested directory location, you may run into issues due to hitting the path length limit [#f3]_.
+For a general idea about the length of hashes, consider that many tools including :term:`git-annex` use ``SHA256`` (a 64 characters long hash) as their default.
+As git-annex represents files with their content hash as their name, and places them into a directory of the same name, half of the total path length is already used up with a ``SHA256`` hash.
+Datasets thus adjust this default to a 32 character hash [#f2]_, but still, if you place a DataLad dataset into a deeply nested directory location, you may run into issues due to hitting the path length limit [#f3]_.
 You *can* enable long paths in recent builds of Windows 10, `but it requires some tweaking <https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation#enable-long-paths-in-windows-10-version-1607-and-later>`_.

 Windows also doesn't really come with a decent :term:`terminal`.
@@ -165,7 +165,7 @@ Take a look at user forums such as `forums.linuxmint.com <https://forums.linuxmi

 .. [#f1] If you are thinking, "Well, why would you use :term:`Singularity`, :term:`Docker` is available on Windows!": True, and ``datalad-container`` can indeed use Docker. But Docker can only be installed on Windows Pro or Enterprise, but not on Windows Home. Eh. :(
-.. [#f2] The path length limitation on Windows is the reason that DataLad datasets always use hashes based on `MD5 <https://en.wikipedia.org/wiki/MD5>`_, a hash function that produces a 128 character hash value. This wouldn't be necessary on Unix-based operating systems, but is required to ensure portability of datasets to Windows computers.
+.. [#f2] The path length limitation on Windows is the reason that DataLad datasets always use hashes based on `MD5 <https://en.wikipedia.org/wiki/MD5>`_, a hash function that produces a 32 character hash digest value. This wouldn't be necessary on Unix-based operating systems, but is required to ensure portability of datasets to Windows computers.
 .. [#f3] The path length limitation certainly isn't only a problem for DataLad and its underlying tools. Many users run into a Path length related problems at least once, by accident. Downloading or copying files with long names into a folder that itself has a long name, for example, can become an unexpected issue (especially if you are not aware of the limit). Imagine transferring pictures from your friends camera into ``C:\Users\"Bob McBobface"\Desktop\Pictures\"Vacation Pictures"\2020\Saint-Remy-en-Bouzemont-Saint-Genest-et-Isson\"From Alice and Sasha"\Camera\`` -- those file names shouldn't be too long to fit in the limit. Likewise, when ``git clone``\ing a :term:`Git` repository that was created on a Unix computer and contains very long file names could fail.
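
The character counts in the corrected text can be checked with a short sketch. It is not part of the commit; it only uses Python's standard ``hashlib`` module to compare the bit size of ``SHA256`` and ``MD5`` with the length of their hexadecimal digests, and then repeats the "half of the total path length" arithmetic. The helper names ``hex_digest_length`` and ``characters_used_by_hash`` are hypothetical, and the calculation is a simplification that counts only the digest characters (once for the directory, once for the file name), not any other part of the path.

```python
import hashlib

# Default Windows path limit cited in the documentation text.
WINDOWS_DEFAULT_MAX_PATH = 260


def hex_digest_length(algorithm: str) -> int:
    """Return the number of hexadecimal characters in a digest of `algorithm`."""
    return len(hashlib.new(algorithm, b"example content").hexdigest())


def characters_used_by_hash(algorithm: str) -> int:
    """Count digest characters when the hash is used as directory AND file name.

    Assumption: only the digest itself is counted; other name components
    and the leading part of the path are ignored.
    """
    return 2 * hex_digest_length(algorithm)


for name in ("sha256", "md5"):
    bits = hashlib.new(name).digest_size * 8  # digest_size is given in bytes
    used = characters_used_by_hash(name)
    print(
        f"{name}: {bits} bits, {hex_digest_length(name)} hex characters, "
        f"{used} of {WINDOWS_DEFAULT_MAX_PATH} path characters used by the hash"
    )
```

Each hexadecimal character encodes 4 bits, which is why the bit size and the character count are directly convertible: dividing the bit size by 4 gives the digest length in characters.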