
Commit

Merge branch 'main' into book-intro
mih committed Nov 9, 2023
2 parents bd84718 + cd0a100 commit b0cd830
Showing 15 changed files with 35 additions and 31 deletions.
1 change: 1 addition & 0 deletions docs/basics/101-101-create.rst
@@ -31,6 +31,7 @@ Note the command structure of :dlcmd:`create` (optional bits are enclosed in ``[
datalad create [--description "..."] [-c <config options>] PATH
.. _createdescription:
.. index::
pair: set description for dataset location; with DataLad
.. find-out-more:: What is the description option of 'datalad create'?
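
For orientation, a concrete instantiation of that command structure could look like the following sketch; the description text, configuration, and dataset name are made up for illustration:

  $ datalad create --description "course dataset on my laptop" -c text2git DataLad-101
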
17 changes: 9 additions & 8 deletions docs/basics/101-115-symlinks.rst
@@ -21,7 +21,7 @@ We'll take a look together, using the ``books/`` directory as an example:
.. index::
pair: no symlinks; on Windows
pair: tree; terminal command
.. windows-wit:: This will look different to you
.. windows-wit:: Dataset directories look different on Windows

.. include:: topic/tree-symlinks.rst

@@ -87,7 +87,7 @@ tree is also known as the *annex* of a dataset.
.. index::
pair: elevated storage demand; in adjusted mode
pair: no symlinks; on Windows
.. windows-wit:: What happens on Windows?
.. windows-wit:: File content management on Windows
:name: woa_objecttree
:float:

@@ -139,15 +139,15 @@ This comes with two very important advantages:

One, should you have copies of the
same data in different places of your dataset, the symlinks of these files
point to the same place (in order to understand why this is the case, you
will need to read the hidden section at the end of the page).
point to the same place - in order to understand why this is the case, you
will need to read the :find-out-more:`about the object tree <fom-objecttree>`.
Therefore, any amount of copies of a piece of data
is only one single piece of data in your object tree. This, depending on
how much identical file content lies in different parts of your dataset,
can save you much disk space and time.

The second advantage is less intuitive but clear for users familiar with Git.
Small symlinks can be written very very fast when switching :term:`branch`\es, as opposed to copying and deleting huge data files.
Compared to copying and deleting huge data files, small symlinks can be written very very fast, for example, when switching dataset versions, or :term:`branch`\es.

.. gitusernote:: Speedy branch switches

@@ -168,9 +168,10 @@ work with the paths in the object tree than you or any other human are.
Lastly, understanding that annexed files in your dataset are symlinked
will be helpful to understand how common file system operations such as
moving, renaming, or copying content translate to dataset modifications
in certain situations. Later in this book we will have a section on how
to manage the file system in a DataLad dataset (:ref:`file system`).
in certain situations. Later in this book, the section :ref:`file system`
will take a closer look at that.

.. _objecttree:
.. index::
pair: key; git-annex concept
.. find-out-more:: Paths, checksums, object trees, and data integrity
Expand Down Expand Up @@ -227,7 +228,7 @@ to manage the file system in a DataLad dataset (:ref:`file system`).
consisting of two letters each.
These two letters are derived from the md5sum of the key, and their sole purpose is to avoid issues with too many files in one directory (a situation that certain file systems have problems with).
The next subdirectory in the symlink helps to prevent accidental deletions and changes, as it does not have write :term:`permissions`, so that users cannot modify any of its underlying contents.
This is the reason that annexed files need to be unlocked prior to modifications, and this information will be helpful to understand some file system management operations such as removing files or datasets (see section :ref:`file system`).
This is the reason that annexed files need to be unlocked prior to modifications, and this information will be helpful to understand some file system management operations such as removing files or datasets. Section :ref:`file system` takes a look at that.
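
As a quick sketch of that unlock-edit-save cycle (the file name is a placeholder):

  $ datalad unlock books/example.pdf
  $ # ... modify the file ...
  $ datalad save -m "update example book"
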

The next part of the symlink contains the actual hash.
There are different hash functions available.
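
To make the symlink structure described above more tangible, resolving an annexed file's link target could produce something along these lines; the file name, key, size, and hash directories are invented:

  $ readlink books/example.pdf
  ../.git/annex/objects/k1/g0/MD5E-s12345--d41d8cd98f00b204e9800998ecf8427e.pdf/MD5E-s12345--d41d8cd98f00b204e9800998ecf8427e.pdf
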
10 changes: 5 additions & 5 deletions docs/basics/101-116-sharelocal.rst
@@ -33,7 +33,8 @@ DataLad for, if everyone can already access everything?" However,
universal, unrestricted access can easily lead to chaos. DataLad can
help facilitate collaboration without requiring ultimate trust and
reliability of all participants. Essentially, with a shared dataset,
collaborators can look and use your dataset without ever touching it.
collaborators can see and use your dataset without any danger
of undesired or uncontrolled modification.

To demonstrate how to share a DataLad dataset on a common file system,
we will pretend that your personal computer
@@ -48,8 +49,8 @@ But as we cannot easily simulate a second user in this book,
for now, you will have to share your dataset with yourself.
This endeavor serves several purposes: For one, you will experience a very easy
way of sharing a dataset. Secondly, it will show you
how a dataset can be obtained from a path (instead of a URL as shown in the section
:ref:`installds`). Thirdly, ``DataLad-101`` is a dataset that can
how a dataset can be obtained from a path, instead of a URL as shown in section
:ref:`installds`. Thirdly, ``DataLad-101`` is a dataset that can
showcase many different properties of a dataset already, but it will
be an additional learning experience to see how the different parts
of the dataset -- text files, larger files, subdatasets,
@@ -194,8 +195,7 @@ and hostname of your computer. "This", you exclaim, excited about your own reali
pair: set description for dataset location; with DataLad
.. find-out-more:: What is this location, and what if I provided a description?

Back in the very first section of the Basics, :ref:`createDS`, a hidden
section mentioned the ``--description`` option of :dlcmd:`create`.
Back in the very first section of the Basics, :ref:`createDS`, a :ref:`Find-out-more mentioned the '--description' option <createdescription>` of :dlcmd:`create`.
With this option, you can provide a description about the dataset *location*.

The :gitannexcmd:`whereis` command, finally, is where such a description
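
In command form, obtaining a dataset from a path rather than a URL, and later checking where file content is available, could look roughly like this sketch; all paths, the description, and the file name are placeholders:

  $ datalad clone --description "mock user's copy" /home/me/DataLad-101 mock_user/DataLad-101
  $ git annex whereis books/example.pdf
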
2 changes: 1 addition & 1 deletion docs/basics/101-123-config2.rst
@@ -34,7 +34,7 @@ This looks neither spectacular nor pretty. Also, it does not follow the ``sectio
organization of the ``.git/config`` file anymore. Instead, there are three lines,
and all of these seem to have something to do with the configuration of git-annex.
There even is one key word that you recognize: MD5E.
If you have read the hidden section in :ref:`symlink`
If you have read the :ref:`Find-out-more on object trees <objecttree>`
you will recognize it as a reference to the type of
key used by git-annex to identify and store file content in the object-tree.
The first row, ``* annex.backend=MD5E``, therefore translates to "Everything in this
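
To check which backend applies to a particular file, the attribute can be queried directly with Git; the file name here is a placeholder:

  $ git check-attr annex.backend -- notes.txt
  notes.txt: annex.backend: MD5E
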
1 change: 1 addition & 0 deletions docs/basics/101-130-yodaproject.rst
@@ -14,6 +14,7 @@ the `Python <https://www.python.org>`__ programming language, you decide
to script your analysis in Python. Delighted, you find out that there is even
a Python API for DataLad's functionality that you can read about in :ref:`a Findoutmore on DataLad in Python<fom-pythonapi>`.

.. _pythonapi:
.. index::
pair: use DataLad API; with Python
.. find-out-more:: DataLad's Python API
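
As a minimal sketch of the Python API mentioned above, the same functionality is importable from ``datalad.api``; the example assumes it is run from inside an existing dataset, and the commit message is made up:

  $ python -c 'import datalad.api as dl; dl.save(message="add analysis results")'
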
4 changes: 2 additions & 2 deletions docs/basics/101-132-advancednesting.rst
@@ -123,7 +123,7 @@ interested in this, checkout the :ref:`dedicated Findoutmore <fom-status>`.
Note that both of these commands return only the ``untracked`` file and not
the ``modified`` subdataset because we're explicitly querying only the
subdataset for its status.
If you however, as done outside of this hidden section, you want to know about
If, however, as done outside of this Find-out-more, you want to know about
the subdataset record in the superdataset without causing a status query for
the state *within* the subdataset itself, you can also provide an explicit
path to the dataset (without a trailing path separator). This can be used
@@ -171,7 +171,7 @@ interested in this, checkout the :ref:`dedicated Findoutmore <fom-status>`.
option (especially powerful when combined with ``-f json_pp``). To get a complete overview on what you could do, check out the technical
documentation of :dlcmd:`status` `here <https://docs.datalad.org/en/latest/generated/man/datalad-status.html>`_.

Before we leave this hidden section, lets undo the modification of the subdataset
Before we leave this Find-out-more, let's undo the modification of the subdataset
by removing the untracked file:

.. runrecord:: _examples/DL-101-132-109
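
To illustrate the difference described above with a hypothetical subdataset name, querying the subdataset record from the superdataset versus querying the subdataset's own content could look like this:

  $ datalad status midterm_project        # subdataset record, as seen from the superdataset
  $ datalad status -d midterm_project     # state of the contents within the subdataset
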
4 changes: 2 additions & 2 deletions docs/basics/101-133-containersrun.rst
@@ -143,12 +143,12 @@ For this, we will pull an image from Singularity hub. This image was made
for the online-handbook, and it contains the relevant Python setup for
the analysis. Its recipe lives in the online-handbook's
`resources repository <https://github.com/datalad-handbook/resources>`_.
If you're curious how to create a Singularity image, the hidden
section below has some pointers:
If you're curious how to create a Singularity image, the :find-out-more:`on this topic <fom-container-creation>` has some pointers:

.. index::
pair: build container image; with Singularity
.. find-out-more:: How to make a Singularity image
:name: fom-container-creation

Singularity containers are built from text files, often
called "recipes", that hold a "definition" of the software container and its
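
For a rough idea of the commands involved in obtaining such an image, building from a local recipe or pulling a pre-built image could look like this sketch; file and registry names are placeholders:

  $ sudo singularity build my-analysis.sif my-recipe.def
  $ singularity pull docker://python:3.9
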
8 changes: 4 additions & 4 deletions docs/basics/101-136-filesystem.rst
@@ -110,12 +110,12 @@ save a change that is marked as a deletion in a
datalad save -m "rename file" oldname newname
Alternatively, there is also a way to save the name change
only using Git tools only, outlined in the following hidden
section. If you are a Git user, you will be very familiar with it.
using Git tools only, outlined in the :find-out-more:`on faster renaming <fom-gitmv>`. If you are a Git user, you will be very familiar with it.

.. index::
pair: rename file; with Git
.. find-out-more:: Faster renaming with Git tools
:name: fom-gitmv

Git has built-in commands that provide a solution in two steps.

@@ -757,12 +757,12 @@ use.
Beware of one thing though: If your dataset either is a sibling
or has a sibling with the source being a path, moving or renaming
the dataset will break the linkage between the datasets. This can
be fixed easily though. We can try this in the following hidden
section.
be fixed easily though. We can try this in the :find-out-more:`on adjusting sibling URLs <fom-adjust-sibling-urls>`.

.. index::
pair: move subdataset; with Git
.. find-out-more:: If a renamed/moved dataset is a sibling...
:name: fom-adjust-sibling-urls

As section :ref:`config` explains, each
sibling is registered in ``.git/config`` in a "submodule" section.
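
As a sketch of the kind of fix this involves, pointing a stale submodule entry in ``.git/config`` at the dataset's new location could look like this; the submodule name and path are placeholders:

  $ git config submodule.mysubds.url /new/path/to/mysubds
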
2 changes: 1 addition & 1 deletion docs/basics/101-139-gitlfs.rst
@@ -56,7 +56,7 @@ Alternatively, to make publication even easier for you, the dataset provider, yo
# afterwards, only datalad push is needed to publish dataset contents and history
$ datalad push --to github
Consumers of your dataset should be able to retrieve files right after cloning the dataset without a ``siblings enable`` command (as shown in the section :ref:`dropbox`), because of the ``autoenable=true`` configuration for the special remote.
Consumers of your dataset should be able to retrieve files right after cloning the dataset without a ``siblings enable`` command, as shown in section :ref:`dropbox`, because of the ``autoenable=true`` configuration for the special remote.
.. index::
pair: drop (LFS); with DataLad
2 changes: 1 addition & 1 deletion docs/basics/topic/adjustedmode-nosymlinks.rst
@@ -4,7 +4,7 @@ While git-annex on Unix-based file operating systems stores data in the annex an

**Why is that?**
Data *needs* to be in the annex for version control and transport logistics -- the annex is able to store all previous versions of the data, and manage the transport to other storage locations if you want to publish your dataset.
But as the :ref:`Findoutmore in this section <fom-objecttree>` will show, the :term:`annex` is a non-human readable tree structure, and data thus also needs to exist in its original location.
But as the :ref:`Findoutmore in this section <fom-objecttree>` shows, the :term:`annex` is a non-human readable tree structure, and data thus also needs to exist in its original location.
Thus, it exists in both places: it has been moved into the annex and copied back into its original location.
Once you edit an annexed file, the most recent version of the file is available in its original location, and past versions are stored and readily available in the annex.
If you reset your dataset to a previous state (as is shown in the section :ref:`history`), the respective version of your data is taken from the annex and copied to replace the newer version, and vice versa.
3 changes: 2 additions & 1 deletion docs/beyond_basics/101-146-providers.rst
@@ -62,7 +62,7 @@ dataset -- lacks a configuration for data access about this server::

However, data access can be configured by
the user if the required authentication and credential type are supported by
DataLad (a list is given in the hidden section below).
DataLad - a list is given in the :find-out-more:`on authentication <fom-provider-auth>`.
With a data access configuration in place, commands such as
:dlcmd:`download-url` or :dlcmd:`addurls` can work with URLs
that point to the location of the data to be retrieved, and
@@ -82,6 +82,7 @@ The following information is needed:
The example below sheds some light on this.

.. find-out-more:: Which authentication and credential types are possible?
:name: fom-provider-auth

When configuring custom data access, credential and authentication type
are required information. Below, we list the most common choices for these fields.
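
Once such a provider configuration is in place, retrieving a file by URL could look roughly like this; the URL and target path are placeholders:

  $ datalad download-url -d . -O data/raw/file.dat https://example.com/protected/file.dat
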
2 changes: 1 addition & 1 deletion docs/beyond_basics/101-147-riastores.rst
@@ -708,7 +708,7 @@ procedures.
`the docs <https://git-annex.branchable.com/internals/hashing>`_.
.. [#f3] To re-read about how git-annex's object tree works, check out section
:ref:`symlink`, and pay close attention to the hidden section.
:ref:`symlink`, and pay close attention to the :ref:`Find-out-more on the object tree <objecttree>`.
Additionally, you can find a lot of background information in git-annex's
`documentation <https://git-annex.branchable.com/internals>`_.
2 changes: 1 addition & 1 deletion docs/beyond_basics/101-161-biganalyses.rst
@@ -138,7 +138,7 @@ in size even if they are each small in size.
.. [#f1] FEAT is a software tool for model-based fMRI data analysis and part of
`FSL <https://fsl.fmrib.ox.ac.uk>`_.
.. [#f2] Read more about DataLad's Python API in the first hidden section in
.. [#f2] Read more about DataLad's Python API in the :ref:`Find-out-more on it <pythonapi>` in
:ref:`yoda_project`.
.. [#f3] Read up on these configurations in the chapter :ref:`chapter_config`.
4 changes: 2 additions & 2 deletions docs/beyond_basics/101-179-gitignore.rst
@@ -67,8 +67,7 @@ or create your own one.
To specify dataset content to be git-ignored, you can either write
a full file name, e.g. ``playlists/my-little-pony-themesongs/Friendship-is-magic.mp3``
into this file, or paths or patterns that make use of globbing, such as
``playlists/my-little-pony-themesongs/*``. The hidden section at the end of this
page contains some general rules for patterns in ``.gitignore`` files. Afterwards,
``playlists/my-little-pony-themesongs/*``. The :find-out-more:`on general rules for patterns in .gitignore files <fom-gitignore>` contains a helpful overview. Afterwards,
you just need to save the file once to your dataset so that it is version controlled.
If you have new content you do not want to track, you can add
new paths or patterns to the file, and save these modifications.
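
As a quick sketch of that workflow, appending a pattern and saving the ``.gitignore`` file could look like this; the pattern is just an example:

  $ echo "logs/" >> .gitignore
  $ datalad save -m "ignore the logs/ directory" .gitignore
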
@@ -120,6 +119,7 @@ ignored! Therefore, a ``.gitignore`` file can give you a space inside of
your dataset to be messy, if you want to be.

.. find-out-more:: Rules for .gitignore files
:name: fom-gitignore

Here are some general rules for the patterns you can put into a ``.gitignore``
file, taken from the book `Pro Git <https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository#_ignoring>`_ :
4 changes: 2 additions & 2 deletions docs/usecases/HCP_dataset.rst
@@ -166,10 +166,10 @@ which it has been aggregated are small in size, and yet provide access to the HC
data for anyone who has valid AWS S3 credentials.

At the end of this step, there is one nested dataset per subject in the HCP data
release. If you are interested in the details of this process, checkout the
hidden section below.
release. If you are interested in the details of this process, check out the :find-out-more:`on the datasets' generation <fom-hcp>`.

.. find-out-more:: How exactly did the datasets come to be?
:name: fom-hcp

All code and tables necessary to generate the HCP datasets can be found on
GitHub at `github.com/TobiasKadelka/build_hcp <https://github.com/TobiasKadelka/build_hcp>`_.
