Add lower bounds for populations' splits, fix documentation about it

ctlab · Apr 20, 2024 · af34377 · af34377
1 parent 6c61623
commit af34377
Show file tree

Hide file tree

Showing 9 changed files with 246 additions and 77 deletions.
diff --git a/docs/source/user_manual/installation.rst b/docs/source/user_manual/installation.rst
@@ -17,18 +17,18 @@ GADMA requires the following dependencies (`requirements/minimal.txt`):
 * Python3
 * NumPy (>= 1.2.0)
 * Scipy (>= 0.6.0)
-* ruamel.yaml
+* ruamel.yaml (<0.18.0)
 * ``dadi`` (>= 1.7.0)
 * ``moments`` (>= 1.0.0)
 * ``momi``
-* ``moments.LD`` (manual installation of ``moments`` with `--ld-extension` flag)
+* ``moments.LD`` (is installed alongside with ``moments``)
 * nlopt (for ``dadi``)
 * Cython (for ``moments``)
 * mpmath (for ``moments``)
 
 To draw demographic models one should also install the following packages (`requirements/minimal.txt`):
 
-* matplotlib (>= 0.98.1)
+* matplotlib (>= 0.98.1, <3.5)
 * Pillow (>= 4.2.1) - optional
 * ``moments`` (>= 1.0.0)
 
@@ -51,11 +51,10 @@ To run Bayesian optimization `smac` of specified version is requered (`requireme
     ``momi`` package sometimes is not installed correctly for Windows and MacOS. If ``momi`` is not available please install it manually following the installation instructions in `momi's manual <https://momi2.readthedocs.io/en/latest/installation.html#>`_.
 
 .. note::
-    ``momentsLD`` - the extension of ``moments``, should be installed manually  following the installation instructions in `moments's manual <https://moments.readthedocs.io/en/latest/installation.html#>`_.
+    ``momentsLD`` - the extension of ``moments``, it is installed together with ``moments``.
 
 Getting help for engine installation
 ------------------------------------
-
 If there are some troubles installing the engine, please, first of all check the table below for the ability to install this engine on your system. You are always welcome to `open an issue <https://github.com/ctlab/GADMA/issues#>`_ on GitHub for getting help.
 
 GADMA has automatic tests on GitHUb for engines on different systems (Linux, Windows, MacOS). The following table indicates (according to our tests) if engine could be installed on specified system:
@@ -88,7 +87,7 @@ GADMA has automatic tests on GitHUb for engines on different systems (Linux, Win
 Installing the latest release
 ------------------------------
 
-The latest release of GADMA is easily installed via ``pip`` or ``conda`` (``bioconda``):
+The latest release of GADMA can be easily installed via ``pip`` or ``conda`` (``bioconda``):
 
 .. code-block:: console
 
@@ -114,6 +113,35 @@ or
 
         $ conda install -c bioconda moments
 
+Troubleshooting
+---------------
+
+If you experience problems with dependencies, we recommend to create an empty `conda environment <https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#>`_:
+
+.. code-block:: console
+
+    $ conda create -n gadma_env python=3.10
+    $ conda activate gadma_env
+
+It is possible to install versions that are used for testing by downloading file `minimal.txt` from `here <https://github.com/ctlab/GADMA/blob/master/requirements/minimal.txt#>`_ and install requirements using:
+
+.. code-block:: console
+
+    $ pip install -r minimal.txt
+    $ pip install gadma
+
+For **MacOS with M processor** we suggest the following recipe (credit to `@Enricobazzi <https://github.com/ctlab/GADMA/issues/82>`_):
+
+.. code-block:: console
+
+    $ pip install git+https://github.com/MomentsLD/moments.git
+    $ conda install -c conda-forge dadi
+    $ conda install -c conda-forge scikit-allel
+    $ pip install gadma
+    $ pip uninstall ruamel.yaml
+    $ pip install "ruamel.yaml<0.18.0"
+    $ pip uninstall matplotlib
+    $ pip install "matplotlib<3.5"
 
 Manual installation
 -----------------------------

diff --git a/docs/source/user_manual/set_model/set_model.rst b/docs/source/user_manual/set_model/set_model.rst
@@ -23,7 +23,7 @@ Types of demographic models
 
 GADMA could infer two base types of demographic models:
 
-* `Demographic model with structire <set_model_struct.rst>`__ (up to 3 populations). It is a more flexible model type as dynamics of population size change could be inferred and it has a lot of options for parameters.
+* `Demographic model with structure <set_model_struct.rst>`__ (up to 3 populations). It is a more flexible model type as dynamics of population size change could be inferred and it has a lot of options for parameters.
 
 .. admonition:: Related options
 
@@ -42,8 +42,11 @@ GADMA could infer two base types of demographic models:
       * ``Inbreeding`` infers inbreeding coefficients (only for ``dadi`` engine).
       * ``Selection`` infers selection coefficients.
       * ``Ancestral size as parameter`` disables multinomial approach of ``dadi`` and ``moments`` engines when ancestral size is inferred implicitly.
+      * ``Lower bound of first split`` limits lower bound of the most ancient split.
       * ``Upper bound of first split`` limits upper bound of the most ancient split.
-      * ``Upper bound of second split`` limits upper bounds of next to the most ancient split.
+      * ``Lower bound of second split`` limits lower bound of the next to the most ancient split.
+      * ``Upper bound of second split`` limits upper bound of the next to the most ancient split.
+
 
 * `Custom demographic model <set_model_custom.rst>`__. It is a usual user-specified model like in ``dadi``, ``moments`` and other tools for demographic inference. Using such a model will give more control over parameters and could be used for inference of more than 3 populations but is less flexible.
 

diff --git a/docs/source/user_manual/set_model/set_model_struct.rst b/docs/source/user_manual/set_model/set_model_struct.rst
@@ -200,16 +200,28 @@ Split could be set in two ways:
     # param file
     Split fractions: True  # for 1) point
 
-Upper bound of split
-_____________________
+Upper and lower bounds of splits
+________________________________
 
-To limit time of some split one should specify an option in the parameter file. Splits are numbered from the most ancient. So split 1 is split that occurred with the ancient population and split 2 is the next division of the second population (exist only for three populations). There are two appropriate options: ``Upper bound of first split`` and ``Upper bound of second split``.
+It is possible to limit time of split events in the demographic model with structure. In order to do that one should specify one or multiple options in the parameter file that refer to lower and upper bounds of split events. Splits are numbered from the most ancient, so split 1 is a split event that occurred with the ancient population and split 2 is the next division of the second population (exist only for three populations). There are three options corresponded to split times: ``Lower bound of first split``, ``Upper bound of first split``, ``Lower bound of second split``  and ``Upper bound of second split``.
 
-One should translate time from years into genetic units, therefore divide it by ``2 * T_g``, where ``T_g`` is time (in years) for one generation. For example, one wants to limit the last split to 2000 years. Time for one generation is estimated as 24 years, then one should specify in the parameter file:
+Bounds should be specified in GENERATIONS. In order to translate time from years to generations, divide it by ``T_g``, where ``T_g`` is time (in years) for one generation. For example, assume we want the last split to be between 1000 and 2000 years. Time for one generation is estimated to be 24 years. Therefore we construct the following parameter file:
 
 .. code-block:: none
 
     # param_file
     ...
-    Upper bound of second split : 41.666
+    Lower bound of second split : 41.666
+    Upper bound of second split : 83.333
+    ...
+
+ It is allowed to set any of those four options, just make sure they make sense. It is possible to set only one bound or one lower and one upper bounds for different splits:
+
+ .. code-block:: none
+
+    # param_file
+    ...
+    # In that particular case upper bound for the second split exists automatically
+    Upper bound of first split : 30
+    Lower bound of second split : 10
     ...
diff --git a/example_params b/example_params
@@ -12,7 +12,7 @@
 #    If it is resumed from other directory and output directory 
 #    isn't set, GADMA will add '_resumed' for previous output 
 #    directory.
-Output directory: my_example_run
+Output directory: my_example_run_2
 
 
 #!!!
@@ -224,22 +224,31 @@ Inbreeding: False
 #   Default: False
 Ancestral size as parameter: False
 
-#    It is possible to limit the time of splits.
+#    It is possible to limit the time of splits by bounds' specification.
 #    Split 1 is the most ancient split.
-#    !Note that time is in genetic units (2 * time for 1 generation):
+#    !Note that time is in generations:
 #    e.g. we want to limit by 150 kya, time for one generation is 
-#    25 years, then bound will be 150000 / (2*25) = 3000.
+#    25 years, then bound will be 150000 / 25 = 6000.
+#
+#    Lower bound for split 1 (in case of 2 or 3 populations).
+#    Default: None
+Lower bound of first split: Null
 #
 #    Upper bound for split 1 (in case of 2 or 3 populations).
 #    Default: None
 Upper bound of first split: Null
 
+#    Lower bound for split 2 (in case of 3 populations).
+#    Default: None
+Lower bound of second split: Null
+#
 #    Upper bound for split 2 (in case of 3 populations).
 #    Default: None
 Upper bound of second split: Null
 
 
 
+
 #!!!
 #    Local optimization.
 #

diff --git a/gadma/cli/params_template b/gadma/cli/params_template
@@ -224,22 +224,31 @@ Inbreeding :
 #   Default: False
 Ancestral size as parameter : 
 
-#    It is possible to limit the time of splits.
+#    It is possible to limit the time of splits by bounds' specification.
 #    Split 1 is the most ancient split.
-#    !Note that time is in genetic units (2 * time for 1 generation):
+#    !Note that time is in generations:
 #    e.g. we want to limit by 150 kya, time for one generation is 
-#    25 years, then bound will be 150000 / (2*25) = 3000.
+#    25 years, then bound will be 150000 / 25 = 6000.
+#
+#    Lower bound for split 1 (in case of 2 or 3 populations).
+#    Default: None
+Lower bound of first split : 
 #
 #    Upper bound for split 1 (in case of 2 or 3 populations).
 #    Default: None
 Upper bound of first split : 
 
+#    Lower bound for split 2 (in case of 3 populations).
+#    Default: None
+Lower bound of second split : 
+#
 #    Upper bound for split 2 (in case of 3 populations).
 #    Default: None
 Upper bound of second split : 
 
 
 
+
 #!!!
 #    Local optimization.
 #

diff --git a/gadma/cli/settings.py b/gadma/cli/settings.py
@@ -66,7 +66,10 @@
 
 # Time bounds
 upper_bound_of_first_split = None
+lower_bound_of_first_split = None
 upper_bound_of_second_split = None
+lower_bound_of_second_split = None
+
 
 # Glocal optimizer
 global_optimizer = "Genetic_algorithm"

diff --git a/gadma/cli/settings_storage.py b/gadma/cli/settings_storage.py
@@ -198,7 +198,9 @@ class SettingsStorage(object):
     _float_attrs = ['theta0', 'time_for_generation', 'eps',
                     'const_of_time_in_drawing', 'vmin', 'min_n', 'max_n',
                     'min_t', 'max_t', 'min_m', 'max_m',
+                    'lower_bound_of_first_split',
                     'upper_bound_of_first_split',
+                    'lower_bound_of_second_split',
                     'upper_bound_of_second_split',
                     'const_for_mutation_strength',
                     'const_for_mutation_rate',
@@ -575,7 +577,30 @@ def __setattr__(self, name, value):
                     continue
                 if len(getattr(self, attr_name)) != value:
                     raise ValueError(f"Length of {attr_name} should be equal "
-                                     f"to {self.number_of_populations}.")
+                                     f"to {value}.")
+            if value < 2:
+                if self.lower_bound_of_first_split is not None:
+                    raise ValueError(
+                        "Do not specify `Lower bound of first split` as there"
+                        f" is no split for 1 population"
+                    )
+                if self.upper_bound_of_first_split is not None:
+                    raise ValueError(
+                        "Do not specify `Upper bound of first split` as there"
+                        f" is no split for 1 population"
+                    )
+            if value < 3:
+                if self.lower_bound_of_second_split is not None:
+                    raise ValueError(
+                        "Do not specify `Lower bound of second split` as there"
+                        f" is no second split for {value} populations"
+                    )
+                if self.upper_bound_of_second_split is not None:
+                    raise ValueError(
+                        "Do not specify `Upper bound of second split` as there"
+                        f" is no second split for {value} populations"
+                    )
+
         # 3.7 If we set fractions or size of generation then we create/update
         # GA options that depend on these values.
         elif name == 'fractions' or name == 'size_of_generation':
@@ -1240,24 +1265,45 @@ def get_engine_args(self, engine_id=None):
 
     def get_linear_constrain_for_model(self, model):
         """
-        Returns linear constrain for model based of setted upper bound of
-        splits. NOT WORKING.
+        Returns linear constrain for model based of setted upper and lower
+        bounds of splits.
         """
         if (self.upper_bound_of_first_split is None and
-                self.upper_bound_of_second_split is None):
+                self.lower_bound_of_first_split is None and
+                self.upper_bound_of_second_split is None and
+                self.lower_bound_of_second_split is None):
             return None
         A = list()
         ub = list()
-        if (self.upper_bound_of_first_split is not None):
+        lb = list()
+        # check for the first split
+        if (self.upper_bound_of_first_split is not None or
+                self.lower_bound_of_first_split is not None):
             A1, b1 = model.get_involved_for_split_time_vars(1)
             A.append(A1)
-            ub.append(self.upper_bound_of_first_split / 2 - b1)
-        if (self.upper_bound_of_second_split is not None):
+            if self.upper_bound_of_first_split is not None:
+                # we divide by 2 as time is in 2N units and we aware of this 2
+                ub.append(self.upper_bound_of_first_split / 2 - b1)
+            else:
+                ub.append(np.inf)
+            if self.lower_bound_of_first_split is not None:
+                lb.append(self.lower_bound_of_first_split / 2 - b1)
+            else:
+                lb.append(-np.inf)
+        # check for the first split
+        if (self.upper_bound_of_second_split is not None or
+                self.lower_bound_of_second_split is not None):
             A2, b2 = model.get_involved_for_split_time_vars(2)
             A.append(A2)
-            ub.append(self.upper_bound_of_second_split / 2 - b2)
-        lb = - np.inf * np.ones(len(ub))
-        return LinearConstrain(np.array(A), np.array(lb), np.array(ub))
+            if self.upper_bound_of_second_split is not None:
+                ub.append(self.upper_bound_of_second_split / 2 - b2)
+            else:
+                ub.append(np.inf)
+            if self.lower_bound_of_second_split is not None:
+                lb.append(self.lower_bound_of_second_split / 2 - b2)
+            else:
+                lb.append(-np.inf)
+        return LinearConstrain(A, lb, ub)
 
     def get_model(self):
         """
@@ -1687,6 +1733,61 @@ def is_valid(self):
                      labels) = check_and_return_projections_and_labels(
                         self.data_holder)
 
+        # check for lower and upper bounds of population splits
+        # Check that bounds are specified when they are allowed
+        if (hasattr(self, "number_of_populations") and
+                self.number_of_populations < 2):
+            if self.lower_bound_of_first_split is not None:
+                raise ValueError(
+                    "Do not specify `Lower bound of first split` as there is"
+                    f" no split for 1 population"
+                )
+            if self.upper_bound_of_first_split is not None:
+                raise ValueError(
+                    "Do not specify `Upper bound of first split` as there is"
+                    f" no split for 1 population"
+                )
+
+        if (hasattr(self, "number_of_populations") and
+                self.number_of_populations < 3):
+            if self.lower_bound_of_second_split is not None:
+                raise ValueError(
+                    "Do not specify `Lower bound of second split` as there"
+                    f" is no second split for {self.number_of_populations}"
+                    " populations"
+                )
+            if self.upper_bound_of_second_split is not None:
+                raise ValueError(
+                    "Do not specify `Upper bound of second split` as there"
+                    f" is no second split for {self.number_of_populations}"
+                    " populations"
+                )
+        # Check that values are not contraversal
+        if (self.lower_bound_of_first_split is not None and
+                self.upper_bound_of_first_split is not None):
+            if (self.lower_bound_of_first_split >
+                    self.upper_bound_of_first_split):
+                raise ValueError(
+                    "`Lower bound of first split` should be less than `Upper"
+                    " bound of first split`"
+                )
+        if (self.lower_bound_of_second_split is not None and
+                self.upper_bound_of_second_split is not None):
+            if (self.lower_bound_of_second_split >
+                    self.upper_bound_of_second_split):
+                raise ValueError(
+                    "`Lower bound of second split` should be less than `Upper"
+                    " bound of second split`"
+                )
+        if (self.upper_bound_of_first_split is not None and
+                self.lower_bound_of_second_split is not None):
+            if (self.upper_bound_of_first_split <
+                    self.lower_bound_of_second_split):
+                raise ValueError(
+                    "`Lower bound of second split` should be less than `Upper"
+                    " bound of first split`"
+                )
+
     def print_citations(self):
         """
         Prints neccessary citations according to current settings.