Skip to content

Commit

Permalink
Add lower bounds for populations' splits, fix documentation about it
Browse files Browse the repository at this point in the history
  • Loading branch information
noscode committed Apr 20, 2024
1 parent 6c61623 commit af34377
Show file tree
Hide file tree
Showing 9 changed files with 246 additions and 77 deletions.
40 changes: 34 additions & 6 deletions docs/source/user_manual/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -17,18 +17,18 @@ GADMA requires the following dependencies (`requirements/minimal.txt`):
* Python3
* NumPy (>= 1.2.0)
* Scipy (>= 0.6.0)
* ruamel.yaml
* ruamel.yaml (<0.18.0)
* ``dadi`` (>= 1.7.0)
* ``moments`` (>= 1.0.0)
* ``momi``
* ``moments.LD`` (manual installation of ``moments`` with `--ld-extension` flag)
* ``moments.LD`` (is installed alongside with ``moments``)
* nlopt (for ``dadi``)
* Cython (for ``moments``)
* mpmath (for ``moments``)

To draw demographic models one should also install the following packages (`requirements/minimal.txt`):

* matplotlib (>= 0.98.1)
* matplotlib (>= 0.98.1, <3.5)
* Pillow (>= 4.2.1) - optional
* ``moments`` (>= 1.0.0)

Expand All @@ -51,11 +51,10 @@ To run Bayesian optimization `smac` of specified version is requered (`requireme
``momi`` package sometimes is not installed correctly for Windows and MacOS. If ``momi`` is not available please install it manually following the installation instructions in `momi's manual <https://momi2.readthedocs.io/en/latest/installation.html#>`_.

.. note::
``momentsLD`` - the extension of ``moments``, should be installed manually following the installation instructions in `moments's manual <https://moments.readthedocs.io/en/latest/installation.html#>`_.
``momentsLD`` - the extension of ``moments``, it is installed together with ``moments``.

Getting help for engine installation
------------------------------------

If there are some troubles installing the engine, please, first of all check the table below for the ability to install this engine on your system. You are always welcome to `open an issue <https://github.com/ctlab/GADMA/issues#>`_ on GitHub for getting help.

GADMA has automatic tests on GitHUb for engines on different systems (Linux, Windows, MacOS). The following table indicates (according to our tests) if engine could be installed on specified system:
Expand Down Expand Up @@ -88,7 +87,7 @@ GADMA has automatic tests on GitHUb for engines on different systems (Linux, Win
Installing the latest release
------------------------------

The latest release of GADMA is easily installed via ``pip`` or ``conda`` (``bioconda``):
The latest release of GADMA can be easily installed via ``pip`` or ``conda`` (``bioconda``):

.. code-block:: console
Expand All @@ -114,6 +113,35 @@ or
$ conda install -c bioconda moments
Troubleshooting
---------------

If you experience problems with dependencies, we recommend to create an empty `conda environment <https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#>`_:

.. code-block:: console
$ conda create -n gadma_env python=3.10
$ conda activate gadma_env
It is possible to install versions that are used for testing by downloading file `minimal.txt` from `here <https://github.com/ctlab/GADMA/blob/master/requirements/minimal.txt#>`_ and install requirements using:

.. code-block:: console
$ pip install -r minimal.txt
$ pip install gadma
For **MacOS with M processor** we suggest the following recipe (credit to `@Enricobazzi <https://github.com/ctlab/GADMA/issues/82>`_):

.. code-block:: console
$ pip install git+https://github.com/MomentsLD/moments.git
$ conda install -c conda-forge dadi
$ conda install -c conda-forge scikit-allel
$ pip install gadma
$ pip uninstall ruamel.yaml
$ pip install "ruamel.yaml<0.18.0"
$ pip uninstall matplotlib
$ pip install "matplotlib<3.5"
Manual installation
-----------------------------
Expand Down
7 changes: 5 additions & 2 deletions docs/source/user_manual/set_model/set_model.rst
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ Types of demographic models

GADMA could infer two base types of demographic models:

* `Demographic model with structire <set_model_struct.rst>`__ (up to 3 populations). It is a more flexible model type as dynamics of population size change could be inferred and it has a lot of options for parameters.
* `Demographic model with structure <set_model_struct.rst>`__ (up to 3 populations). It is a more flexible model type as dynamics of population size change could be inferred and it has a lot of options for parameters.

.. admonition:: Related options

Expand All @@ -42,8 +42,11 @@ GADMA could infer two base types of demographic models:
* ``Inbreeding`` infers inbreeding coefficients (only for ``dadi`` engine).
* ``Selection`` infers selection coefficients.
* ``Ancestral size as parameter`` disables multinomial approach of ``dadi`` and ``moments`` engines when ancestral size is inferred implicitly.
* ``Lower bound of first split`` limits lower bound of the most ancient split.
* ``Upper bound of first split`` limits upper bound of the most ancient split.
* ``Upper bound of second split`` limits upper bounds of next to the most ancient split.
* ``Lower bound of second split`` limits lower bound of the next to the most ancient split.
* ``Upper bound of second split`` limits upper bound of the next to the most ancient split.


* `Custom demographic model <set_model_custom.rst>`__. It is a usual user-specified model like in ``dadi``, ``moments`` and other tools for demographic inference. Using such a model will give more control over parameters and could be used for inference of more than 3 populations but is less flexible.

Expand Down
22 changes: 17 additions & 5 deletions docs/source/user_manual/set_model/set_model_struct.rst
Original file line number Diff line number Diff line change
Expand Up @@ -200,16 +200,28 @@ Split could be set in two ways:
# param file
Split fractions: True # for 1) point
Upper bound of split
_____________________
Upper and lower bounds of splits
________________________________

To limit time of some split one should specify an option in the parameter file. Splits are numbered from the most ancient. So split 1 is split that occurred with the ancient population and split 2 is the next division of the second population (exist only for three populations). There are two appropriate options: ``Upper bound of first split`` and ``Upper bound of second split``.
It is possible to limit time of split events in the demographic model with structure. In order to do that one should specify one or multiple options in the parameter file that refer to lower and upper bounds of split events. Splits are numbered from the most ancient, so split 1 is a split event that occurred with the ancient population and split 2 is the next division of the second population (exist only for three populations). There are three options corresponded to split times: ``Lower bound of first split``, ``Upper bound of first split``, ``Lower bound of second split`` and ``Upper bound of second split``.

One should translate time from years into genetic units, therefore divide it by ``2 * T_g``, where ``T_g`` is time (in years) for one generation. For example, one wants to limit the last split to 2000 years. Time for one generation is estimated as 24 years, then one should specify in the parameter file:
Bounds should be specified in GENERATIONS. In order to translate time from years to generations, divide it by ``T_g``, where ``T_g`` is time (in years) for one generation. For example, assume we want the last split to be between 1000 and 2000 years. Time for one generation is estimated to be 24 years. Therefore we construct the following parameter file:

.. code-block:: none
# param_file
...
Upper bound of second split : 41.666
Lower bound of second split : 41.666
Upper bound of second split : 83.333
...
It is allowed to set any of those four options, just make sure they make sense. It is possible to set only one bound or one lower and one upper bounds for different splits:
.. code-block:: none
# param_file
...
# In that particular case upper bound for the second split exists automatically
Upper bound of first split : 30
Lower bound of second split : 10
...
17 changes: 13 additions & 4 deletions example_params
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
# If it is resumed from other directory and output directory
# isn't set, GADMA will add '_resumed' for previous output
# directory.
Output directory: my_example_run
Output directory: my_example_run_2


#!!!
Expand Down Expand Up @@ -224,22 +224,31 @@ Inbreeding: False
# Default: False
Ancestral size as parameter: False

# It is possible to limit the time of splits.
# It is possible to limit the time of splits by bounds' specification.
# Split 1 is the most ancient split.
# !Note that time is in genetic units (2 * time for 1 generation):
# !Note that time is in generations:
# e.g. we want to limit by 150 kya, time for one generation is
# 25 years, then bound will be 150000 / (2*25) = 3000.
# 25 years, then bound will be 150000 / 25 = 6000.
#
# Lower bound for split 1 (in case of 2 or 3 populations).
# Default: None
Lower bound of first split: Null
#
# Upper bound for split 1 (in case of 2 or 3 populations).
# Default: None
Upper bound of first split: Null

# Lower bound for split 2 (in case of 3 populations).
# Default: None
Lower bound of second split: Null
#
# Upper bound for split 2 (in case of 3 populations).
# Default: None
Upper bound of second split: Null




#!!!
# Local optimization.
#
Expand Down
15 changes: 12 additions & 3 deletions gadma/cli/params_template
Original file line number Diff line number Diff line change
Expand Up @@ -224,22 +224,31 @@ Inbreeding :
# Default: False
Ancestral size as parameter :

# It is possible to limit the time of splits.
# It is possible to limit the time of splits by bounds' specification.
# Split 1 is the most ancient split.
# !Note that time is in genetic units (2 * time for 1 generation):
# !Note that time is in generations:
# e.g. we want to limit by 150 kya, time for one generation is
# 25 years, then bound will be 150000 / (2*25) = 3000.
# 25 years, then bound will be 150000 / 25 = 6000.
#
# Lower bound for split 1 (in case of 2 or 3 populations).
# Default: None
Lower bound of first split :
#
# Upper bound for split 1 (in case of 2 or 3 populations).
# Default: None
Upper bound of first split :

# Lower bound for split 2 (in case of 3 populations).
# Default: None
Lower bound of second split :
#
# Upper bound for split 2 (in case of 3 populations).
# Default: None
Upper bound of second split :




#!!!
# Local optimization.
#
Expand Down
3 changes: 3 additions & 0 deletions gadma/cli/settings.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,10 @@

# Time bounds
upper_bound_of_first_split = None
lower_bound_of_first_split = None
upper_bound_of_second_split = None
lower_bound_of_second_split = None


# Glocal optimizer
global_optimizer = "Genetic_algorithm"
Expand Down
121 changes: 111 additions & 10 deletions gadma/cli/settings_storage.py
Original file line number Diff line number Diff line change
Expand Up @@ -198,7 +198,9 @@ class SettingsStorage(object):
_float_attrs = ['theta0', 'time_for_generation', 'eps',
'const_of_time_in_drawing', 'vmin', 'min_n', 'max_n',
'min_t', 'max_t', 'min_m', 'max_m',
'lower_bound_of_first_split',
'upper_bound_of_first_split',
'lower_bound_of_second_split',
'upper_bound_of_second_split',
'const_for_mutation_strength',
'const_for_mutation_rate',
Expand Down Expand Up @@ -575,7 +577,30 @@ def __setattr__(self, name, value):
continue
if len(getattr(self, attr_name)) != value:
raise ValueError(f"Length of {attr_name} should be equal "
f"to {self.number_of_populations}.")
f"to {value}.")
if value < 2:
if self.lower_bound_of_first_split is not None:
raise ValueError(
"Do not specify `Lower bound of first split` as there"
f" is no split for 1 population"
)
if self.upper_bound_of_first_split is not None:
raise ValueError(
"Do not specify `Upper bound of first split` as there"
f" is no split for 1 population"
)
if value < 3:
if self.lower_bound_of_second_split is not None:
raise ValueError(
"Do not specify `Lower bound of second split` as there"
f" is no second split for {value} populations"
)
if self.upper_bound_of_second_split is not None:
raise ValueError(
"Do not specify `Upper bound of second split` as there"
f" is no second split for {value} populations"
)

# 3.7 If we set fractions or size of generation then we create/update
# GA options that depend on these values.
elif name == 'fractions' or name == 'size_of_generation':
Expand Down Expand Up @@ -1240,24 +1265,45 @@ def get_engine_args(self, engine_id=None):

def get_linear_constrain_for_model(self, model):
"""
Returns linear constrain for model based of setted upper bound of
splits. NOT WORKING.
Returns linear constrain for model based of setted upper and lower
bounds of splits.
"""
if (self.upper_bound_of_first_split is None and
self.upper_bound_of_second_split is None):
self.lower_bound_of_first_split is None and
self.upper_bound_of_second_split is None and
self.lower_bound_of_second_split is None):
return None
A = list()
ub = list()
if (self.upper_bound_of_first_split is not None):
lb = list()
# check for the first split
if (self.upper_bound_of_first_split is not None or
self.lower_bound_of_first_split is not None):
A1, b1 = model.get_involved_for_split_time_vars(1)
A.append(A1)
ub.append(self.upper_bound_of_first_split / 2 - b1)
if (self.upper_bound_of_second_split is not None):
if self.upper_bound_of_first_split is not None:
# we divide by 2 as time is in 2N units and we aware of this 2
ub.append(self.upper_bound_of_first_split / 2 - b1)
else:
ub.append(np.inf)
if self.lower_bound_of_first_split is not None:
lb.append(self.lower_bound_of_first_split / 2 - b1)
else:
lb.append(-np.inf)
# check for the first split
if (self.upper_bound_of_second_split is not None or
self.lower_bound_of_second_split is not None):
A2, b2 = model.get_involved_for_split_time_vars(2)
A.append(A2)
ub.append(self.upper_bound_of_second_split / 2 - b2)
lb = - np.inf * np.ones(len(ub))
return LinearConstrain(np.array(A), np.array(lb), np.array(ub))
if self.upper_bound_of_second_split is not None:
ub.append(self.upper_bound_of_second_split / 2 - b2)
else:
ub.append(np.inf)
if self.lower_bound_of_second_split is not None:
lb.append(self.lower_bound_of_second_split / 2 - b2)
else:
lb.append(-np.inf)
return LinearConstrain(A, lb, ub)

def get_model(self):
"""
Expand Down Expand Up @@ -1687,6 +1733,61 @@ def is_valid(self):
labels) = check_and_return_projections_and_labels(
self.data_holder)

# check for lower and upper bounds of population splits
# Check that bounds are specified when they are allowed
if (hasattr(self, "number_of_populations") and
self.number_of_populations < 2):
if self.lower_bound_of_first_split is not None:
raise ValueError(
"Do not specify `Lower bound of first split` as there is"
f" no split for 1 population"
)
if self.upper_bound_of_first_split is not None:
raise ValueError(
"Do not specify `Upper bound of first split` as there is"
f" no split for 1 population"
)

if (hasattr(self, "number_of_populations") and
self.number_of_populations < 3):
if self.lower_bound_of_second_split is not None:
raise ValueError(
"Do not specify `Lower bound of second split` as there"
f" is no second split for {self.number_of_populations}"
" populations"
)
if self.upper_bound_of_second_split is not None:
raise ValueError(
"Do not specify `Upper bound of second split` as there"
f" is no second split for {self.number_of_populations}"
" populations"
)
# Check that values are not contraversal
if (self.lower_bound_of_first_split is not None and
self.upper_bound_of_first_split is not None):
if (self.lower_bound_of_first_split >
self.upper_bound_of_first_split):
raise ValueError(
"`Lower bound of first split` should be less than `Upper"
" bound of first split`"
)
if (self.lower_bound_of_second_split is not None and
self.upper_bound_of_second_split is not None):
if (self.lower_bound_of_second_split >
self.upper_bound_of_second_split):
raise ValueError(
"`Lower bound of second split` should be less than `Upper"
" bound of second split`"
)
if (self.upper_bound_of_first_split is not None and
self.lower_bound_of_second_split is not None):
if (self.upper_bound_of_first_split <
self.lower_bound_of_second_split):
raise ValueError(
"`Lower bound of second split` should be less than `Upper"
" bound of first split`"
)

def print_citations(self):
"""
Prints neccessary citations according to current settings.
Expand Down
Loading

0 comments on commit af34377

Please sign in to comment.