facilitating 3 population model convergence #86

Open
steigeec opened this issue Sep 27, 2023 · 12 comments

Comments

@steigeec

steigeec commented Sep 27, 2023

Hi, Ekaterina -

Thanks again for this incredibly useful program!

I had a great experience using it for 2-population models. Ultimately, I want to fit a 3-population model, but GADMA has not been able to print any models yet (192 individuals total, and 6.7 million SNPs). To facilitate convergence, I changed my final structure to match the initial structure: [1,1,1], but GADMA has been running 30 processes for a month now and hasn't been able to print a model. Do you recommend that I implement the Bayesian optimization ensemble that is shown in the example of inference with four and five populations? Or perhaps I should consider downprojecting my data? Might you have other recommendations for how to assist in convergence?

Thanks so much!

@noscode
Collaborator

noscode commented Sep 28, 2023

Hi, thank you for using GADMA!

Inferring the demographic history for three populations can be time-consuming. Do you have a total of 192 diploid individuals? If so, then it will be time-consuming indeed. You can check the processing speed in any of the following files: output_dir/N/eval_file, where N is the run number. Each line in this file corresponds to one evaluation of log-likelihood, which can give you an idea of the processing speed.

You have two options (as you have already mentioned):

  1. Use Bayesian optimization. By checking the eval_file, you can estimate how long it will take to evaluate 300 or 400 log-likelihoods, as this is the recommended number of evaluations required for Bayesian optimization. Since you've been running GADMA for a month and have no models yet, it might still be quite slow.
  2. Downsample the Site Frequency Spectrum (SFS) using easySFS. If your SFS is still larger than 30x30x30 after downsampling, I would recommend using Bayesian optimization. If you have a smaller SFS, you can try a genetic algorithm.

If you are using dadi, not moments, then you should use Bayesian optimization for three populations in almost any case.
I can assist you in any option you choose.
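
The speed estimate described above can be sketched in Python. This is only an illustration: it assumes that every non-empty line of eval_file is one log-likelihood evaluation (as stated above), that the file has no header lines, and that you supply the elapsed wall-clock time yourself. The function names and the example path are hypothetical, not part of GADMA.

```python
from pathlib import Path


def count_evaluations(eval_file):
    """Count non-empty lines, treating each one as a single log-likelihood evaluation.

    Assumption: the file has no header lines; subtract them if yours does.
    """
    with Path(eval_file).open() as handle:
        return sum(1 for line in handle if line.strip())


def projected_hours(target_evaluations, evaluations_done, elapsed_hours):
    """Extrapolate linearly from the observed rate to a target number of evaluations."""
    if evaluations_done <= 0:
        raise ValueError("no evaluations recorded yet, so no rate can be estimated")
    return target_evaluations * elapsed_hours / evaluations_done


if __name__ == "__main__":
    path = Path("output_dir/1/eval_file")  # hypothetical: run number 1
    if path.exists():
        done = count_evaluations(path)
        elapsed = 24.0  # hours the run has been going; fill in your own value
        for target in (300, 400):
            hours = projected_hours(target, done, elapsed)
            print(f"{target} evaluations: about {hours:.1f} hours")
```

If the projected time for 300 to 400 evaluations is still measured in weeks, downsampling the SFS first is probably the more practical route.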

@steigeec
Author

Thank you, Ekaterina, for your very helpful response!

I have decided to both downsample the SFS and use Bayesian optimization. I just wanted to follow up with you as I have struggled getting optimization to work.

Manual GADMA installation has not worked for me in the past because I could not get the correct combination of dependencies together. For my work with GADMA thus far, I have used a conda installation. However, my conda installation, as currently installed, doesn't have access to the Bayesian optimization algorithm:
ValueError: Optimizer 'SMAC_BO_combination' is not registered
I have started a fresh conda environment, installing the versions of those modules listed in minimal.txt and bayes_opt.txt, but I haven't found a way to install the required modules without incompatibilities arising. Do you have a recommended complete conda installation command which includes all necessary modules for running GADMA with the dadi engine, using Bayesian optimization?

@noscode
Collaborator

noscode commented Oct 27, 2023

Hi,

I am glad to hear from you. From my experience, it can be difficult to install specific versions of packages using conda: as I remember, it forces you to use the latest version. However, if it suits you, you can install all required versions (bayes_opt.txt) using pip. Python should be able to use all packages that are installed either by conda or pip. However, make sure that the newer versions you installed with conda are uninstalled, otherwise there can be a conflict. I hope that helps.

One last thing: what OS are you using? Just in case: SMAC, which is required for Bayesian optimization, does not work on Windows.

Best regards,
Ekaterina

@steigeec
Author

steigeec commented Nov 3, 2023

Hi, Ekaterina-

Thanks so much!
I am on Ubuntu. I've definitely tried pip along the way. It seems that conda is the most promising path for installation on my system, though, given all the problems I've had with incompatibilities. I'm thinking that GADMA may have some current compatibility issues with numpy? If I install gadma using

mamba create -n DADI -c conda-forge -c bioconda -c anaconda python=3.8 --file gadma_reqs.txt

where gadma_reqs.txt is

setuptools_scm
numpy
scipy
matplotlib
matplotlib
Pillow
ruamel.yaml
mpmath
Cython
networkx
h5py
scikit-allel
pandas
dadi
scikit-optimize
configspace
scikit-learn
smac

then

mamba activate DADI
gadma --test

I get a numpy error that other installation approaches have also been giving me in these recent attempts:

AttributeError: module 'numpy' has no attribute 'bool'.
`np.bool` was a deprecated alias for the builtin `bool`. To avoid this error in existing code, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

Thank you for being so responsive and helpful!!

My best,
Emma

@steigeec
Author

steigeec commented Dec 1, 2023

Hi, Ekaterina-

I just thought I'd follow up on this question!
Do you think that a numpy incompatibility issue with the current GADMA release indeed might be to blame here?

Thanks so much,
Emma

@noscode
Collaborator

noscode commented Dec 7, 2023

Dear Emma,

Thank you for reminding me about your issue, and sorry for the slow reply. Yes, it is definitely the numpy version that causes the error you see. I recommend uninstalling numpy from conda and installing a specific version manually using pip. For example, I tested numpy 1.22.4 and it worked fine. To install a specific version:
pip install numpy==1.22.4

I will also try to fix this error in the next release.
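
For context, the `np.bool` alias named in the error was deprecated in NumPy 1.20 and removed in NumPy 1.24, so any release older than 1.24 (such as the tested 1.22.4) still accepts it. A minimal sketch for checking which side of that boundary an environment is on follows. The helper name is hypothetical, and NumPy 2.x is outside what this thread covers.

```python
def predates_np_bool_removal(numpy_version):
    """Return True for NumPy releases older than 1.24, which still accept `np.bool`."""
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) < (1, 24)


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed in this environment")
    else:
        ok = predates_np_bool_removal(numpy.__version__)
        print(f"numpy {numpy.__version__}: np.bool alias available = {ok}")
```

Running this inside the activated environment shows the version Python actually imports, which may differ from the one most recently requested from conda or pip.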

Best regards,
Ekaterina

@steigeec
Author

Hi, Ekaterina! -

Thanks so much for your suggestion. I have tried installation pathways through both pip and conda, and still am unable to complete the installation of GADMA with Bayesian optimization. In my previous installation of GADMA (before I was using Bayesian optimization), I really struggled with installation via pip, and eventually overcame the difficulty by relying entirely on conda. Now, layering on the requirements for Bayesian optimization, conda is also not delivering!

The recurring problem appears to be numpy. I have tried specifying particular versions (many of them), and I have also tried leaving the version unspecified in the hope that mamba or conda would detect and resolve incompatibilities during installation.

Something that would really be lovely would be to have a conda/mamba compatibility requirements file with all the exact versions required to play together nicely for GADMA run with the dadi engine, the Bayesian optimization algorithm, and moments (to allow visualization). This is what I've been attempting to compile as I try different combinations of module versions... I did initially try putting together the various requirement files from the GADMA github install, but numpy (as ever) causes problems.

I would be so grateful for any support you can provide! I have come back to this challenge every few weeks, hoping that I'd be newly inspired to solve this puzzle, but I have had no luck since my initial install woes in November!

My very best,
Emma

@noscode
Collaborator

noscode commented Mar 25, 2024

Dear Emma,

I am sure that together we can solve this problem! Let us check the following things:

  • GADMA works with smac==0.13.1. The new versions of smac are quite different and cause a lot of issues. You can check the version of smac in Python's command line in the following way:
>>> import smac
>>> smac.__version__

What version of smac do you have?

  • Check the version of numpy. Sometimes when the same package is installed from different sources (like conda and pip), the copies can conflict. You can check the version of numpy the same way as for smac. If you see an unexpected version, then I recommend removing all versions of numpy (repeat pip uninstall numpy and conda uninstall numpy as many times as required) and installing it once using pip.

I am sorry, I am not familiar with mamba. I usually use conda environments, where packages can be installed using both conda and pip. Is that also allowed in mamba?

I am looking forward to hearing from you!

Best regards,
Ekaterina

@steigeec
Author

steigeec commented Mar 28, 2024

Hi, Ekaterina -

Thanks so much for your support. I can't wait to get things installed properly!

Mamba does let you install things using both it and pip! To follow up on your suggestions, what I first did was pip uninstall any versions of dependencies for GADMA currently on my system. I then tried specifying the versions of numpy and smac during a fresh mamba install, to make sure I end up with the versions we know should work (numpy=1.22.4, smac=0.13.1). Then, I downloaded GADMA with git and installed GADMA itself with the pip install . command. Everything appeared to install correctly, but then would fail during the test command. I realized that when I pip installed GADMA, the versions of numpy and dadi I had specified were being overwritten.

Next, I tried setting up all my dependencies with conda again, but instead doing the conda install of gadma. In this instance, the install doesn't complete successfully:

warning  libmamba Added empty dependency for problem type SOLVER_RULE_UPDATE
Could not solve for environment specs
The following packages are incompatible
└─ gadma is installable with the potential options
   ├─ gadma [2.0.0rc16|2.0.0rc17|2.0.0rc18] would require
   │  └─ nlopt >=2.7.0,<2.7.1.0a0 , which does not exist (perhaps a missing channel);
   ├─ gadma 2.0.0rc18 would require
   │  └─ python_abi 3.6.* *_cp36m, which does not exist (perhaps a missing channel);
   ├─ gadma 2.0.0rc18 would require
   │  └─ python_abi 3.7.* *_cp37m, which does not exist (perhaps a missing channel);
   └─ gadma [2.0.0|2.0.0rc19|...|2.0.0rc26] would require
      └─ dadi with the potential options
         ├─ dadi 1.7.0 would require
         │  └─ python >=2.7,<2.8.0a0 , which can be installed;
         ├─ dadi [2.0.3|2.0.4|2.0.5] would require
         │  └─ python >=3.7,<3.8.0a0 , which can be installed;
         └─ dadi [2.0.4|2.0.5] would require
            └─ python >=3.6,<3.7.0a0 , which can be installed.

As I mentioned before, I had been using python 3.8 until this point. Now, I instead moved to python 3.7 again, and specified nlopt=2.7.0 in my list of requirements. Sadly, now I find that numpy and python start fighting:

warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
Could not solve for environment specs
The following packages are incompatible
├─ numpy 1.22.4  is installable with the potential options
│  ├─ numpy 1.22.4 would require
│  │  └─ python_abi 3.10.* *_cp310, which can be installed;
│  ├─ numpy 1.22.4 would require
│  │  └─ python_abi 3.8 *_pypy38_pp73, which can be installed;
│  ├─ numpy 1.22.4 would require
│  │  └─ python_abi 3.8.* *_cp38, which can be installed;
│  ├─ numpy 1.22.4 would require
│  │  └─ python_abi 3.9.* *_cp39, which can be installed;
│  └─ numpy 1.22.4 would require
│     └─ python_abi 3.9 *_pypy39_pp73, which can be installed;
└─ python 3.7  is not installable because there are no viable options
   ├─ python 3.7.0 would require
   │  └─ python_abi * *_cp37m, which conflicts with any installable versions previously reported;
   └─ python 3.7.0 conflicts with any installable versions previously reported.

What I would love to try would be to specify the versions of all the dependencies in my conda install, copying exactly what we know works on your system! Might you share these dependency versions?

@noscode
Collaborator

noscode commented Mar 28, 2024

Dear Emma,

Wow, thank you for the details! Below are the steps that I used just now to install and run GADMA on my laptop. I hope they will help you.

First, you said that versions of numpy and smac were overwritten after pip installation of GADMA. That usually means that, for some reason, pip does not see these packages as installed. If you installed them using mamba, there is probably a way to tell pip about the mamba installation directory.

Second, I would recommend using Python 3.8, as it appears to be the more reliable version.

Here are the steps I used to install GADMA in a local environment:

  1. Create an empty conda environment with Python 3.8:
conda create -n gadma_env python=3.8
  2. Activate the environment:
conda activate gadma_env
  3. Install specific versions (a specific version of matplotlib is required for drawing with moments) and nlopt (for some reason there was an error during gadma installation without it):
pip install ruamel.yaml==0.16.12
pip install matplotlib==3.5.3
conda install nlopt
  4. Install moments, e.g. using conda (works for Windows and Linux, but not for MacOS):
conda config --add channels bioconda
conda install moments
  5. Clone the GADMA repository and install it:
git clone https://github.com/ctlab/GADMA.git
cd GADMA
pip install .

I did not try it, but this should probably also work:

pip install gadma
  6. After that, everything worked for me:
gadma --test
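
Before running the test command, a quick way to confirm that the packages from the steps above are importable in the active environment is sketched below. The module names are assumed to match the package names used in this thread, and the helper is hypothetical, not a GADMA utility.

```python
from importlib import util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed in the steps above.
    required = ["gadma", "dadi", "moments", "nlopt", "matplotlib"]
    missing = missing_modules(required)
    if missing:
        print("not importable:", ", ".join(missing))
    else:
        print("all required modules can be found")
```

This only checks that the modules can be located, not that their versions are compatible, so gadma --test remains the real check.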

Here is the output of my conda list:

# packages in environment at /Users/noskovae/anaconda3/envs/gadma_env:
#
# Name                    Version                   Build  Channel
attrs                     23.2.0                   pypi_0    pypi
blas                      1.0                    openblas  
bzip2                     1.0.8                h80987f9_5  
ca-certificates           2024.3.11            hca03da5_0  
contourpy                 1.1.1                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
cython                    3.0.9                    pypi_0    pypi
dadi                      2.3.3                    pypi_0    pypi
demes                     0.2.3                    pypi_0    pypi
fonttools                 4.50.0                   pypi_0    pypi
gadma                     2.0.1.dev7               pypi_0    pypi
importlib-resources       6.4.0                    pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
libcxx                    16.0.6               h4653b0c_0    conda-forge
libffi                    3.4.4                hca03da5_0  
libgfortran               5.0.0           11_3_0_hca03da5_28  
libgfortran5              11.3.0              h009349e_28  
libopenblas               0.3.21               h269037a_0  
libsqlite                 3.45.2               h091b4b1_0    conda-forge
libzlib                   1.2.13               h53f4e23_5    conda-forge
llvm-openmp               14.0.6               hc6e5704_0  
matplotlib                3.5.3                    pypi_0    pypi
moments                   1.1.15                   pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
ncurses                   6.4                  h313beb8_0  
nlopt                     2.7.1            py38h6f14d55_4    conda-forge
numpy                     1.24.3           py38h1398885_0  
numpy-base                1.24.3           py38h90707a3_0  
openssl                   3.0.13               h1a28f6b_0  
packaging                 24.0                     pypi_0    pypi
pandas                    2.0.3                    pypi_0    pypi
pillow                    10.2.0                   pypi_0    pypi
pip                       23.3.1           py38hca03da5_0  
pyparsing                 3.1.2                    pypi_0    pypi
python                    3.8.16          h3ba56d0_1_cpython    conda-forge
python-dateutil           2.9.0.post0              pypi_0    pypi
python_abi                3.8                      4_cp38    conda-forge
pytz                      2024.1                   pypi_0    pypi
readline                  8.2                  h1a28f6b_0  
ruamel-yaml               0.16.12                  pypi_0    pypi
ruamel-yaml-clib          0.2.8                    pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
setuptools                68.2.2           py38hca03da5_0  
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h80987f9_0  
tk                        8.6.12               hb8d0fd4_0  
tzdata                    2024.1                   pypi_0    pypi
wheel                     0.41.2           py38hca03da5_0  
xz                        5.4.6                h80987f9_0  
zipp                      3.18.1                   pypi_0    pypi
zlib                      1.2.13               h53f4e23_5    conda-forge

Best regards,
Ekaterina

@steigeec
Author

Hi, Ekaterina-

Thank you so much for your kindness and helpfulness in guiding me through this installation procedure!

With the information you provided and the support of an incredible labmate, I was finally able to complete this installation process. In case others are struggling, I want to share this solution here!!

conda env create -n DADI -f gadma_requirements.yaml
... where gadma_requirements.yaml is:

channels:
    - conda-forge
    - bioconda
dependencies:
    - python=3.8
    - pip
    - setuptools_scm<=7.1.0
    - numpy>=1.16.5,<1.23.0
    - scipy>=0.6.0,<1.7.0
    - matplotlib<=3.5.3
    - Pillow>=4.2.1
    - ruamel.yaml==0.16.12
    - mpmath
    - Cython
    - networkx
    - h5py
    - scikit-allel
    - pandas
    - moments
    - dadi=2.3.2
    - demes
    - demesdraw
    - gadma
    - scikit-optimize
    - configspace
    - pip:
      - smac==0.13.1

This worked perfectly for him. I still have some issues with pip, so I needed to run this separately:
pip install smac==0.13.1
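
Once the environment is built, the installed versions can be checked against the pins in the file above. The sketch below is a deliberately small comparator, not a full version-specifier parser: it only handles plain numeric versions and the operators that appear in the pins. The pins are copied from gadma_requirements.yaml; the helper names are hypothetical.

```python
import operator
import re
from importlib import metadata

OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le,
       "<": operator.lt, ">": operator.gt}

# Pins copied from the environment file above.
PINS = {
    "numpy": ">=1.16.5,<1.23.0",
    "scipy": ">=0.6.0,<1.7.0",
    "matplotlib": "<=3.5.3",
    "ruamel.yaml": "==0.16.12",
    "smac": "==0.13.1",
}


def as_tuple(version, width=3):
    """Turn '1.22.4' into (1, 22, 4), padding short versions with zeros."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:width]]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '>=1.16.5,<1.23.0'."""
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(==|>=|<=|<|>)\s*([\w.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not OPS[op](as_tuple(version), as_tuple(bound)):
            return False
    return True


if __name__ == "__main__":
    for name, spec in PINS.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        status = "OK" if satisfies(found, spec) else "outside pin"
        print(f"{name} {found} ({spec}): {status}")
```

An "outside pin" line after a later pip install is a sign that a dependency was overwritten, which is the problem described earlier in this thread.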

Thanks, again, so much, Ekaterina! I am thrilled to have GADMA with BO running now!!

@noscode
Collaborator

noscode commented Apr 2, 2024

Hi @steigeec,

I am glad you have successfully overcome the installation issue! Please let me know if you have any further questions.

Ekaterina
