Replace MDS solver; match variable naming to paper (#4)
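Pandora now uses the scikit-allel MDS implementation instead of the scikit-learn one, and the confidence level variable/CLI flag was renamed to convergence tolerance (bootstrap_convergence_confidence_level is now bootstrap_convergence_tolerance) to match the terminology of the paper.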
docs/cli_config.rst: 1 addition, 1 deletion
@@ -12,7 +12,7 @@ Configuration options:
 - ``file_format``, default = ``EIGENSTRAT``, Name of the file format your dataset is in. Supported formats are ``ANCESTRYMAP``, ``EIGENSTRAT``, ``PED``, ``PACKEDPED``, ``PACKEDANCESTRYMAP``. For more information see Section `Input data`_ below.
 - ``convertf``, default = ``convertf``, File path pointing to an executable of Eigensoft's ``convertf`` tool. ``convertf`` is used if the provided dataset is not in ``EIGENSTRAT`` format. Default is ``convertf``. This will only work if ``convertf`` is installed systemwide.
 - ``bootstrap_convergence_check``, default = ``True``, If true, instead of computing ``n_replicates`` bootstraps and embeddings, Pandora will check for convergence once every ``max(10, threads)`` bootstrap embeddings are computed. If according to our heuristic (see TODO for more details) the bootstrap procedure converged, all remaining tasks are cancelled and the stability is determined uisng only the number of replicates computed when convergence is determined. Due to the runtime overhead of the convergence check compared to the runtime of MDS computations, we only advice using this convergence check for PCA analyses. Note that this parameter is only relevant if ``analysis_mode`` is ``AnalysisMode.BOOTSTRAP``.
-- ``bootstrap_convergence_confidence_level``, default=0.05, Determines the level of confidence when checking for bootstrap convergence. A value of :math:`X` means that we allow deviations of up to :math:`X * 100\%` between pairwise bootstrap comparisons and still assume convergence.
+- ``bootstrap_convergence_tolerance``, default=0.05, Determines the level of deviation tolerance when checking for bootstrap convergence. A value of :math:`X` means that we allow deviations of up to :math:`X * 100\%` between pairwise bootstrap comparisons and still assume convergence.
 - ``n_replicates``, default = 100, Number of bootstrap replicates or sliding windows to compute
 - ``keep_replicates``, default = ``false``, Whether to store all intermediate datasets files (``.geno``, ``.snp``, ``.ind``). Note that this will result in a substantial storage consumption. Note that in case of bootstrapping, the bootstrapped indices are stored as checkpoints for full reproducibility in any case.
 - ``n_components``, default = 10, Number of components to compute and compare for PCA or MDS analyses. We recommend 10 for PCA analyses and 2 for MDS analyses. The default is 10 since the default for ``embedding_algorithm`` is ``PCA``.
@@ -63,7 +63,7 @@ You should then see an output similar to this:::
 n_replicates: 10
 keep_replicates: False
 bootstrap_convergence_check: True
-bootstrap_convergence_confidence_level: 0.05
+bootstrap_convergence_tolerance: 0.05
 n_components: 10
 embedding_algorithm: PCA
 smartpca: smartpca
@@ -85,7 +85,7 @@ You should then see an output similar to this:::
 [00:00:02] Running SmartPCA on the input dataset.
 [00:00:02] Plotting embedding results for the input dataset.
 [00:00:18] Drawing 10 bootstrapped datasets and running PCA.
-[00:00:18] NOTE: Bootstrap convergence check is enabled. Will terminate bootstrap computation once convergence is determined. Convergence confidence level: 0.05
+[00:00:18] NOTE: Bootstrap convergence check is enabled. Will terminate bootstrap computation once convergence is determined. Convergence tolerance: 0.05
 [00:00:27] Bootstrapping done. Number of replicates computed: 10
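The behaviour documented for ``bootstrap_convergence_check`` and ``bootstrap_convergence_tolerance`` can be sketched in plain Python. This is a sequential, hypothetical illustration rather than Pandora's implementation: it assumes each bootstrap replicate yields a single stability score, and it treats the run as converged when every pair of scores deviates by at most ``tolerance * 100`` percent relative to the larger score. The names ``has_converged``, ``run_bootstraps`` and ``compute_replicate`` are placeholders; Pandora computes replicates in parallel and cancels pending tasks, which is not modelled here.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def has_converged(scores: Sequence[float], tolerance: float = 0.05) -> bool:
    """Hypothetical convergence test on per-replicate stability scores.

    Returns True if every pair of scores differs by at most
    ``tolerance * 100`` percent relative to the larger of the two.
    This relative-deviation rule is an assumption, not Pandora's heuristic.
    """
    if len(scores) < 2:
        # Nothing to compare yet, so convergence cannot be assessed.
        return False
    for a, b in combinations(scores, 2):
        reference = max(abs(a), abs(b))
        if reference == 0:
            # Both scores are zero: identical, so no deviation.
            continue
        if abs(a - b) / reference > tolerance:
            return False
    return True


def run_bootstraps(
    n_replicates: int,
    threads: int,
    compute_replicate: Callable[[int], float],
    tolerance: float = 0.05,
) -> List[float]:
    """Compute replicates one by one, checking convergence every
    ``max(10, threads)`` results and stopping early once converged."""
    batch_size = max(10, threads)
    scores: List[float] = []
    for index in range(n_replicates):
        scores.append(compute_replicate(index))
        # Only check at batch boundaries, mirroring the documented schedule.
        if len(scores) % batch_size == 0 and has_converged(scores, tolerance):
            break
    return scores
```

With ``n_replicates: 10`` as in the configuration printed above, the batch size ``max(10, threads)`` is never smaller than 10, so the check can only run once all 10 replicates exist, which is consistent with the log line reporting 10 computed replicates.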
docs/usage.rst: 2 additions, 2 deletions
@@ -40,11 +40,11 @@ The Command line interface, as well as when using the Eigen-based Pandora Python
 Smartpca is a powerful PCA tool that implements a lot of genotype-data specific routines and optimizations and provides a lot of useful options for meaningful PCA analyses such as outlier detection.
 Pandora supports all custom configuration settings of smartpca. See the Section :ref:`SmartPCA` for more information. For MDS analyses, Pandora will use
 smartpca to generate the Fst-distance matrix as input for MDS. Note that this distance matrix computes the distances between population and not between samples.
-The subsequent MDS analysis is performed using the scikit-learn MDS implementation.
+The subsequent MDS analysis is performed using the scikit-allel MDS implementation.
 If you have genotype data in Eigenfiles but want to be able to do a more flexible analysis, consider using the alternative NumPy interface. Pandora provides a method
 to load your genotype data in EIGENSTRAT format as numpy array.

-If you are using the NumPy-based Pandora interface, PCA and MDS is performed using the scikit-learn implementations. For both analyses, Pandora supports different types of data imputation, see the API documentation for more information.
+If you are using the NumPy-based Pandora interface, PCA and MDS is performed using the scikit-learn and scikit-allel implementations respectively. For both analyses, Pandora supports different types of data imputation, see the API documentation for more information.
 Per default, Pandora will apply SNP-wise mean imputation. The default distance metric for MDS analysis is the pairwise euclidean distance between all samples in your data. However, Pandora provides alternative distance metrics
 and allows you to define your own distance metric as well. Again, see the API documentation for further information.
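The defaults described above for the NumPy-based interface (SNP-wise mean imputation, pairwise Euclidean distances between samples, then MDS) can be sketched end to end in plain Python. This is a standard-library illustration, not Pandora's code: it assumes genotypes are given as a samples-by-SNPs list with ``None`` marking missing calls, and it implements classical (metric) MDS by double centering and power iteration instead of calling scikit-learn or scikit-allel. Whether this matches the exact MDS variant Pandora uses is an assumption. The function names ``mean_impute``, ``euclidean_distances`` and ``classical_mds`` are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def mean_impute(genotypes: Sequence[Sequence[Optional[float]]]) -> Matrix:
    """SNP-wise mean imputation: replace each missing call (None) with the
    mean of the observed calls in the same SNP column."""
    n_snps = len(genotypes[0])
    filled = [list(row) for row in genotypes]  # copy; input stays untouched
    for j in range(n_snps):
        observed = [row[j] for row in genotypes if row[j] is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        for row in filled:
            if row[j] is None:
                row[j] = mean
    return filled


def euclidean_distances(samples: Matrix) -> Matrix:
    """Pairwise Euclidean distances between all samples (rows)."""
    return [[math.dist(a, b) for b in samples] for a in samples]


def classical_mds(dist: Matrix, n_components: int = 2, n_iter: int = 500) -> Matrix:
    """Classical MDS: double-center the squared distances, then extract the
    top eigenpairs via power iteration with deflation."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in sq]
    grand_mean = sum(row_means) / n
    # Double centering: B = -1/2 * J D^2 J with J = I - (1/n) 11^T.
    b = [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(0)  # fixed seed for reproducible start vectors
    columns = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v estimates its eigenvalue.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(vi * bvi for vi, bvi in zip(v, bv))
        scale = math.sqrt(max(eigenvalue, 0.0))
        columns.append([scale * vi for vi in v])
        # Deflate so the next pass converges to the next-largest eigenvalue.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    # Transpose component columns into one coordinate row per sample.
    return [list(coords) for coords in zip(*columns)]
```

Power iteration here relies on the double-centered matrix being positive semidefinite, which holds for Euclidean input distances; for non-Euclidean distance matrices, a full symmetric eigensolver is the safer choice.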