Commit 29935f7

Merge pull request #184 from timothymillar/atomize
Beta v0.10.0
2 parents d05d748 + 4abb69c commit 29935f7

108 files changed

+8423
-960
lines changed


.github/workflows/python-package.yml

+4-5
@@ -5,17 +5,17 @@ name: Python package
 
 on:
   push:
-    branches: [ master ]
+    branches: [ master, "call-pedigree"]
   pull_request:
-    branches: [ master ]
+    branches: [ master, "call-pedigree"]
 
 jobs:
   build:
 
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ["3.8", "3.9", "3.10", "3.11"]
+        python-version: ["3.10", "3.11"]
 
     steps:
     - uses: actions/checkout@v2
@@ -32,8 +32,7 @@ jobs:
       uses: pre-commit/action@v2.0.0
     - name: Build and install mchap
       run: |
-        python setup.py sdist
-        pip install dist/mchap-*.tar.gz
+        pip install .
     - name: Test with pytest (bounds checked)
       env:
         NUMBA_BOUNDSCHECK: 1

CHANGELOG.md

+18
@@ -3,6 +3,24 @@
 ## Unreleased
 
 
+## Beta v0.10.0
+
+New Features:
+- New experimental `atomize` tool for splitting haplotypes into basis SNVs #72.
+- New experimental `call-pedigree` tool for pedigree informed genotype calling.
+- Optionally specify just the `INFO` or `FORMAT` variant of an optional VCF field.
+- Use `setuptools_scm` for versioning #179.
+
+VCF Changes:
+- Renamed `PHQ` and `PHPM` to `SQ` and `SPM` for clarity.
+- Added `INFO/UAN` field for number of unique alleles called #174.
+- Added `INFO/MCI` field for proportion of samples with Markov Chain incongruence.
+- Added optional fields #174:
+  * `INFO/AOPSUM` (sum of `FORMAT/AOP`).
+  * `INFO/ACP` and `FORMAT/ACP`.
+  * `INFO/SNVDP` and `FORMAT/SNVDP`.
+
+
 ## Beta v0.9.3
 
 Bug Fixes:

README.rst

+13-1
@@ -66,7 +66,17 @@ frequencies (estimated from the mean of individual frequencies), but no genotype
 Example notebook
 ----------------
 
-An `example notebook`_ demonstrating genotype calling with MCHap in a bi-parental population.
+See the `example notebook`_ demonstrating genotype calling with MCHap in a bi-parental population.
+
+Experimental features
+---------------------
+
+\:warning: **WARNING: The following tools are highly experimental!!!** :warning:
+
+- ``mchap call-pedigree``: for pedigree informed genotype calling.
+- ``mchap atomize``: for converting micro-haplotype calls to phased sets of SNVs.
+
+See the `experimental notebook`_ demonstrating the `call-pedigree` tool as presented at the 2024 `Tools for Polyploids`_ workshop.
 
 Funding
 -------
@@ -80,3 +90,5 @@ The development of MCHap was partially funded by the "Tools for Polyploids" Spec
 .. _`MCHap assemble documentation`: docs/assemble.rst
 .. _`MCHap call documentation`: docs/call.rst
 .. _`example notebook`: docs/example/bi-parental.ipynb
+.. _`experimental notebook`: docs/example/bi-parental-pedigree.ipynb
+.. _`Tools for Polyploids`: https://www.polyploids.org/

cli-assemble-help.txt

+17-8
@@ -115,11 +115,21 @@ options:
                         The chosen field determines the sample ids required in
                         other input files e.g. the --sample-list argument.
   --report [REPORT ...]
-                        Extra fields to report within the output VCF: AFPRIOR
-                        = prior allele frequencies; AFP = posterior mean
-                        allele frequencies; AOP = posterior probability of
-                        allele occurring at any copy number; GP = genotype
-                        posterior probabilities; GL = genotype likelihoods.
+                        Extra fields to report within the output VCF. The
+                        INFO/FORMAT prefix may be omitted to return both
+                        variations of the named field. Options include:
+                        INFO/AFPRIOR = Prior allele frequencies; INFO/ACP =
+                        Posterior allele counts; INFO/AFP = Posterior mean
+                        allele frequencies; INFO/AOP = Posterior probability
+                        of allele occurring across all samples; INFO/AOPSUM =
+                        Posterior estimate of the number of samples containing
+                        an allele; INFO/SNVDP = Read depth at each SNV
+                        position; FORMAT/ACP: Posterior allele counts;
+                        FORMAT/AFP: Posterior mean allele frequencies;
+                        FORMAT/AOP: Posterior probability of allele occurring;
+                        FORMAT/GP: Genotype posterior probabilities;
+                        FORMAT/GL: Genotype likelihoods; FORMAT/SNVDP: Read
+                        depth at each SNV position
   --cores CORES         Number of cpu cores to use (default = 1).
   --mcmc-chains MCMC_CHAINS
                         Number of independent MCMC chains per assembly
@@ -133,9 +143,8 @@ options:
   --mcmc-seed MCMC_SEED
                         Random seed for MCMC (default = 42).
   --mcmc-chain-incongruence-threshold MCMC_CHAIN_INCONGRUENCE_THRESHOLD
-                        Posterior phenotype probability threshold for
-                        identification of incongruent posterior modes (default
-                        = 0.60).
+                        Posterior probability threshold for identification of
+                        incongruent posterior modes (default = 0.60).
   --mcmc-fix-homozygous MCMC_FIX_HOMOZYGOUS
                         Fix alleles that are homozygous with a probability
                         greater than or equal to the specified value (default
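
The new `--report` help text says the INFO/FORMAT prefix may be omitted to return both variations of the named field. A minimal sketch of that expansion rule follows. The function name `expand_report_fields` and the two field sets are illustrative assumptions built from the options listed in the help text; this is not MCHap's actual implementation, and the help text's "Options include" wording means the real set may be larger.

```python
# Hypothetical sketch of prefix expansion for --report values.
# Field sets are copied from the help text above; names are assumptions.
INFO_FIELDS = {"AFPRIOR", "ACP", "AFP", "AOP", "AOPSUM", "SNVDP"}
FORMAT_FIELDS = {"ACP", "AFP", "AOP", "GP", "GL", "SNVDP"}


def expand_report_fields(requested):
    """Expand bare names to every matching INFO/ and FORMAT/ variant."""
    expanded = []
    for name in requested:
        if "/" in name:
            # Explicit prefix: validate against the matching set only.
            prefix, field = name.split("/", 1)
            valid = {"INFO": INFO_FIELDS, "FORMAT": FORMAT_FIELDS}.get(prefix, set())
            if field not in valid:
                raise ValueError(f"Unknown report field: {name}")
            candidates = [name]
        else:
            # Bare name: return both variations where they exist.
            candidates = []
            if name in INFO_FIELDS:
                candidates.append(f"INFO/{name}")
            if name in FORMAT_FIELDS:
                candidates.append(f"FORMAT/{name}")
            if not candidates:
                raise ValueError(f"Unknown report field: {name}")
        for candidate in candidates:
            if candidate not in expanded:  # keep order, drop duplicates
                expanded.append(candidate)
    return expanded
```

Under these assumptions, `expand_report_fields(["AFP"])` yields both `INFO/AFP` and `FORMAT/AFP`, while `expand_report_fields(["INFO/ACP", "GL"])` yields only `INFO/ACP` and `FORMAT/GL`.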

cli-atomize-help.txt

+15
@@ -0,0 +1,15 @@
+usage: Split MCHap haplotype calls into phased blocks of basis SNVs.
+       [-h] haplotypes
+
+positional arguments:
+  haplotypes  VCF file containing haplotype variants to be atomized. This file
+              must contain INFO/SNVPOS. The INFO/DP and FORMAT/DP fields will
+              be calculated from FORMAT/SNVDP if present in the input VCF
+              file. The INFO/ACP and FORMAT/DS fields will be calculated from
+              FORMAT/ACP or FORMAT/AFP if either is present in the input VCF
+              file. Note that the FORMAT/ACP or FORMAT/AFP fields from the
+              input VCF file will be normalized in the event that they do not
+              sum to ploidy or one respectively.
+
+options:
+  -h, --help  show this help message and exit
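
The `atomize` help text notes that FORMAT/ACP or FORMAT/AFP values are normalized when they do not sum to ploidy or one respectively. The sketch below shows one plausible reading of that step: a proportional rescale to the target total. The function name `normalize_to_total` and the choice of proportional rescaling are assumptions for illustration; the diff shown here does not include the tool's own code.

```python
# Hypothetical sketch: rescale per-allele values so they sum to a target.
# For FORMAT/ACP the target is the sample ploidy; for FORMAT/AFP it is 1.0.
def normalize_to_total(values, total):
    """Return values rescaled proportionally so that they sum to `total`."""
    current = sum(values)
    if current <= 0:
        raise ValueError("values must have a positive sum to be normalized")
    return [value * total / current for value in values]


# Posterior allele counts for a tetraploid sample that sum to 3.5, not 4.
acp = normalize_to_total([2.0, 1.0, 0.5], total=4)

# Posterior allele frequencies that sum to 0.8, not 1.0.
afp = normalize_to_total([0.5, 0.3], total=1.0)
```

Proportional rescaling preserves the relative weight of each allele, which is why it is a natural choice here, but other schemes are possible.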

cli-call-exact-help.txt

+15-5
@@ -102,9 +102,19 @@ options:
                         The chosen field determines the sample ids required in
                         other input files e.g. the --sample-list argument.
   --report [REPORT ...]
-                        Extra fields to report within the output VCF: AFPRIOR
-                        = prior allele frequencies; AFP = posterior mean
-                        allele frequencies; AOP = posterior probability of
-                        allele occurring at any copy number; GP = genotype
-                        posterior probabilities; GL = genotype likelihoods.
+                        Extra fields to report within the output VCF. The
+                        INFO/FORMAT prefix may be omitted to return both
+                        variations of the named field. Options include:
+                        INFO/AFPRIOR = Prior allele frequencies; INFO/ACP =
+                        Posterior allele counts; INFO/AFP = Posterior mean
+                        allele frequencies; INFO/AOP = Posterior probability
+                        of allele occurring across all samples; INFO/AOPSUM =
+                        Posterior estimate of the number of samples containing
+                        an allele; INFO/SNVDP = Read depth at each SNV
+                        position; FORMAT/ACP: Posterior allele counts;
+                        FORMAT/AFP: Posterior mean allele frequencies;
+                        FORMAT/AOP: Posterior probability of allele occurring;
+                        FORMAT/GP: Genotype posterior probabilities;
+                        FORMAT/GL: Genotype likelihoods; FORMAT/SNVDP: Read
+                        depth at each SNV position
   --cores CORES         Number of cpu cores to use (default = 1).
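
The changelog describes `INFO/AOPSUM` as the sum of `FORMAT/AOP`, and the help text calls it a posterior estimate of the number of samples containing an allele. As a rough illustration, summing each allele's per-sample occurrence probability gives that expected count. The function name `aop_sum` and the list-of-lists input layout are assumptions, and the sketch ignores missing values, which a real VCF writer would need to handle.

```python
# Hypothetical sketch: INFO/AOPSUM as the per-allele sum of FORMAT/AOP.
# `sample_aop[i][j]` is the posterior probability that sample i carries
# allele j at any copy number.
def aop_sum(sample_aop):
    """Sum occurrence probabilities over samples, one total per allele."""
    n_alleles = len(sample_aop[0])
    if any(len(row) != n_alleles for row in sample_aop):
        raise ValueError("every sample must report the same number of alleles")
    return [sum(row[j] for row in sample_aop) for j in range(n_alleles)]


# Three samples, two alleles: the first allele is near-certain everywhere.
totals = aop_sum([[1.0, 0.2], [0.9, 0.0], [1.0, 0.7]])
```

Because each probability is the expectation of a 0/1 indicator, their sum is the expected number of samples carrying the allele.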

cli-call-help.txt

+17-8
@@ -106,11 +106,21 @@ options:
                         The chosen field determines the sample ids required in
                         other input files e.g. the --sample-list argument.
   --report [REPORT ...]
-                        Extra fields to report within the output VCF: AFPRIOR
-                        = prior allele frequencies; AFP = posterior mean
-                        allele frequencies; AOP = posterior probability of
-                        allele occurring at any copy number; GP = genotype
-                        posterior probabilities; GL = genotype likelihoods.
+                        Extra fields to report within the output VCF. The
+                        INFO/FORMAT prefix may be omitted to return both
+                        variations of the named field. Options include:
+                        INFO/AFPRIOR = Prior allele frequencies; INFO/ACP =
+                        Posterior allele counts; INFO/AFP = Posterior mean
+                        allele frequencies; INFO/AOP = Posterior probability
+                        of allele occurring across all samples; INFO/AOPSUM =
+                        Posterior estimate of the number of samples containing
+                        an allele; INFO/SNVDP = Read depth at each SNV
+                        position; FORMAT/ACP: Posterior allele counts;
+                        FORMAT/AFP: Posterior mean allele frequencies;
+                        FORMAT/AOP: Posterior probability of allele occurring;
+                        FORMAT/GP: Genotype posterior probabilities;
+                        FORMAT/GL: Genotype likelihoods; FORMAT/SNVDP: Read
+                        depth at each SNV position
   --cores CORES         Number of cpu cores to use (default = 1).
   --mcmc-chains MCMC_CHAINS
                         Number of independent MCMC chains per assembly
@@ -124,6 +134,5 @@ options:
   --mcmc-seed MCMC_SEED
                         Random seed for MCMC (default = 42).
   --mcmc-chain-incongruence-threshold MCMC_CHAIN_INCONGRUENCE_THRESHOLD
-                        Posterior phenotype probability threshold for
-                        identification of incongruent posterior modes (default
-                        = 0.60).
+                        Posterior probability threshold for identification of
+                        incongruent posterior modes (default = 0.60).
