If you find PopPUNK useful, please cite us:
Lees JA, Harris SR, Tonkin-Hill G, Gladstone RA, Lo SW, Weiser JN, Corander J, Bentley SD, Croucher NJ. Fast and flexible bacterial genomic epidemiology with PopPUNK. Genome Research 29:304-316 (2019). doi:10.1101/gr.241455.118
You can also run your command with --citation to get a list of citations and a suggested methods paragraph.
The roadmap can be found in the documentation.
PopPUNK 2.7.0 comes with two changes:
- Distance matrices (<db_name>.dists.npy) are no longer required or written when using poppunk_assign, with or without --update-db. These can be very large, especially with many samples, so this saves space and memory in model reuse and distribution. Note that the <db_name>.dists.pkl file is still required (but this is small).
- We have added a --stable flag to poppunk_assign. Rather than merging hybrid clusters, new samples will simply be assigned to their nearest neighbour. This implies --serial and cannot be run with --update-db. This behaviour mimics the 'stable nomenclature' of schemes such as LIN.
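As a rough sketch of how the --stable mode might be invoked, the shell function below wraps a single poppunk_assign call. The function name assign_stable and the database, query file and output names in the usage comment are hypothetical. The --db, --query and --output options are assumed to be the usual poppunk_assign arguments, so check poppunk_assign --help for your installed version.

```shell
# Sketch only: wraps one poppunk_assign call in stable mode.
# assign_stable is a hypothetical helper name, not part of PopPUNK.
assign_stable() {
    # $1 = database prefix, $2 = query list file, $3 = output prefix
    # --stable implies --serial and cannot be combined with --update-db.
    poppunk_assign --db "$1" --query "$2" --output "$3" --stable
}

# Usage (placeholder names):
#   assign_stable my_db queries.txt assigned
```

Because --stable cannot be run with --update-db, the wrapper deliberately takes no option for updating the database.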
We have retired the PopPUNK website. Databases have been expanded, and can be found here: https://www.bacpop.org/poppunk/.
The change in scikit-learn's API in v1.0.0 and above means that HDBSCAN models fitted with sklearn <=v0.24 will give an error when loaded. If you run into this, the solution is one of:
- Downgrade sklearn to v0.24.
- Run model refinement to turn your model into a boundary model instead (this will change clusters).
- Refit your model in an environment with sklearn >=v1.0.
If this is a common problem let us know, as we could write a script to 'upgrade' HDBSCAN models. See issue #213 for more details.
We have fixed a number of bugs which may affect the use of poppunk_assign with --update-db. We have also fixed a number of bugs with GPU distances. These are 'advanced' features and are not likely to be encountered in most cases, but if you do wish to use either of these features please make sure that you are using PopPUNK >=v2.4.0 with pp-sketchlib >=v1.7.0.
We have discovered a bug affecting the interaction of pp-sketchlib and PopPUNK. If you have used PopPUNK >=v2.0.0 with pp-sketchlib <v1.5.1, label order may be incorrect (see issue #95).
Please upgrade to PopPUNK >=v2.2 and pp-sketchlib >=v1.5.1. If this is not possible, you can either:
- Run scripts/poppunk_pickle_fix.py on your .dists.pkl file and re-run model fits.
- Create the database with poppunk_sketch directly, rather than PopPUNK --create-db.
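To check whether a past analysis falls in the affected range, the version condition above can be sketched as a small shell helper. The function names version_ge and label_order_status are hypothetical, and the comparison assumes a sort that supports -V (version sort), as in GNU coreutils. You supply the version strings yourself, for example from the output of conda list.

```shell
# Sketch only: helper names are illustrative, not part of PopPUNK.
version_ge() {
    # Succeeds (exit 0) if version $1 >= version $2.
    # Assumes sort supports -V (version sort).
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

label_order_status() {
    # $1 = PopPUNK version used, $2 = pp-sketchlib version used
    # Affected range: PopPUNK >= 2.0.0 together with pp-sketchlib < 1.5.1.
    if version_ge "$1" 2.0.0 && ! version_ge "$2" 1.5.1; then
        echo "affected"
    else
        echo "not affected"
    fi
}

label_order_status 2.1.0 1.5.0   # prints "affected"
label_order_status 2.2.0 1.5.1   # prints "not affected"
```

A result of "affected" only indicates that the versions fall in the range described above; the upgrade or workaround steps still need to be applied separately.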
This is for the command line version. For more details see installation in the documentation.
Our (beta) web interface BeeBOP is now also available: https://beebop.dide.ic.ac.uk/
The easiest way is through conda, which is most easily accessed by first installing miniconda. PopPUNK can then be installed by running:
conda install poppunk
If the package cannot be found you will need to add the necessary channels:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
See the overview first. There are two ways of running:
- Download an existing database.
- Run assignment.
A docker image is available:
docker pull mrcide/poppunk:bacpop-20