UNIQmin: An alignment-independent tool for the study of pathogen sequence diversity at any given rank of taxonomy lineage

DOI: 10.3390/biology10090853

Brief Description

Sequence variation among pathogens, even of a single amino acid, can expand their host repertoire or enhance their ability to infect. An alignment-independent approach offers an alternative way to study pathogen diversity, one that does not rely on sequence conservation to perform comparative analyses. Herein, we present UNIQmin, a tool that uses an alignment-independent method to generate the minimal set of pathogen sequences, as a way to study their diversity across any rank of taxonomic lineage. The minimal set refers to the smallest possible number of sequences required to capture the entire repertoire of pathogen peptidome diversity present in a sequence dataset.
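To make the idea of a minimal set concrete, here is a small illustrative sketch. It is not the UNIQmin implementation: it assumes that peptidome diversity is represented by the set of distinct overlapping k-mers, and it uses a simple greedy set-cover heuristic, which yields a small covering set but does not guarantee the smallest one. The function names `kmers` and `greedy_minimal_set` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of distinct overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_minimal_set(seqs, k=9):
    """Greedily pick sequences until every distinct k-mer is covered.

    `seqs` maps a sequence name to its amino acid string.
    This is a set-cover heuristic used for illustration only.
    """
    kmer_sets = {name: kmers(s, k) for name, s in seqs.items()}
    uncovered = set().union(*kmer_sets.values())
    chosen = []
    while uncovered:
        # Pick the sequence that covers the most still-uncovered k-mers.
        best = max(kmer_sets, key=lambda n: len(kmer_sets[n] & uncovered))
        chosen.append(best)
        uncovered -= kmer_sets[best]
    return chosen


# Toy data with k=3: sequence "b" adds no k-mer that "a" lacks.
toy = {"a": "ABCDEF", "b": "ABCD", "c": "DEFGH"}
print(greedy_minimal_set(toy, k=3))
```

In this toy dataset the redundant sequence "b" is dropped, while "a" and "c" together retain every distinct 3-mer.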


Step-by-step of UNIQmin

Please refer to the PythonScript folder.

Figure Scheme

[Figure: UNIQmin scheme]

UNIQmin as a Pipeline

Shell Version

As visualised above, UNIQmin comprises five steps, each carried out by its own Python script in the order of the steps (server specs: Intel(R) Xeon(R) E5-2690 v2 @ 3.00GHz 40-core processors, 396 GB of RAM and 44 TB of local storage). The single pipeline shell script (UNIQmin.sh), a sample input file (exampleinput.fas) and an example output (exampleoutput.fasta) are provided.

uniqmin.sh

Python Version

python uniqmin.py -i exampleinput.fas -o example -k 9 -cpu 14

UNIQmin as a Package

Installation

  • via pip

    pip install uniqmin
    
  • via package clone from GitHub repository

    git clone https://github.com/ChongLC/MinimalSetofViralPeptidome-UNIQmin.git
    

    Note for users of a conda environment (e.g. Jupyter Notebook):
    Before pip installing the package, run

    conda config --add channels conda-forge
    conda install pyahocorasick
    

    ... and restart the kernel to use the updated package. Then, run

    pip install uniqmin
    

Upgrade installed version

pip install uniqmin --upgrade

Usage

uniqmin [-h] [-i INPUT] [-o OUTPUT] [-k KMERLENGTH] [-cpu CPUSIZE]

For example, the following command applies UNIQmin to a sample input file (exampleinput.fas) and writes the minimal set to the example folder, using a k-mer window size of nine (9; nonamer) and 14 CPU cores.

uniqmin -i exampleinput.fas -o example -k 9 -cpu 14
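If the tool needs to be called from a Python script, one option is to build the same command line and run it with `subprocess`. The sketch below assumes the `uniqmin` executable is on the PATH after installation; the helper names `build_uniqmin_command` and `run_uniqmin` are hypothetical and not part of the package.

```python
import shutil
import subprocess


def build_uniqmin_command(input_fasta, output_dir, k=9, cpu=14):
    """Assemble the uniqmin command line using the flags documented here."""
    return [
        "uniqmin",
        "-i", str(input_fasta),
        "-o", str(output_dir),
        "-k", str(k),
        "-cpu", str(cpu),
    ]


def run_uniqmin(input_fasta, output_dir, k=9, cpu=14):
    """Run uniqmin as a subprocess; raises if the executable is missing."""
    cmd = build_uniqmin_command(input_fasta, output_dir, k, cpu)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("uniqmin executable not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(cmd, check=True)


print(" ".join(build_uniqmin_command("exampleinput.fas", "example")))
```

Keeping command construction separate from execution makes it possible to inspect or log the command before running it.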

Command-line Arguments

| Argument | Parameter | Type | Required | Default | Description |
|----------|-----------|------|----------|---------|-------------|
| -h | help | N/A | FALSE | N/A | Show this help message and exit |
| -i | sequence input file | String | TRUE | N/A | Path of the input file (in FASTA format) |
| -o | output directory name | String | TRUE | N/A | Path of the output directory to be created |
| -k | k-mer window size | Integer | FALSE | 9 | The length of k-mers to be used |
| -cpu | cpu size | Integer | FALSE | 14 | The number of CPU cores to be used |
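The argument table can be mirrored with a small `argparse` parser, for instance when writing a wrapper script that should accept the same options. This is a sketch only and not the parser shipped with UNIQmin; the `dest` names are assumptions taken from the usage line above.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (illustrative)."""
    parser = argparse.ArgumentParser(prog="uniqmin")
    parser.add_argument("-i", dest="input", required=True,
                        help="Path of the input file (in FASTA format)")
    parser.add_argument("-o", dest="output", required=True,
                        help="Path of the output directory to be created")
    parser.add_argument("-k", dest="kmerlength", type=int, default=9,
                        help="The length of k-mers to be used")
    parser.add_argument("-cpu", dest="cpusize", type=int, default=14,
                        help="The number of CPU cores to be used")
    return parser


# Only the required options are given, so the defaults apply.
args = build_parser().parse_args(["-i", "exampleinput.fas", "-o", "example"])
print(args.input, args.output, args.kmerlength, args.cpusize)
```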

Generate a random protein sequence dataset

This section is specific to the Protocol paper. For details and the Python script, please refer to the randomizer folder.
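For readers who only need a quick test dataset, a random protein FASTA file can be sketched as below. This is not the script in the randomizer folder: it assumes uniform sampling over the 20 standard amino acids and fixed-length sequences, and the function name `random_protein_fasta` and the header format are hypothetical.

```python
import random

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_protein_fasta(n_sequences, length, seed=0):
    """Return FASTA text with uniformly random protein sequences."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    records = []
    for i in range(1, n_sequences + 1):
        residues = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        records.append(f">random_{i}\n{residues}")
    return "\n".join(records) + "\n"


# Write a small dataset that could be passed to uniqmin with -i.
with open("random_input.fas", "w") as handle:
    handle.write(random_protein_fasta(n_sequences=5, length=50, seed=42))
```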


Citing Resources


Found a bug?

Or would like a feature added? Or maybe drop some feedback? Just open a new issue or send an email to us (lichuinchong@gmail.com).
