Sniff

Heuristic tool for pairing up reverse complement reads from fasta/fastq files.

Methods

Sniff loads sequences in batches, reporting progress along the way. Once all sequences are in memory, they are processed again in batches. Each batch is used to build a target index from reverse-complemented reads. The original reads are then mapped against that index using a seed-and-chain approach with minor modifications. For each read, sniff remembers only the strongest matching reverse complement read and outputs the resulting reverse complement pairs as overlaps. Here an overlap is defined as the tuple (query_name, query_start, query_end, target_name, target_start, target_end). The overlaps are then processed by a pre-trained machine learning model, which writes the final result in CSV/TSV format for later use.
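The overlap tuple and the "strongest match per read" step can be illustrated with a short Python sketch. It is not taken from the sniff sources: the names `reverse_complement`, `Overlap` and `keep_strongest`, as well as the numeric `score` attached to each candidate, are assumptions made for illustration only.

```python
from typing import Iterable, NamedTuple

# Translation table for nucleotide complements (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


class Overlap(NamedTuple):
    """The six-field overlap tuple described above."""
    query_name: str
    query_start: int
    query_end: int
    target_name: str
    target_start: int
    target_end: int


def keep_strongest(candidates: Iterable[tuple[Overlap, float]]) -> list[Overlap]:
    """Keep the highest-scoring overlap per query read.

    The score is a hypothetical match strength (for example a chain
    score); how sniff actually ranks matches is not specified here.
    """
    best: dict[str, tuple[Overlap, float]] = {}
    for overlap, score in candidates:
        current = best.get(overlap.query_name)
        if current is None or score > current[1]:
            best[overlap.query_name] = (overlap, score)
    return [overlap for overlap, _ in best.values()]
```

With two candidates for the same query read, only the higher-scoring one survives, so each read contributes at most one pair to the output.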

Results

Evaluating sniff on an ONT yeast data set

precision	support
0.809323	42024

Here we consider a pair to be a true positive if the two reads map to different strands of approximately the same region of the reference. The mapping is considered close enough if the intersection of the regions covered by the two reads, divided by their union, is larger than 0.9.
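The true positive criterion above can be written out as a small Python sketch. The `Mapping` record, its field names and the half-open coordinate convention are assumptions for illustration; the evaluation scripts in the repository may represent reference mappings differently.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """Hypothetical record of where a read maps on the reference."""
    contig: str
    strand: str  # "+" or "-"
    start: int   # inclusive
    end: int     # exclusive


def interval_iou(a: Mapping, b: Mapping) -> float:
    """Intersection of the covered regions divided by their union."""
    if a.contig != b.contig:
        return 0.0
    intersection = max(0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(a: Mapping, b: Mapping, threshold: float = 0.9) -> bool:
    """Reads must map to different strands and cover nearly the same region."""
    return a.strand != b.strand and interval_iou(a, b) > threshold
```

For example, reads covering [0, 100) and [5, 100) on different strands of the same contig have a ratio of 95 / 100 = 0.95 and count as a true positive, while the same two regions on the same strand do not.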

Build

git clone git@github.com:tbrekalo/sniff.git
cd sniff
make release

Usage

From the sniff root directory:

source ./venv/bin/activate
./build/bin/sniff -t 32 path_to_reads.fasta > /tmp/sniff.csv
python ./scripts/inference/lgbm_filter.py -m resources/sniff-lgbm-model.pkl -o /tmp/sniff.csv > pairs.csv

Dependencies

C++

linux kernel 2.6.32 or higher
gcc 11 or higher
clang 11 or higher
intel tbb 2020.3
- should be compatible with later oneapi versions
conan2 with configured profile
cmake 3.21 or higher
git 2.25.1 or higher
- earlier versions should do just fine
- git is required for cmake to fetch part of the internal dependencies

Test (optional) dependencies

Catch2
- fetched via cmake if missing

Python

conan2==0.0.4
joblib==1.3.2
lightgbm==4.1.0
polars==0.18.5
psutil==5.9.5
pydantic==1.10.9
scikit-learn==1.3.1
