lucinamay/biosynfoni

🌿 a biosynformatic molecular fingerprint tailored to natural product chem- and bioinformatic research 🌿

________________________________________________________________________________________

bi·o·syn·for·ma·tic
/ˌbaɪ oʊ sɪn fərˈ mæt ɪk/
adjective Computers, Biochemistry

relating to biosynthetic information and biochemical logic.
a blend of "biosynthetic" and "bioinformatic", coined
during the creation of BioSynFoni.

_________________________________________________________________________________________

Getting started 🌿

Predict biosynthetic class

We have trained a biosynthetic class predictor on biosynfoni fingerprints.

You can try out the predictor on your own molecules here!

Installation

Biosynfoni requires Python 3.9 or later. RDKit is installed automatically as a dependency.

Install the package with pip:

pip install biosynfoni

Now you can import the biosynfoni package in your Python code or use the command line tool.

Usage in Python

Convert a SMILES string to a fingerprint:

from biosynfoni import Biosynfoni
from rdkit import Chem

smi = "<SMILES>"  # replace <SMILES> with your SMILES string
mol = Chem.MolFromSmiles(smi)
fp = Biosynfoni(mol).fingerprint  # returns biosynfoni's count fingerprint of the molecule
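
When fingerprinting more than one molecule, note that RDKit's `Chem.MolFromSmiles` returns `None` for a SMILES string it cannot parse. The following is a minimal sketch of a batch helper built around the call shown above. The names `fingerprint_many` and `_biosynfoni_fingerprint` are hypothetical and not part of the biosynfoni package; only `Biosynfoni(mol).fingerprint` comes from this README.

```python
def _biosynfoni_fingerprint(smiles):
    """Fingerprint one SMILES string, or return None if RDKit cannot parse it."""
    # Imported lazily so that fingerprint_many can also be used with a custom function.
    from biosynfoni import Biosynfoni
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)  # None when the SMILES is invalid
    if mol is None:
        return None
    return Biosynfoni(mol).fingerprint


def fingerprint_many(smiles_list, fingerprint_fn=_biosynfoni_fingerprint):
    """Return (fingerprints, failed).

    fingerprints maps each parsable SMILES to its fingerprint;
    failed lists the SMILES strings that were skipped.
    Hypothetical helper, not part of the biosynfoni API.
    """
    fingerprints, failed = {}, []
    for smiles in smiles_list:
        fp = fingerprint_fn(smiles)
        if fp is None:
            failed.append(smiles)
        else:
            fingerprints[smiles] = fp
    return fingerprints, failed
```

Calling `fingerprint_many` with a list of SMILES strings returns the fingerprints keyed by SMILES, plus the entries that could not be parsed, so that invalid input is reported rather than raising an error partway through a batch.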

Usage in the command line

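Create a fingerprint from a SMILES string (quote it so the shell does not interpret characters such as parentheses or #):

biosynfoni "<SMILES>"

Create a fingerprint from an InChI string (quoted for the same reason):

biosynfoni "<InChI>"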

Write the fingerprints of all molecules in an SDF file to a CSV file:

biosynfoni <molecule_supplier.sdf>
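
If you need control over the output file from Python, the same SDF-to-CSV workflow can be approximated with RDKit's `Chem.SDMolSupplier` and the standard `csv` module. This is a rough sketch under stated assumptions: the function names `sdf_rows` and `write_fingerprints_csv` are hypothetical, the fingerprint is assumed to be an iterable of counts, and the CSV layout (identifier followed by one column per count) is an illustrative choice that may differ from what the command line tool writes.

```python
import csv


def sdf_rows(sdf_path):
    """Yield (identifier, fingerprint) pairs for each readable molecule in an SDF file."""
    # Imported lazily; both packages are needed only when reading real SDF files.
    from biosynfoni import Biosynfoni
    from rdkit import Chem

    for index, mol in enumerate(Chem.SDMolSupplier(sdf_path)):
        if mol is None:  # RDKit yields None for records it cannot parse
            continue
        # "_Name" holds the SDF title line; fall back to a positional identifier.
        name = mol.GetProp("_Name") if mol.HasProp("_Name") else ""
        yield (name or f"mol_{index}", Biosynfoni(mol).fingerprint)


def write_fingerprints_csv(rows, path):
    """Write one CSV row per molecule: identifier, then one column per count.

    Assumes each fingerprint is an iterable of counts (an assumption, not
    something this README specifies).
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for identifier, fingerprint in rows:
            writer.writerow([identifier, *fingerprint])
```

A call such as `write_fingerprints_csv(sdf_rows("molecule_supplier.sdf"), "fingerprints.csv")` would then stream the molecules from the SDF file into the CSV file one at a time.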

Preprint

Citation

If you use biosynfoni in your research, please cite our preprint:

@article{nollen2025biosynfoni,
  title={Biosynfoni: A Biosynthesis-informed and Interpretable Lightweight Molecular Fingerprint},
  author={Nollen, Lucina-May and Meijer, David and Sorokina, Maria and Van der Hooft, Justin J. J.},
  journal={ChemRxiv},
  year={2025}
}

Data availability

For our manuscript, we created several biosynthetic class predictors. These can be downloaded from Zenodo here.

We used data from the COCONUT natural product database (DOI) and the ZINC compound database (DOI). The parsed data used for the analysis in our manuscript can be downloaded from Zenodo here.