Ruolin He edited this page Mar 10, 2024 · 5 revisions

What is NRPS-motif-Finder

NRPS-motif-Finder is a tool for the standardization of non-ribosomal peptide synthetases (NRPSs). It partitions an input NRPS protein sequence by locating its conserved motifs and outputs a motif-and-intermotif architecture that feeds into subsequent analyses such as C domain classification and NRPS re-engineering.
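The idea of a motif-and-intermotif architecture can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `partition_by_motifs`, the toy sequence, and the motif coordinates are all hypothetical, and the sketch does not show how NRPS-motif-Finder actually locates motifs.

```python
from typing import List, Tuple

# Hypothetical motif hit: (motif name, start, end), 0-based and end-exclusive.
MotifHit = Tuple[str, int, int]


def partition_by_motifs(sequence: str, hits: List[MotifHit]) -> List[Tuple[str, str]]:
    """Split a sequence into alternating intermotif and motif segments."""
    segments = []
    cursor = 0
    for name, start, end in sorted(hits, key=lambda hit: hit[1]):
        # Reject hits that overlap a previous hit or fall outside the sequence.
        if start < cursor or end > len(sequence) or start >= end:
            raise ValueError(f"invalid or overlapping hit for motif {name}")
        if start > cursor:
            segments.append(("intermotif", sequence[cursor:start]))
        segments.append((name, sequence[start:end]))
        cursor = end
    if cursor < len(sequence):
        segments.append(("intermotif", sequence[cursor:]))
    return segments


# Toy data only: neither the sequence nor the coordinates are real.
toy_sequence = "AAAHHHCCCDDDEEE"
toy_hits = [("C1", 3, 6), ("C2", 9, 12)]
architecture = partition_by_motifs(toy_sequence, toy_hits)
for label, segment in architecture:
    print(label, segment)
```

Concatenating the segments in order reproduces the input sequence, so no residue is lost when the sequence is partitioned this way.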

Supported domains and motifs

The adenylation (A) domain has 12 motifs: Aalpha, A1-A5, G-motif, and A6-A10. Of these, Aalpha and G-motif are two new motifs proposed in our paper.

The condensation (C) domain has 10 motifs: C1-C10.

The thiolation (T) domain has 2 motifs: Talpha and T1. Talpha is a new motif proposed in our paper.

The thioesterase (TE) domain has 1 motif: TE1.

The epimerization (E) domain has 7 motifs: E1-E7.
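For scripting against the tool's output, the motif lists above could be kept in a small lookup table. The sketch below is an assumption about how one might represent them; the names `SUPPORTED_MOTIFS` and `numbered` are illustrative and are not taken from the NRPS-motif-Finder source code. The motif order within each domain follows the order in which the motifs are listed above.

```python
def numbered(prefix: str, first: int, last: int) -> list:
    """Return names such as C1..C10 for a prefix and an inclusive range."""
    return [f"{prefix}{i}" for i in range(first, last + 1)]


# Illustrative catalog of the motifs listed on this page, keyed by domain.
SUPPORTED_MOTIFS = {
    "A": ["Aalpha"] + numbered("A", 1, 5) + ["G-motif"] + numbered("A", 6, 10),
    "C": numbered("C", 1, 10),
    "T": ["Talpha", "T1"],
    "TE": ["TE1"],
    "E": numbered("E", 1, 7),
}

motif_counts = {domain: len(motifs) for domain, motifs in SUPPORTED_MOTIFS.items()}
print(motif_counts)  # {'A': 12, 'C': 10, 'T': 2, 'TE': 1, 'E': 7}
```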

C domain subtype

One of the most important features of NRPS-motif-Finder is that it supports full subtype classification of C domains.

Maximum-likelihood phylogenetic tree of the condensation domain superfamily.

Subtype classification and sequences are described in the main text and the Methods. Different subtypes are indicated by colors; subtypes exclusive to fungi are underlined, and subtypes found predominantly in bacteria are marked with asterisks. The tree is rooted using papA and WES as outgroups (black shading). The L-clade and D-clade are indicated by blue and red shading, respectively.

The details of C domain subtypes

| C domain subtype | Sequence source | Link | Species distribution | Function | Comment |
| --- | --- | --- | --- | --- | --- |
| LCL | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380461&#seqhrch | Both bacteria and fungi | LCL-type C domains catalyze peptide bond formation between two L-amino acids. | LCL and SgcC5 are hard to distinguish because of their high sequence similarity. |
| DCL | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380465&#seqhrch | Both bacteria and fungi | The DCL-type C domain catalyzes the condensation between a D-aminoacyl/peptidyl-PCP donor and an L-aminoacyl-PCP acceptor. | |
| Starter | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380456&#seqhrch | Predominantly bacteria | While standard C domains catalyze peptide bond formation between two amino acids, the Starter C domain may instead acylate an amino acid with a fatty acid in the first module of an NRPS. | |
| Dual | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380466&#seqhrch | Both bacteria and fungi | Dual-function E/C domains have both epimerization and DCL condensation activity. They first epimerize the substrate amino acid to the D-configuration, then catalyze the condensation between the D-aminoacyl/peptidyl-PCP donor and an L-aminoacyl-PCP acceptor. | |
| CT | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380464&#seqhrch | Fungi only | Unlike bacterial NRPSs, which typically have specialized terminal thioesterase (TE) domains to cyclize peptide products, many fungal NRPSs employ a terminal condensation-like (CT) domain to produce macrocyclic peptidyl products. | |
| CT-DCL | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380464&#seqhrch | Fungi only | The CT-DCL domain catalyzes the same reaction as the DCL domain but has high sequence similarity to the CT domain. | This subtype is proposed in our paper. |
| CT-A | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380464&#seqhrch | Fungi only | The CT-Atypical (CT-A) domain catalyzes the same reaction as the DCL domain but has high sequence similarity to the CT domain. | It is always located after an ACP (acyl carrier protein) domain rather than a T domain. This subtype is proposed in our paper. |
| PS | Literature | https://www.nature.com/articles/nchembio.365 | Predominantly bacteria | The PS domain catalyzes the Pictet-Spengler reaction. | |
| bL | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380469&#seqhrch | Predominantly bacteria | The beta-lactam (bL) C domain mediates an unusual cyclization to form beta-lactam rings. | The bL domain is actually a subtype of the DCL domain. |
| X | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380468&#seqhrch | Predominantly bacteria | The X domain is a catalytically inactive condensation-like domain shown to recruit oxygenases to the NRPS. | |
| Cyc | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380458&#seqhrch | Both bacteria and fungi | Cyc (heterocyclization) domains catalyze two separate reactions in the creation of heterocyclized peptide products in NRPSs: amide bond formation followed by intramolecular cyclodehydration between a Cys, Ser, or Thr side chain and a carbonyl carbon on the peptide backbone to form a thiazoline, oxazoline, or methyloxazoline ring. | |
| I | Literature | https://www.pnas.org/doi/10.1073/pnas.1903161116 | Predominantly bacteria | The interface (I) domain plays a role in positioning the β-hydroxylase and the NRPS-bound amino acid substrate prior to hydroxylation. | |
| modAA | Literature | https://pubs.acs.org/doi/10.1021/jacs.1c13404 | Predominantly bacteria | The core function of the modAA C domain is to catalyze the dehydration of a beta-hydroxy amino acid (such as Ser or Thr) to form a dehydroamino acid. Derived functions include pyrrolizidine formation, conjugate addition instead of amide formation, pyrimidine formation, L-2-amino-4-methoxy-trans-3-butenoic acid formation, and side-chain conjugate addition. | |
| Cglyc | MIBiG 2.0 | https://academic.oup.com/nar/article/48/D1/D454/5587631 | Predominantly bacteria | The glycopeptide condensation domain functions in peptide bond formation during glycopeptide antibiotic biosynthesis. | |
| Hybrid | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380455&#seqhrch | Both bacteria and fungi | C domains of hybrid polyketide synthetase/nonribosomal peptide synthetases (PKS/NRPSs) catalyze peptide bond formation within (usually) large multi-modular enzymatic complexes. Hybrid PKS/NRPSs create polymers containing both polyketide and amide linkages. | |
| FUM14 | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380467&#seqhrch | Fungi only | C domain of NRPSs similar to the ester-bond-forming Fusarium verticillioides FUM14 protein. | The module containing a FUM14 domain is always used iteratively. Ester-bond formation is an uncommon function. |
| SgcC5 | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380462&#seqhrch | Both bacteria and fungi | SgcC5 is an NRPS C domain with ester- and amide-bond-forming activity. | LCL and SgcC5 are hard to distinguish because of their high sequence similarity. |
| LCL-A | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380460&#seqhrch | Fungi only | C domain with an atypical active-site motif. Members of this subfamily typically have a non-canonical conserved SHXXXDX(14)Y motif, which replaces the HHXXXD motif typically found in the C domain. | This subtype is named in our paper. |
| E | Conserved Protein Domain Family | https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?hslf=1&uid=380457&#seqhrch | Both bacteria and fungi | Epimerization (E) domains of NRPSs flip the chirality of the end amino acid of a peptide being manufactured by the NRPS. | |
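The two active-site patterns mentioned in the LCL-A entry, HHXXXD and SHXXXDX(14)Y, can be written as regular expressions, reading X as any residue and X(14) as exactly 14 residues. The sketch below is a plain pattern check on toy strings, offered only to make the notation concrete. The function name `active_site_type` is hypothetical, and this is not how NRPS-motif-Finder assigns subtypes.

```python
import re

# X = any residue; X(14) = exactly 14 residues of any kind.
CANONICAL_SITE = re.compile(r"HH.{3}D")          # HHXXXD
ATYPICAL_SITE = re.compile(r"SH.{3}D.{14}Y")     # SHXXXDX(14)Y


def active_site_type(segment: str) -> str:
    """Report which of the two patterns occurs in a sequence segment."""
    # The longer, more specific pattern is checked first.
    if ATYPICAL_SITE.search(segment):
        return "atypical"
    if CANONICAL_SITE.search(segment):
        return "canonical"
    return "none"


# Toy segments, not real protein sequences.
canonical_example = "GGHHAAADGG"
atypical_example = "GG" + "SH" + "AAA" + "D" + "G" * 14 + "Y" + "GG"

print(active_site_type(canonical_example))
print(active_site_type(atypical_example))
print(active_site_type("GGGG"))
```

A match on either pattern says nothing about subtype on its own, since short patterns like these can occur by chance elsewhere in a long protein sequence.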

Note: In NRPS-motif-Finder results, the E domain is not treated as a C domain subtype. The E domain has 7 motifs, whereas the C domain has 10.

Source code of NRPS-motif-Finder

There are two versions of NRPS-motif-Finder, implemented in Matlab and Python.

We recommend the Matlab version because it is updated frequently to fix bugs. The Python version is the stable version used in our online platform.

NRPS-motif-Finder-matlab-version

Matlab code of NRPS-motif-Finder.

NRPS-motif-Finder-Python-version

Python code of NRPS-motif-Finder.
