Work done by each step
Greg Landrum edited this page Dec 4, 2019 · 14 revisions
If the molecule causes the exclusion flag to be set, this function does nothing. See the page on the exclusion flag for more details.
- Standardize unknown stereochemistry (Handled by the RDKit Mol file parser)
- Fix wiggly bonds on sp3 carbons - sets atoms and bonds marked as unknown stereo to no stereo
- Fix wiggly bonds on double bonds – set double bond to crossed bond
- Clears S Group data from the mol file
- Kekulize the structure
- Remove H atoms (See the page on explicit Hs for more details)
- Normalization:
- Fix hypervalent nitro groups
- Fix KO to K+ O- and NaO to Na+ O- (Also add Li+ to this)
- Correct amides with N=COH
- Standardise sulphoxides to charge separated form
- Standardize diazonium N (atom `:2` here: `[*:1]-[N;X2:2]#[N;X1:3]>>[*:1]`) to N+
- Ensure quaternary N is charged
- Ensure trivalent O (`[*:1]=[O;X2;v3;+0:2]-[#6:3]`) is charged
- Ensure trivalent S (`[O:1]=[S;D2;+0:2]-[#6:3]`) is charged
- Ensure halogen with no neighbors (`[F,Cl,Br,I;X0;+0:1]`) is charged
- The molecule is neutralized, if possible. See the page on neutralization rules for more details.
- Remove stereo from tartrate to simplify salt matching
- Normalise (straighten) triple bonds and allenes
- Sets all isotopes to 0 and removes Hs. This extra H removal step is necessary because previous H removal will have skipped D or T atoms (see the page on explicit Hs for more details)
- Solvents (defined in a list) are removed unless this step removes all fragments, in which case the molecule is not modified.
- Salts (defined in a list) are removed unless this step removes all fragments, in which case the molecule is not modified.
- Duplicate fragments (duplication detected after neutralizing and removing Hs from fragments) are removed. Duplicates are detected using canonical SMILES, so fragments that are tautomers of each other will not be removed in this step.
- The remaining molecule is neutralized, if possible. See the page on neutralization rules for more details.
- The exclusion flag is checked (see the page on the exclusion flag for more details). If the flag is set, the molecule remaining after solvent stripping is returned. Two specific examples illustrating where this makes a difference are ranitidine bismuth citrate (CHEMBL2111286) - the Bi ion is stripped as a salt and ranitidine is the parent - and CuCl2.2H2O - the waters (solvent) and chloride ions (salts) are removed, but the exclusion flag is set by the Cu+2 that remains so the parent is CuCl2.
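The "remove unless everything would be removed" rule used for solvents and salts, and the duplicate-fragment step, can be sketched in plain Python. This is only an illustration under simplifying assumptions: fragments are compared as SMILES strings, whereas the description above says duplicates are detected using canonical SMILES, and `SOLVENTS`, `SALTS`, `strip_fragments` and `drop_duplicates` are hypothetical names, not the pipeline's API.

```python
# Hypothetical stand-ins for the pipeline's solvent and salt lists.
SOLVENTS = {"O"}
SALTS = {"[Cl-]", "[Na+]"}


def strip_fragments(fragments, removable):
    """Drop fragments found in `removable`, unless that would drop all of them."""
    kept = [frag for frag in fragments if frag not in removable]
    # If nothing survives, return the input unmodified.
    return kept if kept else list(fragments)


def drop_duplicates(fragments):
    """Keep the first occurrence of each fragment, preserving order."""
    seen = set()
    unique = []
    for frag in fragments:
        if frag not in seen:
            seen.add(frag)
            unique.append(frag)
    return unique


# CuCl2.2H2O written as dot-separated fragments.
fragments = "[Cu+2].[Cl-].[Cl-].O.O".split(".")
after_solvents = strip_fragments(fragments, SOLVENTS)
after_salts = strip_fragments(after_solvents, SALTS)

print(".".join(after_solvents))           # [Cu+2].[Cl-].[Cl-]
print(".".join(after_salts))              # [Cu+2]
print(strip_fragments(["O", "O"], SOLVENTS))  # ['O', 'O'] (left unmodified)
```

For CuCl2.2H2O, the exclusion flag is set by the remaining Cu+2, so the form corresponding to `after_solvents` is the one returned as the parent.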
- Number of atoms < 1, i.e. empty CTAB
- Polymer
- V3000 mol file
- 3D coordinates in mol file
- 3D flag set on 2D molecule
- Illegal bond type
- Illegal bond stereo
- Multiple stereobonds on stereoatom
- Overlapping atoms (atoms with identical coordinates)
- Zero coordinates (all atoms have zero coordinates) - can happen when the mol file is created from SMILES
- Stereobond in ring
- Stereobond between stereo centres
- Crossed bonds in ring
- Radicals that don’t fit known stable radical patterns (allowed: nitric oxide, aminoxyl)
- StereoCenters MOL/InChI/RDKit mismatch
- StereoCenters MOL_RDKit/InChI mismatch
- StereoCenters MOL_InChI/RDKit mismatch
- StereoCenters InChI_RDKit/MOL mismatch
- InChI warning:Accepted unusual valence(s)
- InChI warning:Empty structure
- InChI ambiguous stereo
- Any other InChI error/warning
- Illegal input (mol block could not be parsed)
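A few of the coordinate-based checks above (empty CTAB, 3D coordinates, zero coordinates, overlapping atoms) can be illustrated by reading the atom block of a V2000 mol block directly. This is a minimal sketch, not the pipeline's implementation: it assumes the fixed-width V2000 layout, exact coordinate equality for "overlapping", and that an all-zero molecule is reported once rather than also as overlapping. `make_molblock`, `read_coords` and `coordinate_issues` are hypothetical helpers.

```python
def make_molblock(coords):
    """Hypothetical helper: build a bond-free V2000 mol block of carbon atoms."""
    counts = f"{len(coords):3d}{0:3d}  0  0  0  0  0  0  0  0999 V2000"
    atoms = [f"{x:10.4f}{y:10.4f}{z:10.4f} C   0  0" for x, y, z in coords]
    return "\n".join(["", "  sketch", "", counts] + atoms + ["M  END"])


def read_coords(molblock):
    """Return [(x, y, z), ...] from the atom block of a V2000 mol block."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: first 3 columns hold the atom count
    return [
        (float(line[0:10]), float(line[10:20]), float(line[20:30]))
        for line in lines[4:4 + n_atoms]
    ]


def coordinate_issues(molblock):
    """Return a list of coordinate-related issues found in the mol block."""
    coords = read_coords(molblock)
    if not coords:
        return ["Number of atoms < 1"]
    issues = []
    if any(z != 0.0 for _, _, z in coords):
        issues.append("3D coordinates in mol file")
    if len(coords) > 1 and all(c == (0.0, 0.0, 0.0) for c in coords):
        # Choice of this sketch: all-zero input is not also reported as overlapping.
        issues.append("Zero coordinates")
    elif len(set(coords)) < len(coords):
        issues.append("Overlapping atoms")
    return issues


print(coordinate_issues(make_molblock([(0, 0, 0), (0, 0, 0)])))
print(coordinate_issues(make_molblock([(0, 0, 0), (1.5, 0, 0), (1.5, 0, 0)])))
```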
- Mol: number of atoms where a wedged bond starts
- InChI: number of tetrahedral stereocenters
- RDKit: number of atomic stereocenters remaining after calling `Chem.AssignStereochemistry()`
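As a rough illustration of how the three counts could be compared, the sketch below counts wedge start atoms in a V2000 bond block and maps a triple of counts onto the mismatch messages listed earlier. Several things here are assumptions rather than facts from this page: that bond stereo codes 1 and 6 are the ones treated as wedged, that a label such as `MOL_RDKit/InChI` means the Mol and RDKit counts agree while InChI differs, and the function names themselves. The InChI and RDKit counts are taken as plain integers supplied by the caller.

```python
def count_wedge_start_atoms(molblock):
    """Count distinct atoms at which at least one wedged bond starts (V2000)."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])
    n_bonds = int(lines[3][3:6])
    first_bond = 4 + n_atoms
    starts = set()
    for line in lines[first_bond:first_bond + n_bonds]:
        begin_atom = int(line[0:3])
        stereo = int(line[9:12].strip() or "0")  # missing field means no stereo
        if stereo in (1, 6):  # assumption: 1 and 6 are the wedged bond codes
            starts.add(begin_atom)
    return len(starts)


def classify_stereo_counts(mol, inchi, rdkit):
    """Return a mismatch message for the three counts, or None if they agree."""
    if mol == inchi == rdkit:
        return None
    if mol == rdkit:
        return "StereoCenters MOL_RDKit/InChI mismatch"
    if mol == inchi:
        return "StereoCenters MOL_InChI/RDKit mismatch"
    if inchi == rdkit:
        return "StereoCenters InChI_RDKit/MOL mismatch"
    return "StereoCenters MOL/InChI/RDKit mismatch"


print(classify_stereo_counts(2, 2, 1))
print(classify_stereo_counts(1, 2, 3))
```

Under the reading assumed here, the first call reports the `MOL_InChI/RDKit` case (Mol and InChI agree, RDKit differs) and the second reports the three-way mismatch.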