Basic cheminformatics in Bioclipse is mainly handled by the Chemistry
Development Kit (CDK, [Q27061829,Q27065423,Q30149558])
and for this there is the cdk
manager.
The cdk manager is one with many features. One is to validate CAS registry numbers, identifiers used by the Chemical Abstract Services:
cdk.isValidCAS("50-00-0")
But let's go to the more interesting functionality around chemical graphs. For example, let's see how we can create molecular structures from a SMILES string:
FromSMILES
Normally, structure diagrams are generated without explicit hydrogens. But we can easily add them:
cdk.addExplicitHydrogens(mol)
We can then calculate a number of properties, including the molecular mass\index{molecular mass}, total formal charge, and molecular formula:
cdk.calculateMass(mol)
cdk.totalFormalCharge(mol)
cdk.molecularFormula(mol)
Additionally, we can also inspect some of in the information present in the model:
cdk.has2d(mol)
cdk.has3d(mol)
cdk.isConnected(mol)
The cdk manager is also central to file support. Before we load it, we may want to just check the file format:
cdk.determineFormat(
"/ACS Drug Disclosures/AZD5423.cml"
)
However, this information is not needed when loading files:
mol = cdk.loadMolecule(
"/ACS Drug Disclosures/AZD5423.cml"
)
Saving is quite similar, and there are two methods for the two main formats:
cdk.saveCML(mol, "/Test/mol.cml")
cdk.saveMDLMolfile(mol, "/Test/mol.mol")
The cdx
manager is also based on the CDK and exposes
functionality more oriented at CDK developers. For example, we can
create a String representation of the full data model for debugging
purposes:
cdx.debug(mol)
Or we can see the details of the differences between two data models:
cdx.diff(
cdk.fromSMILES("CC"),
cdk.fromSMILES("CCC")
)
And we can list the exact atom types for the atoms in a molecule:
PerceiveCDKAtomTypes
Which lists for ethanol:
PerceiveCDKAtomTypes
The inchi
manager makes functionality from the InChI
standard available [Q21030547,Q21092920].
The InChI library is not available as a Java library, but is included as a
binary for a selection of platforms and operating systems. This means that we
cannot assume the InChI functionality is always available in Bioclipse.
Furthermore, we need to load the library:
LoadInChI
But when that has succeeded, we can start minting InChIs:
InChIGenerate
Which returns:
InChIGenerate
The returned value is a class called InChI and we can get both the full InChI as well as the InChIKey from it:
InChIKeyGenerate
The opsin
manager makes functionality from the OPSIN
available [Q26481104]: convert IUPAC names to chemical
structures.
ParseIUPACName