Cheminformatics

The cdk manager

Basic cheminformatics in Bioclipse is handled mainly by the Chemistry Development Kit (CDK) [Q27061829,Q27065423,Q30149558], which is exposed through the cdk manager.

The cdk manager has many features. One is validating CAS registry numbers, the identifiers assigned by the Chemical Abstracts Service:

cdk.isValidCAS("50-00-0")
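To see what such a validation involves, here is a minimal sketch in plain Python (not Bioclipse script) of the published CAS check-digit rule: the last digit must equal the weighted sum of the other digits, counted from right to left, modulo 10. The function name `is_valid_cas` is ours, and we do not claim this is how `cdk.isValidCAS` is implemented internally.

```python
import re


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` looks like a CAS number with a correct check digit.

    Hypothetical helper for illustration; not the CDK implementation.
    """
    # A CAS registry number has the form NNNNNNN-NN-N (2 to 7 leading digits).
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


print(is_valid_cas("50-00-0"))  # True: (0*1 + 0*2 + 0*3 + 5*4) % 10 == 0
print(is_valid_cas("50-00-1"))  # False: wrong check digit
```

Note that a correct check digit only shows the number is well formed, not that it has actually been assigned to a substance.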

Let's move on to the more interesting functionality around chemical graphs. For example, we can create a molecular structure from a SMILES string:

FromSMILES

Normally, structure diagrams are generated without explicit hydrogens. But we can easily add them:

cdk.addExplicitHydrogens(mol)

We can then calculate a number of properties, including the molecular mass, total formal charge, and molecular formula:

cdk.calculateMass(mol)
cdk.totalFormalCharge(mol)
cdk.molecularFormula(mol)
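Conceptually, all three properties are simple sums over the atoms, which is also why the explicit hydrogens matter. The following plain Python sketch illustrates the idea under a few assumptions: the molecule is a hypothetical list of `(symbol, formal charge)` pairs, the small `ATOMIC_WEIGHTS` table holds rounded standard atomic weights for four elements only, and the formula is written in Hill order. The values returned by the cdk manager may differ, for example if it uses a different definition of mass.

```python
from collections import Counter

# Rounded standard atomic weights; only a few elements, for illustration.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_mass(symbols):
    """Sum the atomic weights of all atoms (hydrogens must be explicit)."""
    return sum(ATOMIC_WEIGHTS[symbol] for symbol in symbols)


def total_formal_charge(charges):
    """The total formal charge is the sum of the per-atom formal charges."""
    return sum(charges)


def hill_formula(symbols):
    """Write the formula in Hill order: C, then H, then the rest A to Z.

    Without carbon, all elements (including H) are sorted alphabetically.
    """
    counts = Counter(symbols)
    if "C" in counts:
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


# Formaldehyde with explicit hydrogens, as (symbol, formal charge) pairs.
atoms = [("C", 0), ("O", 0), ("H", 0), ("H", 0)]
symbols = [symbol for symbol, _ in atoms]
charges = [charge for _, charge in atoms]

print(f"{molecular_mass(symbols):.3f}")  # 30.026
print(total_formal_charge(charges))      # 0
print(hill_formula(symbols))             # CH2O
```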

We can also inspect some of the information present in the model:

cdk.has2d(mol)
cdk.has3d(mol)
cdk.isConnected(mol)
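The connectivity check is a classic graph question: can every atom be reached from every other atom by following bonds? A salt such as sodium chloride, for example, is typically stored as two unconnected fragments. As a rough sketch in plain Python, assuming a hypothetical representation of a molecule as an atom count plus a list of bonded atom index pairs, a breadth-first search answers it. We make no claim that the CDK uses this exact algorithm.

```python
from collections import deque


def is_connected(atom_count, bonds):
    """Return True if all atoms are reachable from atom 0 via the bonds.

    `bonds` is a list of (i, j) atom index pairs. Treating an empty
    molecule as connected is a choice made for this sketch.
    """
    if atom_count == 0:
        return True
    neighbours = {index: [] for index in range(atom_count)}
    for first, second in bonds:
        neighbours[first].append(second)
        neighbours[second].append(first)
    # Breadth-first search starting at the first atom.
    seen = {0}
    queue = deque([0])
    while queue:
        current = queue.popleft()
        for neighbour in neighbours[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == atom_count


# Ethanol heavy atoms C-C-O: one fragment.
print(is_connected(3, [(0, 1), (1, 2)]))  # True
# Two atoms without a bond between them: two fragments.
print(is_connected(2, []))                # False
```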

The cdk manager is also central to file support. Before we load a file, we may want to check its format:

cdk.determineFormat(
  "/ACS Drug Disclosures/AZD5423.cml"
)
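Format detection of this kind generally works by looking at the file content rather than trusting the file extension. The plain Python sketch below shows a deliberately crude version of that idea for the two formats used in this section: an MDL molfile has a counts line as its fourth line, ending in `V2000` or `V3000`, and CML is XML with `molecule` or `cml` elements. The function name `guess_format` and its return values are hypothetical, and the real detection in the CDK covers many more formats and is far more careful.

```python
def guess_format(text):
    """Guess a chemical file format from its content; None if unknown.

    Crude heuristic for illustration, covering only two formats.
    """
    lines = text.splitlines()
    # Molfile: three header lines, then a counts line with a version tag.
    if len(lines) >= 4 and lines[3].rstrip().endswith(("V2000", "V3000")):
        return "MDL molfile"
    # CML: an XML document containing <cml> or <molecule> elements.
    if "<molecule" in text or "<cml" in text:
        return "CML"
    return None


molfile = "\n".join([
    "methane",
    "  hypothetical header line",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])
cml = '<molecule xmlns="http://www.xml-cml.org/schema"/>'

print(guess_format(molfile))  # MDL molfile
print(guess_format(cml))      # CML
print(guess_format("hello"))  # None
```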

However, the format does not need to be known in advance when loading files:

mol = cdk.loadMolecule(
  "/ACS Drug Disclosures/AZD5423.cml"
)

Saving is quite similar, and there are two methods for the two main formats:

cdk.saveCML(mol, "/Test/mol.cml")
cdk.saveMDLMolfile(mol, "/Test/mol.mol")

The cdx manager

The cdx manager is also based on the CDK and exposes functionality aimed more at CDK developers. For example, we can create a String representation of the full data model for debugging purposes:

cdx.debug(mol)

Or we can see the details of the differences between two data models:

cdx.diff(
  cdk.fromSMILES("CC"),
  cdk.fromSMILES("CCC")
)

And we can list the exact atom types for the atoms in a molecule:

PerceiveCDKAtomTypes

For ethanol, this lists:

PerceiveCDKAtomTypes

The inchi manager

The inchi manager makes functionality from the InChI standard available [Q21030547,Q21092920]. The InChI library is not available as a Java library, but is included as a binary for a selection of platforms and operating systems. This means that we cannot assume the InChI functionality is always available in Bioclipse. Furthermore, we need to load the library:

LoadInChI

But when that has succeeded, we can start minting InChIs:

InChIGenerate

This returns:

InChIGenerate

The returned value is an object of a class called InChI, from which we can get both the full InChI and the InChIKey:

InChIKeyGenerate

The opsin manager

The opsin manager makes functionality from OPSIN available [Q26481104]: it converts IUPAC names to chemical structures.

ParseIUPACName

References