Bemis-Murcko clustering

This manual page describes the Bemis-Murcko clustering algorithm.


Bemis and Murcko outlined a popular method for deriving scaffolds from molecules by removing side-chain atoms, leaving only the ring systems and the linker atoms that connect them. Such a molecular framework can be interpreted as a graph whose nodes and edges represent atoms and bonds, labelled by atom and bond type, respectively. Removing the atom and bond labels, or agglomerating nodes by chemotype, yields a hierarchy of reduced graphs, or molecular equivalence classes, each representing a set of related molecules.

Likewise, a framework can be further decomposed into individual rings (or the core ring assembly) using chemically intuitive rules: the rings can individually or jointly be considered as scaffolds derived from the original compound.
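One way to split a framework into ring assemblies is sketched below in Python. It uses the same hypothetical representation, with bonds given as plain `(i, j)` pairs. Bonds whose removal disconnects the graph (bridges) are exactly the non-ring bonds. Deleting them and taking the connected components of the remaining ring bonds therefore puts fused, bridged, and spiro ring systems into one assembly each. Bridges are found here with Tarjan's algorithm. This is an illustrative choice, not necessarily the rule set ChemAxon applies. Splitting an assembly further into individual rings would need a separate ring perception step, such as an SSSR search.

```python
from collections import defaultdict


def find_bridges(n_atoms, bonds):
    """Return bridge bonds as frozensets {i, j} (Tarjan's algorithm).
    In a molecular graph the bridges are exactly the non-ring bonds."""
    adj = defaultdict(list)
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    disc, low = {}, {}  # discovery time and low-link value per atom
    bridges = set()
    timer = 0

    def dfs(u, parent):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v in adj[u]:
            if v == parent:
                continue  # do not walk straight back along the tree edge
            if v in disc:
                low[u] = min(low[u], disc[v])  # back edge closes a cycle
            else:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.add(frozenset((u, v)))  # no cycle through u-v

    for a in range(n_atoms):
        if a not in disc:
            dfs(a, -1)
    return bridges


def ring_assemblies(n_atoms, bonds):
    """Group ring atoms into assemblies. Rings sharing at least one atom
    (fused, bridged or spiro) end up in the same assembly."""
    bridges = find_bridges(n_atoms, bonds)
    ring_adj = defaultdict(set)
    for i, j in bonds:
        if frozenset((i, j)) not in bridges:
            ring_adj[i].add(j)
            ring_adj[j].add(i)
    seen, assemblies = set(), []
    for start in list(ring_adj):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # flood-fill over ring bonds only
            a = stack.pop()
            if a in comp:
                continue
            comp.add(a)
            stack.extend(ring_adj[a] - comp)
        seen |= comp
        assemblies.append(sorted(comp))
    return assemblies
```

For the two three-membered rings joined by a one-atom linker (bonds `(0,1) (1,2) (2,0) (2,3) (3,4) (4,5) (5,6) (6,4)`), both linker bonds are bridges. The function returns the assemblies `[0, 1, 2]` and `[4, 5, 6]`.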

ChemAxon applies an extended Bemis-Murcko framework algorithm to generate clusters. The algorithm first generates scaffolds labelled with atom types. In the subsequent step, the atom labels are cleared, yielding a generic (unlabelled) molecular graph. Many methods exist for comparing scaffolds. Simple numeric metrics or summary statistics based on substructure counts are often used, but these approaches give little insight into the local and global distribution of similar scaffolds. Topological comparisons using molecular fingerprints or substructure analysis are possible, but they are difficult to interpret owing to the sparse, high-dimensional nature of the data.
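The two-step labelling can be illustrated by grouping molecules whose frameworks are identical, first with atom and bond labels and then with the labels cleared. A real implementation would compare canonical representations such as canonical SMILES. The Python sketch below substitutes a Weisfeiler-Lehman style hash instead. That hash always gives isomorphic graphs the same key, but it can, rarely, also give the same key to non-isomorphic graphs. The input format and all function names are hypothetical: each molecule ID maps to a framework stored as an atom-label dict plus `(i, j, order)` bonds.

```python
import hashlib
from collections import defaultdict


def _digest(text):
    """Short, stable hash of a string."""
    return hashlib.sha1(text.encode()).hexdigest()[:16]


def framework_key(labels, bonds, generic=False, iterations=3):
    """Weisfeiler-Lehman style key for a framework graph (a stand-in for
    a canonical string, not a true canonical form).

    labels:  dict atom_index -> element symbol
    bonds:   list of (i, j, order) tuples
    generic: if True, atom and bond labels are cleared before hashing
    """
    adj = defaultdict(list)
    for i, j, order in bonds:
        bond_label = "1" if generic else str(order)
        adj[i].append((j, bond_label))
        adj[j].append((i, bond_label))
    colour = {a: ("*" if generic else el) for a, el in labels.items()}
    for _ in range(iterations):
        # Refine each atom's colour from its own colour plus the sorted
        # multiset of (bond label, neighbour colour) pairs.
        colour = {
            a: _digest(colour[a] + "|" + ",".join(
                sorted(b + ":" + colour[n] for n, b in adj[a])))
            for a in colour
        }
    # The graph key is the sorted multiset of final atom colours.
    return _digest(";".join(sorted(colour.values())))


def cluster_by_framework(frameworks, generic=True):
    """Group molecule IDs whose frameworks share the same key."""
    clusters = defaultdict(list)
    for mol_id, (labels, bonds) in frameworks.items():
        clusters[framework_key(labels, bonds, generic=generic)].append(mol_id)
    return list(clusters.values())
```

Consider cyclohexane, piperidine, and benzene (alternating bond orders). With labels kept, the three frameworks fall into separate clusters. With labels cleared, they share one cluster, since all three reduce to the same unlabelled six-membered ring. All molecules with an empty (acyclic) framework also share a single key.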

Graphical representations of scaffold distributions within a compound population often provide a much more meaningful view of such complex data.


Fig. 1 Bemis-Murcko framework generation

Usage and options


jklustor [<options>] [<input files>]

Before running jklustor, set up the script or batch file as described in Preparing the Usage of JChem Batch Files and Shell Scripts.


 -h, --help              print this help message
 -c <method>             clustering method (e.g. bm)
 -o, --output <spec>     output action, optionally with a file path, e.g. wrclus:sdf or wrclus:sdf:frameworks.sdf (default: stdout)
 -t, --tag               name of the SDFile tag to store the Pharmacophore Map (default: PMAP)
 -S, --sdf-output        SDF output (otherwise only PMAP list)
 -g, --ignore-error      continue with the next molecule on error
 -v, --verbose           print calculation warnings to the console
 -l                      store structures in memory
 -sd                     cache descriptors
 -s, --port <port>       after performing all output actions, launch a listening server on the given port


  • Enumerate the frameworks of inline SMILES structures:

    jklustor C CC CCC C1CC1 C1CCC1
  • Enumerate frameworks using default.sdf from the ChemAxon site and write the output in SMILES format:

    jklustor -v
  • Enumerate frameworks of inline structures and display the results in MarvinView (mview):

    jklustor C CC CCC C1CC1 C1CCC1 -o wrclus:sdf | mview -f ID -
  • Launch the GUI server, available at http://localhost:8000:

    bm.bat -v -c bm -s 8000 -l -sd sample.sdf
  • Read structures from standard input, write the frameworks in SMILES and to an SDF file, and write the input structures grouped by framework into separate files:

    cat input.sdf | jklustor -o wrclus:smiles -o wrclus:sdf:frameworks.sdf -o "wrmols:sdf:cluster_*.sdf"
  • Write only the specified clusters to files (cat is a UNIX command; on Windows, use type instead):

    cat input.sdf | jklustor -o "wrmols:sdf:cluster_*.sdf:id-5,15,40-"

Clustering GUI examples