Screen Developer Guide

    This manual walks you through the use of the Screen API.

    Introduction

    This guide gives examples of using Chemaxon's Virtual Screening toolkit API. With the help of these examples, experienced programmers can develop their own screening software, including the generation of various molecular descriptors, dissimilarity calculations and virtual screening. Users can also implement custom molecular descriptors and integrate them into a virtual screening environment.

    The Screen package provides tools and components for ligand-based virtual screening. Ligands (i.e. small molecules) are transformed into molecular descriptors, which are vectors of bit, integer or floating-point values. Most descriptors are based on the topology of the small molecule, though 3D descriptors incorporating one or more conformations of the molecule can also be introduced.

    Similarity metrics are applied to descriptors to compare the molecules they represent. Metrics include Tanimoto and Euclidean and their variants (see the ScreenMD user's manual for a detailed description).
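
    To make the two metric families concrete, the following minimal sketch computes a Tanimoto dissimilarity on bit fingerprints and a Euclidean distance on numeric vectors using only the Java standard library. It does not use the Screen API: the class MetricsSketch and its method names are illustrative assumptions, and the metric variants described in the ScreenMD manual may differ in weighting and normalization.

```java
import java.util.BitSet;

/** Toy dissimilarity metrics for illustration; not part of the Screen API. */
final class MetricsSketch {

    /** Tanimoto dissimilarity of two bit fingerprints: 1 - |A and B| / |A or B|. */
    static double tanimotoDissimilarity(BitSet a, BitSet b) {
        BitSet union = (BitSet) a.clone();
        union.or(b);
        if (union.isEmpty()) {
            return 0.0; // assumption: two empty fingerprints count as identical
        }
        BitSet common = (BitSet) a.clone();
        common.and(b);
        return 1.0 - (double) common.cardinality() / union.cardinality();
    }

    /** Euclidean distance of two numeric descriptors of equal length. */
    static double euclidean(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("descriptor lengths differ");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0); a.set(1); a.set(2);
        BitSet b = new BitSet();
        b.set(1); b.set(2); b.set(3);
        // 2 common bits, 4 bits in the union: 1 - 2/4
        System.out.println(tanimotoDissimilarity(a, b)); // prints 0.5
        System.out.println(euclidean(new double[] {0, 0}, new double[] {3, 4})); // prints 5.0
    }
}
```

    Both functions return 0 for identical descriptors and grow as the descriptors diverge, which is the sense in which the rest of this guide speaks of dissimilarity.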

    Basic API usage examples

    The API of the Screen package provides a high-level, easy-to-use interface to all types of molecular descriptors provided by Chemaxon. The interface is uniform, so typical API methods do not distinguish between descriptor types, and most examples can easily be adapted to other descriptors.

    Generating fingerprints

    The first example demonstrates how a topological descriptor, the Chemaxon Chemical Fingerprint, can be assigned to each molecule read from an SDfile. The result of this program is a descriptor file that contains one molecular descriptor per line. The names of the input file to process and of the output file to create are given on the command line.

    /*
     * GenerCF.java
     *
     * Created on Nov 14, 2003, 9:22 AM
     */
    import java.io.*;
    import chemaxon.descriptors.*;
    
    /**
     * Simple example code to demonstrate the calculation of ChemicalFingerprint.
     *
     * Takes two command-line arguments as input: the input SDfile and the
     * output descriptor file.
     *
     * @author  Miklos Vargyas
     */
    public class GenerCF {
    
        public static void main( String[] args ) {
    
            try {
                // create a molecular descriptor generator for one descriptor
                GenerateMD descript = new GenerateMD(1);
    
                // set the input file for the generator
                descript.setInput( args[ 0 ] );
                // and tell that input is an SDfile
                descript.setSDfileInput(true);
    
                // specify the descriptor to be created: into which file, what type
                // what parameters settings to be applied
                descript.setDescriptor( args[ 1 ], "CF", new CFParameters(), "" );
    
                // initialize the generator
                descript.init();
    
                // start it and do the entire generation in one go
                descript.run();
    
                // close output file
                descript.close();
            }
            catch ( Exception e ) {
                e.printStackTrace( System.err );
            }
            System.exit(0);
        }
    }

    Dissimilarity calculation

    The next example is more complex, as it demonstrates two different aspects of virtual screening at the same time. The key point is that molecular descriptors are used for dissimilarity calculation. In addition, the descriptors are taken from a JChem database table. Such a table, the so-called descriptor table, can be generated before calling this sample program by running GenerateMD with the appropriate parameters.

    The program solves a simple but demonstrative task: it calculates the average dissimilarity between a given query structure and all structures stored in the database (with respect to the particular descriptor type and dissimilarity metric used). The code can easily be extended to calculate the total dissimilarity score of a compound library.

    /*
     * ComparePairwise.java
     *
     * Created on Nov 14, 2003, 7:04 PM
     */
    
    import java.sql.*;
    import java.io.*;
    import java.util.Properties;
    
    import chemaxon.descriptors.*;
    import chemaxon.struc.Molecule;
    import chemaxon.formats.MolImporter;
    import chemaxon.util.ConnectionHandler;
    import chemaxon.jchem.db.SettingsHandler;
    import chemaxon.jchem.db.MDTableHandler;
    
    /**
     * Demonstrates how molecular descriptors stored in a database table can be
     * used in dissimilarity calculations. This simple program calculates the
     * average dissimilarity of a compound library against a user defined query
     * structure. 
     * <p>
     * The application takes three parameters: the name of the JChem structure
     * table, the name of the molecular descriptor and the query molecule (e.g.
     * as a SMILES string; a file name is not accepted here). Note that the
     * name of the molecular descriptor is the name the user gave when the
     * <code>generatemd</code> command was executed.
     * <p>
     * Be aware that database connection parameters are taken from the .jchem file
     * (i.e. the settings used the last time jcman was run).
     */
    public class ComparePairwise {
    
        public static void main(String[] args) {
    
            String strucTableName = args[ 0 ];
            String sqlSelect = "select cd_id from " + strucTableName;
    
            // names of molecular descriptors used in this dissimilarity calculation
            // this is the name of the descriptor you gave when the descriptor
            // (e.g. fingerprint) was generated with the generatemd command
            String[] mdNames = new String[ 1 ];
            mdNames[ 0 ] = args[ 1 ];
    
            String query = args[ 2 ];
    
            float dissim = 0.0F;
            int counter = 0;
    
            try {
                // open a database connection using settings stored in the .jchem file
                ConnectionHandler connHandler = new ConnectionHandler();
                connHandler.loadValuesFromProperties(
                        new Properties( new SettingsHandler().getSettings() ) );
                connHandler.connect();
    
                // mdTableHandler allows the retrieval of descriptors from database
                MDTableHandler mdTableHandler =
                        new MDTableHandler( connHandler, strucTableName );
                // dbReader gets fingerprints through mdTableHandler
                MDDBReader dbReader = new MDDBReader( strucTableName, connHandler,
                        mdNames, sqlSelect );
    
                // get the first molecular descriptor set from the descriptor table
                // note, that now the set has one component only
                MDSet md = dbReader.next();
    
                // create an identical descriptor set using the same parameters
                // this will store the molecular descriptor of the query molecule
                MDSet queryDescr = new MDSet( md );
                // create a Molecule object from the query string
                Molecule queryMol = MolImporter.importMol( query );
                // generate descriptor for the query molecule using the same
                // settings as found in the database
                queryDescr.generate( queryMol );
    
                // iterate through all descriptors and sum dissimilarities
                while ( md != null ) {
                    dissim += md.getDissimilarity( queryDescr );
                    md = dbReader.next();
                    counter++;
                }
    
                dissim = dissim / counter;
                System.out.println( "Number of descriptors retrieved = " + counter );
                System.out.println( "Average dissimilarity from " + query + " = "
                        + dissim );
    
            }
            catch( Exception ex ) {
                ex.printStackTrace();
            }
        }
    
    }
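
    The extension mentioned above, the total dissimilarity score of a compound library, can be sketched in memory without a database. The sketch below is an illustration under stated assumptions, not Screen API code: fingerprints are plain java.util.BitSet objects, the metric is a simple Tanimoto dissimilarity, and the class LibraryDissimilaritySketch and its methods are hypothetical names. It also guards against an empty library, which would otherwise lead to a division by zero when averaging.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** In-memory illustration of average and total dissimilarity; not Screen API code. */
final class LibraryDissimilaritySketch {

    /** Tanimoto dissimilarity: 1 - |A and B| / |A or B| (0 for two empty sets). */
    static double dissimilarity(BitSet a, BitSet b) {
        BitSet union = (BitSet) a.clone();
        union.or(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        BitSet common = (BitSet) a.clone();
        common.and(b);
        return 1.0 - (double) common.cardinality() / union.cardinality();
    }

    /** Average dissimilarity between the query and every library member. */
    static double averageToQuery(BitSet query, List<BitSet> library) {
        if (library.isEmpty()) {
            throw new IllegalArgumentException("library is empty");
        }
        double sum = 0.0;
        for (BitSet fp : library) {
            sum += dissimilarity(query, fp);
        }
        return sum / library.size();
    }

    /** Total dissimilarity: sum over all unordered pairs (i < j) of the library. */
    static double totalPairwise(List<BitSet> library) {
        double sum = 0.0;
        for (int i = 0; i < library.size(); i++) {
            for (int j = i + 1; j < library.size(); j++) {
                sum += dissimilarity(library.get(i), library.get(j));
            }
        }
        return sum;
    }

    /** Convenience factory: a fingerprint with the given bits set. */
    static BitSet bits(int... indices) {
        BitSet b = new BitSet();
        for (int i : indices) {
            b.set(i);
        }
        return b;
    }

    public static void main(String[] args) {
        List<BitSet> library = Arrays.asList(bits(0, 1), bits(1, 2), bits(0, 1));
        System.out.println(averageToQuery(bits(0, 1), library)); // (0 + 2/3 + 0) / 3
        System.out.println(totalPairwise(library));              // 2/3 + 0 + 2/3
    }
}
```

    The pairwise loop is quadratic in the library size, so for large libraries a sampled estimate may be preferable to the exact sum.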
    

    Advanced API usage

    More advanced usage of the Screen API includes the simultaneous use of several descriptors, the use of the Metrics class and the fine tuning of dissimilarity metrics.

    Custom descriptor implementation

    The Screen package provides a framework for the descriptor/fingerprint generation, storage and retrieval, for similarity/dissimilarity calculations, for virtual screening and for the fine-tuning of dissimilarity scoring functions. As a framework, it does not limit the applicability of tools to the pre-existing molecular descriptors and dissimilarity metrics delivered by Chemaxon. The user can implement custom descriptors that can be integrated in the Screen system in a plug-and-play fashion.

    The sample code in this section illustrates how custom molecular descriptors can be implemented using Chemaxon's technology. The example is a partial implementation of the 166 public MDL keys (MACCS). Note that the program favors readability over efficiency. A real-life application should pay more attention to fast and parallel recognition of functional groups.

    When generating custom descriptors in the Screen framework, three Java classes have to be implemented:

    1. the generator class, derived from the MDGenerator class,

    2. the descriptor parameter class, derived from the MDParameters class,

    3. the molecular descriptor class, derived from the MolecularDescriptor class.

    Convenience classes have also been introduced to reduce the coding work. The examples below derive the MACCS descriptor class and the corresponding parameter class from these convenience classes. These classes suit most typical needs; it is seldom necessary to inherit from lower-level classes.
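
    Before turning to the real classes, the division of labor between the three roles can be sketched with plain Java stand-ins. Everything in this block is hypothetical (ToyGenerator, ToyParameters, ToyDescriptor, and the use of a plain string as the "molecule"); it only mirrors the call chain used by the examples in this section, where the descriptor delegates to its parameter object, which in turn calls the generator.

```java
import java.util.BitSet;

/** Hypothetical stand-ins for the three roles; none of these are Chemaxon classes. */
final class ThreeRolesSketch {

    /** Role 1: the generator fills a descriptor for a "molecule" (here a plain string). */
    static final class ToyGenerator {
        String[] generate(String molecule, ToyDescriptor d) {
            if (molecule.contains("N")) d.bits.set(0); // toy key 0: contains nitrogen
            if (molecule.contains("O")) d.bits.set(1); // toy key 1: contains oxygen
            return null; // no molecule properties are set by this generator
        }
    }

    /** Role 2: the parameter object holds settings and owns the generator. */
    static final class ToyParameters {
        final int length;
        final ToyGenerator generator = new ToyGenerator();

        ToyParameters(int length) {
            this.length = length;
        }

        String[] generate(String molecule, ToyDescriptor d) {
            return generator.generate(molecule, d);
        }
    }

    /** Role 3: the descriptor stores the values and delegates generation. */
    static final class ToyDescriptor {
        final ToyParameters params;
        final BitSet bits;

        ToyDescriptor(ToyParameters params) {
            this.params = params;
            this.bits = new BitSet(params.length);
        }

        String[] generate(String molecule) {
            bits.clear(); // start from an empty descriptor
            return params.generate(molecule, this);
        }
    }

    public static void main(String[] args) {
        ToyDescriptor d = new ToyDescriptor(new ToyParameters(8));
        d.generate("CCO");
        System.out.println(d.bits); // prints {1}
    }
}
```

    Keeping the generator behind the parameter object means a client only ever talks to the descriptor, which is what allows different descriptor types to be handled uniformly.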

    Descriptor generator class

    The main function of the descriptor generator class is to assign a molecular descriptor to the given input molecule. Besides its constructor, the only method to be implemented is generate(). It has two parameters: the input Molecule and the output MolecularDescriptor to be filled.

    Note that the return value, a String array, does not store the descriptor. Instead, it contains the names of the properties optionally set for the input molecule. These can include partial results of the descriptor calculation that are considered useful and are thus kept for later use. The return value is optional; most descriptor generators return null. However, if properties are set by the generator, they are written to the output SDfile if SDF output was specified (e.g. in GenerateMD). This feature can be used for testing purposes.
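
    The contract described above (descriptor written through the output parameter, property names returned) can be illustrated with a small self-contained sketch. ToyMolecule, the CARBON_COUNT property and the naive letter counting are assumptions made for this illustration only; they are not part of the Screen API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrates the generate() contract with hypothetical stand-in types. */
final class PropertyNamesSketch {

    /** Hypothetical molecule: a SMILES-like string plus named properties. */
    static final class ToyMolecule {
        final String smiles;
        final Map<String, String> properties = new LinkedHashMap<>();

        ToyMolecule(String smiles) {
            this.smiles = smiles;
        }
    }

    /**
     * Writes the descriptor into the output parameter and returns the names
     * of the properties it set on the molecule, not the descriptor itself.
     */
    static String[] generate(ToyMolecule m, int[] descriptor) {
        int carbons = 0;
        for (char c : m.smiles.toCharArray()) {
            if (c == 'C') {
                carbons++; // naive: counts the letter 'C', so "Cl" would be miscounted
            }
        }
        descriptor[0] = carbons;                                  // the descriptor value
        m.properties.put("CARBON_COUNT", String.valueOf(carbons)); // kept partial result
        return new String[] {"CARBON_COUNT"};
    }

    public static void main(String[] args) {
        ToyMolecule m = new ToyMolecule("CCO");
        int[] descriptor = new int[1];
        String[] names = generate(m, descriptor);
        System.out.println(names[0] + " = " + m.properties.get(names[0])); // prints CARBON_COUNT = 2
    }
}
```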

    /*
     * MaccsGenerator.java
     */
    import chemaxon.descriptors.*;
    
    import chemaxon.util.MolHandler;
    import chemaxon.sss.search.MolSearch;
    import chemaxon.struc.Molecule;
    import chemaxon.calculations.ElementalAnalyser;
    import chemaxon.marvin.modules.Aromata;
    
    import chemaxon.formats.MolFormatException;
    import chemaxon.sss.search.SearchException;
    
    /**
     * Generator class for the <code>Maccs</code> descriptor. A partial
     * implementation of the 166 public MDL keys is given here. This class serves
     * demonstration purposes only.
     *
     * @author  Miklos Vargyas
     */
    public class MaccsGenerator extends MDGenerator {
    
        /** performs substructure search */
        private MolSearch search = null;
        /** converts SMARTS queries into <code>Molecule</code> objects */
        private MolHandler smartsReader = null;
        /** performs elemental analysis of target molecules */
        private ElementalAnalyser elemAnal = null;
        /** aromatizes target molecules and gathers ring information */
        private Aromata arom = null;
    
        /**
         * Creates and initializes a <code>Maccs</code> descriptor generator object.
         * One such object can be re-used to generate multiple descriptors
         * consecutively, there is no need to create one <code>MaccsGenerator</code>
         * instance for each <code>Molecule</code> object.
         */
        public MaccsGenerator() {
            search = new MolSearch();
            smartsReader = new MolHandler();
            smartsReader.setQueryMode( true );
            elemAnal = new ElementalAnalyser();
            arom = new Aromata();
        }
    
        /**
         * Generates the Maccs descriptors for the given molecule. New instance of
         * the <code>Maccs</code> object is not allocated, the
         * <code>MolecularDescriptor</code> provided as a parameter is updated
         * (thus it has to be allocated and initialized by the client of this
         * class).
         * @param   m   molecule for which the Maccs descriptor is created
         * @param   d   the Maccs descriptor generated
         * @return      always null in the case of <code>Maccs</code>
         */
        public String[] generate( Molecule m, MolecularDescriptor d )
                throws MDGeneratorException {
            MaccsParameters params = (MaccsParameters)d.getParameters();
            Maccs maccs = (Maccs)d;
    
            arom.setMol( m );
            arom.aromatize();
            elemAnal.setMolecule( m );
            search.setTarget( m );
    
            // each implemented key is mapped to its own bit, in ascending key order
            if ( genKey11() )   maccs.setKey( 0 );
            if ( genKey13() )   maccs.setKey( 1 );
            if ( genKey14() )   maccs.setKey( 2 );
            if ( genKey15() )   maccs.setKey( 3 );
            if ( genKey17() )   maccs.setKey( 4 );
            if ( genKey19() )   maccs.setKey( 5 );
            if ( genKey20() )   maccs.setKey( 6 );
            if ( genKey21() )   maccs.setKey( 7 );
            if ( genKey22() )   maccs.setKey( 8 );
            if ( genKey23() )   maccs.setKey( 9 );
            if ( genKey24() )   maccs.setKey( 10 );
            if ( genKey25() )   maccs.setKey( 11 );
            if ( genKey27() )   maccs.setKey( 12 );
            if ( genKey28() )   maccs.setKey( 13 );
            if ( genKey29() )   maccs.setKey( 14 );
            if ( genKey30() )   maccs.setKey( 15 );
            if ( genKey32() )   maccs.setKey( 16 );
            if ( genKey33() )   maccs.setKey( 17 );
            if ( genKey37() )   maccs.setKey( 18 );
            if ( genKey38() )   maccs.setKey( 19 );
            if ( genKey39() )   maccs.setKey( 20 );
            if ( genKey40() )   maccs.setKey( 21 );
            if ( genKey41() )   maccs.setKey( 22 );
            if ( genKey42() )   maccs.setKey( 23 );
            if ( genKey45() )   maccs.setKey( 24 );
            if ( genKey50() )   maccs.setKey( 25 );
            if ( genKey60() )   maccs.setKey( 26 );
            if ( genKey63() )   maccs.setKey( 27 );
            if ( genKey78() )   maccs.setKey( 28 );
            if ( genKey84() )   maccs.setKey( 29 );
            if ( genKey88() )   maccs.setKey( 30 );
            if ( genKey96() )   maccs.setKey( 31 );
            if ( genKey99() )   maccs.setKey( 32 );
            if ( genKey101() )  maccs.setKey( 33 );
            if ( genKey103() )  maccs.setKey( 34 );
            if ( genKey118() )  maccs.setKey( 35 );
            if ( genKey125() )  maccs.setKey( 36 );
            if ( genKey130() )  maccs.setKey( 37 );
            if ( genKey131() )  maccs.setKey( 38 );
            if ( genKey134() )  maccs.setKey( 39 );
            if ( genKey139() )  maccs.setKey( 40 );
            if ( genKey140() )  maccs.setKey( 41 );
            if ( genKey142() )  maccs.setKey( 42 );
            if ( genKey146() )  maccs.setKey( 43 );
            if ( genKey149() )  maccs.setKey( 44 );
            if ( genKey151() )  maccs.setKey( 45 );
            if ( genKey154() )  maccs.setKey( 46 );
            if ( genKey157() )  maccs.setKey( 47 );
            if ( genKey158() )  maccs.setKey( 48 );
            if ( genKey159() )  maccs.setKey( 49 );
            if ( genKey160() )  maccs.setKey( 50 );
            if ( genKey161() )  maccs.setKey( 51 );
            if ( genKey163() )  maccs.setKey( 52 );
            if ( genKey164() )  maccs.setKey( 53 );
            if ( genKey165() )  maccs.setKey( 54 );
            return null;
        }
    
        private boolean genKey11() { return isRing( 4 ); }
        private boolean genKey13() { return isMatching( "[#8]~[#7](~[#6])~[#6]" ); }
        private boolean genKey14() { return isMatching( "S-S" ); }
        private boolean genKey15() { return isMatching( "[#6]~[#6](~[#8])~[#8]" ); }
        private boolean genKey17() { return isMatching( "[#6]#[#6]" ); }
        private boolean genKey19() { return isRing( 7 ); }
        private boolean genKey20() { return elemAnal.atomCount( 14 ) > 0; } /* Si */
        private boolean genKey21() { return isMatching( "[#6]=[#6]([!#1!#6])[!#1!#6]" ); }
        private boolean genKey22() { return isRing( 3 ); }
        private boolean genKey23() { return isMatching( "[#7]~[#6](~[#8])~[#8]" ); }
        private boolean genKey24() { return isMatching( "[#7]-[#8]" ); }
        private boolean genKey25() { return isMatching( "[#7]~[#6](~[#7])~[#7]" ); }
        private boolean genKey27() { return elemAnal.atomCount( 53 ) > 0; } /* I */
        private boolean genKey28() { return isMatching( "[!#1!#6][CH2][!#1!#6]" ); }
        private boolean genKey29() { return elemAnal.atomCount( 15 ) > 0; } /* P */
        private boolean genKey30() { return isMatching( "[#6]~[!#1!#6](~[#6])(~[#6])~*" ); }
        private boolean genKey32() { return isMatching( "[#6]~S~[#7]" ); }
        private boolean genKey33() { return isMatching( "[#7]~S" ); }
        private boolean genKey37() { return isMatching( "[#7]~[#6](~[#8])~[#7]" ); }
        private boolean genKey38() { return isMatching( "[#7]~[#6](~[#6])~[#7]" ); }
        private boolean genKey39() { return isMatching( "[#8]~S(~[#8])~[#8]" ); }
        private boolean genKey40() { return isMatching( "S-[#8]" ); }
        private boolean genKey41() { return isMatching( "[#6]#[#7]" ); }
        private boolean genKey42() { return elemAnal.atomCount( 9 ) > 0; } /* F */
        private boolean genKey45() { return isMatching( "[#6]=[#6]~[#7]" ); }
        private boolean genKey50() { return isMatching( "[#6]=[#6](~[#6])~[#6]" ); }
        private boolean genKey60() { return isMatching( "S=[#8]" ); }
        private boolean genKey63() { return isMatching( "[#7]=[#8]" ); }
        private boolean genKey78() { return isMatching( "[#6]=[#7]" ); }
        private boolean genKey84() { return isMatching( "[#7H2]" ); }
        private boolean genKey88() { return elemAnal.atomCount( 16 ) > 0; } /* S */
        private boolean genKey96() { return isRing( 5 ); }
        private boolean genKey99() { return isMatching( "[#6]=[#6]" ); }
        private boolean genKey101() { return isLargerRing( 8 ); } /* 8-membered or larger ring */
        private boolean genKey103() { return elemAnal.atomCount( 17 ) > 0; } /* Cl */
        private boolean genKey118() { return isMore( "*~[CH2]~[CH2]~*" ); }
        private boolean genKey125() { return arom.getAromRings().length > 1; }
        private boolean genKey130() { return isMore( "[!#1!#6]~[!#1!#6]" ); }
        private boolean genKey131() { return isMore( "[!#1!#6]~[H]" ); }
        private boolean genKey134() { return isMatching( "[F,Cl,Br,I]" ); }
        private boolean genKey139() { return isMatching( "[#8][H]" ); }
        private boolean genKey140() { return elemAnal.atomCount( 8 ) > 3; }
        private boolean genKey142() { return elemAnal.atomCount( 7 ) > 1; }
        private boolean genKey146() { return elemAnal.atomCount( 8 ) > 2; }
        private boolean genKey149() { return isMore( "[CH3]" ); }
        private boolean genKey151() { return isMatching( "[#7][H]" ); }
        private boolean genKey154() { return isMatching( "[#6]=[#8]" ); }
        private boolean genKey157() { return isMatching( "[#6]-[#8]" ); }
        private boolean genKey158() { return isMatching( "[#6]-[#7]" ); }
        private boolean genKey159() { return elemAnal.atomCount( 8 ) > 1; }
        private boolean genKey160() { return isMatching( "[CH3]" ); }
        private boolean genKey161() { return elemAnal.atomCount( 7 ) > 0; }
        private boolean genKey163() { return isRing( 6 ); }
        private boolean genKey164() { return elemAnal.atomCount( 8 ) > 0; }
        private boolean genKey165() {
            return arom.getAromRings().length > 0
                    || arom.getNonAromRings().length > 0;
        }
    
        /**
         * Checks if there is at least one ring of the given size in the target
         * structure. Uses the aromatizer (<code>Aromata</code>) object that
         * perceives all rings in the target molecule.
         * @param   ringSize    size of ring searched for
         */
        private boolean isRing( int ringSize ) {
            int[][] aromRings = arom.getAromRings();
            for ( int i = 0; i < aromRings.length; i++ ) {
                if ( aromRings[ i ].length == ringSize ) {
                    return true;
                }
            }
            int[][] aliphRings = arom.getNonAromRings();
            for ( int i = 0; i < aliphRings.length; i++ ) {
                if ( aliphRings[ i ].length == ringSize ) {
                    return true;
                }
            }
            return false;
        }
    
        /**
         * Checks if there is at least one ring of the given size or larger in the
         * target structure. Uses the aromatizer (<code>Aromata</code>) object that
         * perceives all rings in the target molecule.
         * @param   ringSize    size of ring searched for
         */
        private boolean isLargerRing( int ringSize ) {
            int[][] aromRings = arom.getAromRings();
            for ( int i = 0; i < aromRings.length; i++ ) {
                if ( aromRings[ i ].length >= ringSize ) {
                    return true;
                }
            }
            int[][] aliphRings = arom.getNonAromRings();
            for ( int i = 0; i < aliphRings.length; i++ ) {
                if ( aliphRings[ i ].length >= ringSize ) {
                    return true;
                }
            }
            return false;
        }
    
        /**
         * Performs substructure search to check if the given query (specified as
         * Daylight SMARTS) is found in the target structure.
         * @param   smartsQuery query structure to be found
         */
        private boolean isMatching( final String smartsQuery ) {
            try {
                smartsReader.setMolecule( smartsQuery );
                search.setQuery( smartsReader.getMolecule() );
                return search.isMatching();
            }
            catch ( MolFormatException me ) {
                me.printStackTrace();
                // normally this should be rethrown here
            }
            catch ( SearchException se ) {
                se.printStackTrace();
                // normally this should be rethrown here
            }
            return false;
        }
    
        /**
         * Performs substructure search to check if the given query (specified as
         * Daylight SMARTS) is found more than once in the target structure.
         * @param   smartsQuery query structure to be found
         */
        private boolean isMore( final String smartsQuery ) {
            try {
                smartsReader.setMolecule( smartsQuery );
                search.setQuery( smartsReader.getMolecule() );
                int[][] hits = search.findAll();
                return hits != null && hits.length > 1;
            }
            catch ( MolFormatException me ) {
                me.printStackTrace();
                // normally this should be rethrown here
            }
            catch ( SearchException se ) {
                se.printStackTrace();
                // normally this should be rethrown here
            }
            return false;
        }
    }
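
    Hand-numbered bit indices in a long chain of if statements are easy to mistype, for example by assigning the same bit to two different keys. One possible alternative, sketched below, derives the bit index from the position of the key number in a table. The sketch is self-contained and independent of the Screen API; KeyTableSketch, KEY_NUMBERS and the IntPredicate standing in for the genKeyNN() tests are assumptions made for illustration, and only the first ten key numbers of the example are listed.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Table-driven mapping from MDL key numbers to bit positions (illustrative only). */
final class KeyTableSketch {

    /** Key numbers in bit order: the bit index of a key is its position in this array. */
    static final int[] KEY_NUMBERS = {11, 13, 14, 15, 17, 19, 20, 21, 22, 23};

    /**
     * Builds a fingerprint by asking keyTest whether each key is present.
     * keyTest stands in for the individual genKeyNN() methods.
     */
    static BitSet generate(IntPredicate keyTest) {
        BitSet bits = new BitSet(KEY_NUMBERS.length);
        for (int i = 0; i < KEY_NUMBERS.length; i++) {
            if (keyTest.test(KEY_NUMBERS[i])) {
                bits.set(i); // index comes from the table, never typed by hand
            }
        }
        return bits;
    }

    /** Returns the bit index of a key number, or -1 if the key is not implemented. */
    static int bitIndexOf(int keyNumber) {
        for (int i = 0; i < KEY_NUMBERS.length; i++) {
            if (KEY_NUMBERS[i] == keyNumber) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // pretend that only keys 21 and 22 match the target molecule
        BitSet bits = generate(k -> k == 21 || k == 22);
        System.out.println(bits); // prints {7, 8}
    }
}
```

    With such a table, adding or removing a key shifts the following bit positions automatically, so stored descriptors generated with an older table would need to be regenerated.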

    Descriptor parameter class

    Most molecular descriptors can be parameterized; the length, for instance, is a fairly common parameter. The parameter class also declares the metrics that are compatible with (available for) the new descriptor.

    Descriptor parameters are stored in an XML file that can easily be extended according to future needs. However, compatibility with old versions has to be maintained.

    The convenience class CDParameters (where CD stands for Custom Descriptor) covers almost all typical functionality needed to handle parameters. In most cases the parameter class is therefore a thin wrapper that delegates to methods of the CDParameters class.

    /*
     * MaccsParameters.java
     */
    
    import chemaxon.descriptors.*;
    import chemaxon.struc.Molecule;
    
    import java.util.*;
    import java.io.*;
    
    import java.lang.IllegalArgumentException;
    
    /**
     * Manages MDL Maccs-II fingerprint parameters. As no external parameters are
     * used in the present implementation (i.e. everything is wired into the
     * <code>MaccsGenerator</code> and <code>Maccs</code> classes), this class does
     * not play any important role. The reason why it is still implemented is to
     * outline an appropriate framework for optional parameters required by other
     * custom molecular descriptors.<br>
     * The official implementation of MDL Maccs-II by Chemaxon will store all keys
     * in an external XML configuration file, in which case this class will become
     * important: it will process the XML file and store the definitions of keys.
     *
     * @author  Miklos Vargyas
     */
    public class MaccsParameters extends CDParameters {
    
        /**
         * length of the example Maccs keys (max number of bits)
         */
        public final static int DEFAULT_LENGTH = 64;
    
        /**
         * Creates an empty object. Initializes parameters to default values.
         */
        public MaccsParameters() {
            super();
            setLength( DEFAULT_LENGTH );
        }
    
        /**
         * Creates a new object based on a given configuration file.
         * @param   configFile              an open (XML) configuration file
         * @throws  MDParametersException   missing or bad (XML) configuration
         */
        public MaccsParameters(File configFile) throws MDParametersException
        {
            super( configFile );
        }
    
        /**
         * Creates a new object based on a given configuration string.
         * @param   config                  (XML) configuration string
         * @throws  MDParametersException   missing or bad (XML) configuration
         */
        public MaccsParameters(String config) throws MDParametersException
        {
            super( config );
        }
    
        /**
         * Gets the default XML document frame of the parameter configuration.
         * @return  default XML document frame of the MaccsParameters class
         */
        public String getDefaultDocumentFrame() {
            return "<?xml version=\"1.0\" encoding=\"UTF-8\"?> \n" +
                    "<MDL-Maccs-II-ExampleConfiguration Version =\"0.1\" >\n" +
                    "<ScreeningConfiguration>\n" +
                    "   <ParametrizedMetrics>\n" +
                    "        <ParametrizedMetric Name=\"Tanimoto\" ActiveFamily=\"Generic\"\n" +
                    "             Metric=\"Tanimoto\" Threshold=\"0.2\"/>\n" +
                    "   </ParametrizedMetrics>\n" +
                    "</ScreeningConfiguration>\n" +
                    "</MDL-Maccs-II-ExampleConfiguration>\n";
        }
    
        /**
         * Initializes the Maccs fingerprint generator.
         */
        protected void initGenerator() throws MDParametersException {
            generator = new MaccsGenerator();
        }
    
        /**
         * This method is called by the constructors before processing the XML
         * configuration. It creates a <code>Maccs</code> object stored in
         * {@link chemaxon.descriptors.MDParameters#md MDParameters.md}.
         */
        protected void init() {
            md = new Maccs();
        }
    
        /**
         * Calls <code>MaccsGenerator</code> and generates the descriptor for the
         * given molecule.
         * @param   m   a molecular structure
         * @param   md  the molecular descriptor generated for the given molecule,
         *              an output parameter
         * @return      names of molecule properties (SDfile tags) set by the generator
         * @throws  MDGeneratorException    when failed to generate descriptor
         */
        protected String[] generate( final Molecule m, MolecularDescriptor md )
                throws MDGeneratorException {
            return generator.generate( m, md );
        }
    
    }
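
    The default document frame returned above is ordinary XML, so it can be inspected with the JAXP parser that ships with Java. The following sketch reads the name, metric and threshold of the first ParametrizedMetric element. It is only an illustration of the file layout: ConfigReaderSketch and firstMetric are hypothetical names, and the way CDParameters itself processes the configuration is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Reads the metric settings from a Screen-style XML configuration (illustrative). */
final class ConfigReaderSketch {

    /** Same layout as the default document frame of the MaccsParameters example. */
    static final String CONFIG =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<MDL-Maccs-II-ExampleConfiguration Version=\"0.1\">\n"
            + "<ScreeningConfiguration>\n"
            + "   <ParametrizedMetrics>\n"
            + "        <ParametrizedMetric Name=\"Tanimoto\" ActiveFamily=\"Generic\"\n"
            + "             Metric=\"Tanimoto\" Threshold=\"0.2\"/>\n"
            + "   </ParametrizedMetrics>\n"
            + "</ScreeningConfiguration>\n"
            + "</MDL-Maccs-II-ExampleConfiguration>\n";

    /** Returns "Name:Metric:Threshold" of the first ParametrizedMetric element. */
    static String firstMetric(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // parser configuration, I/O and syntax errors are reported uniformly
            throw new IllegalArgumentException("bad XML configuration", e);
        }
        NodeList metrics = doc.getElementsByTagName("ParametrizedMetric");
        if (metrics.getLength() == 0) {
            throw new IllegalArgumentException("no ParametrizedMetric element found");
        }
        Element metric = (Element) metrics.item(0);
        double threshold = Double.parseDouble(metric.getAttribute("Threshold"));
        return metric.getAttribute("Name") + ":" + metric.getAttribute("Metric") + ":" + threshold;
    }

    public static void main(String[] args) {
        System.out.println(firstMetric(CONFIG)); // prints Tanimoto:Tanimoto:0.2
    }
}
```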
    

    The descriptor class

    The main purpose of the descriptor class is to provide the connections for the plug-and-play interface through its constructors and a few miscellaneous methods such as getName(). This example code illustrates a descriptor of the binary fingerprint kind; integer vector or floating-point vector descriptors can be implemented the same way, with the appropriate changes.

    If, however, the descriptor to be implemented is neither a binary fingerprint nor an integer or float vector, the convenience classes cannot be used. In these rather rare cases the implementor of the new descriptor has a lot more coding work to do.

    /*
     * Maccs.java
     */
    
    import chemaxon.descriptors.*;
    import chemaxon.struc.Molecule;
    
    import java.util.Arrays;
    import java.util.StringTokenizer;
    import java.text.ParseException;
    
    /**
     * Implements MDL MACCS-II keys. This class serves demonstration purposes, thus
     * only a portion of the original keys is implemented.
     *
     * @author  Miklos Vargyas
     */
    
    public class Maccs extends CustomDescriptor {
    
        /**
         * Creates a new, empty MACCS descriptor.
         */
        public Maccs() {
            super( CDParameters.BINARY_DESCRIPTOR, 64 );
        }
    
        /**
         * Copy constructor. An identical copy of the <code>Maccs</code> object
         * passed is created. The old and the new instances share the same
         * <code>MaccsParameters</code> object.
         *
         * @param   md a MACCS descriptor to be copied
         */
        public Maccs( final Maccs md ) {
            super( md );
        }
    
        /**
         * Creates a new, empty instance according to the parameter configuration.
         * @param  params   parameter settings
         */
        public Maccs(final CDParameters params) {
            super( params );
        }
    
        /**
         * Creates a new instance according to parameters passed in a string.
         *
         * @param  params   parameter string
         */
        public Maccs(final String params)
        {
            super( params );
        }
    
        /**
         * Creates a copy with identical internal state. The new instance shares the
         * same <code>MaccsParameters</code> object with the copied one.
         * @return  the newly created object
         */
        public Object clone() {
            return new Maccs( this );
        }
    
    // !!!!!!!!!!!!!!!!!!!!!!
    // you will need to change the string constants here by simply replacing
    //  Maccs/MACCS with the name of your fingerprint
    
        /**
         * Gets the nice name of the <code>Maccs</code> descriptor object. This name
         * is not the same as the class name: it is nicer, and more meaningful for
         * end-users.
         * @return  the nice, external name for Maccs descriptor class objects
         */
        public String getName() {
            return "MDL MACCS-II descriptor";
        }
    
        /**
         * Gets the short name of the descriptor. This name appears in text outputs.
         * @return  the short name used in text outputs (tables etc.)
         */
        public String getShortName() {
            return "Maccs";
        }
    // and similarly, the name of your parameters class
        /**
         * Gets the name of the parameters class corresponding to the descriptor.
         * @return  the name of the parameters class
         */
        public String getParametersClassName() {
            return "MaccsParameters";
        }
    
    // !!!!!!!!!!!!!!!!!!!!!!
    
        /**
         * Sets the given cell (key) to one. Individual cells cannot be cleared
         * (ie. set to zero), only the whole descriptor (see <code>clear()</code>).
         * @param   cellIndex   index of the cell (key) to be set (to one)
         */
        public void setKey( int cellIndex ) {
            set( cellIndex, 1 );
        }
    
        /**
         * Gets the value (0 or 1) of the given cell (key).
         * @param   cellIndex   index of the cell (key) to be read
         * @return  the value of the cell, 0 or 1
         */
        public int getKey( int cellIndex ) {
            return getBit( cellIndex );
        }
    
        /**
         * Creates the MACCS descriptor for the given Molecule. Calls the generator
         * created by the corresponding <code>MaccsParameters</code> class.
         * @param   m   molecule for which the descriptor is generated
         * @return  property names set in the molecule during generation (none in
         *          the case of this particular class, so null is returned)
         * @throws  MDGeneratorException    when failed to generate descriptor
         */
        public String[] generate( final Molecule m ) throws MDGeneratorException {
            clear();
            try {
                String[] res = ( (MaccsParameters)params ).generate( m, this );
                return res;
            }
            catch ( NullPointerException ne ) {
                ne.printStackTrace();
    // !!!!!!!!!!!!!!
    // just replace MACCS as appropriate
                throw new MDGeneratorException( "Something went wrong in MACCS generator." );
    // !!!!!!!!!!!!!!
            }
        }
    
    }
    

    Closing remarks

    Users are encouraged to contribute their custom descriptor implementations to our public discussion forum, see for instance Florian Pitschi's work.