The following four sections of this page explain how to complete the following tasks :
Administer a table in MySQL using Instant JChem.
Add a text field and populate it with a character based unique key.
Define an approach to derive a BOOLEAN data item "Lipinski rule of 5" for new rows based upon chemical terms, implemented as a "BEFORE INSERT" trigger.
Define an approach to synchronise columns for which the source data is subsequently updated, implemented as a "BEFORE UPDATE" trigger. This operation is required for example after a re-standardization operation is applied on a given JChem table.
It is hoped that these examples can be used as templates for further bespoke index based in Chemical terms or other calculated columns. In the case of MySQL connection to the schema can be achieved with following syntax assuming a mysql client
mysql -u jchemuser -h localhost -p mydb
Alternatively, the IJC database explorer will negate the need for this and connection using this approach for administration tasks is explained here: Using the database explorer
First, we need to administer a JChem table. One method to do this is via Instant JChem (IJC) Desktop application. Assuming we have an existing MySQL connection / schema established in IJC then we can add a new JChem table easily. Right click on your schema and choose "new structure entity table". At the entity type drop down choose "New structure entity (using JChemBase table)". Assuming we accept the defaults - we should see the window below and the subsequent examples assume a table name of "MOLECULES"
A JChem table already has a build in unique integer key (CD_ID), however some organisations prefer to have a character based identifier since this can aid internal communication of structures. In order to create a character based sequence the integer based sequence is embedded in a trigger and prefixed with an appropriate string in this case "mol-" is used. If this is a "private" database then the Chemist initials (or lab notebook reference) might be suitable or perhaps a corporate identifier string is suitable if working with the central data repository.
An additional text based column is now required to store the character based key and this can be created using IJC. Right click on your table and choose "Edit data tree" Left click on "New standard field" and choose "Text field". You can also edit your display name and column name. In this case the column is named "TARGET_KEY". The Text field maps directly to a column of type VARCHAR in MySQL. You also need to retain the "Required" value of FALSE.
Since no actual sequence object exists in MySQL, first we should need to create a sequences emulator table with each row in the table representing a separate sequence. In order to run this syntax you will need to connect to the same MySQL schema that IJC is connected to.
mysql -u jchemuser -h localhost -p mydb CREATE TABLE sequences (sequence_name VARCHAR(50),currval INT); INSERT INTO sequences (sequence_name,currval) VALUES ('STRUCTURES',0);
We now create a trigger on the target table STRUCTURES that uses the seed table sequence emulator to populate the "TARGET_KEY" column
delimiter / drop trigger structures_key_trg/ CREATE TRIGGER structures_key_trg BEFORE INSERT ON structures FOR EACH ROW BEGIN UPDATE sequences set currval = currval + 1 where sequence_name = 'STRUCTURES'; SET NEW.target_key = CONCAT ('mol-',(SELECT currval FROM sequences WHERE sequence_name = 'STRUCTURES')); END;/
To fire the trigger we should simply insert data into the table and for every row the action will occur. IJC will be completing the insert (during data import) so you should not need to execute this syntax but your trigger will fire as a result of this action, if you should wish to test
INSERT INTO structures (cd_id) VALUES (1);/
In order to create derivations from existing data an individual or set of triggers can be used. In this example a single drug like filter column "Lipinski_5" is created and populated for any rows that are added to a JChem table. The IJC boolean type actually maps to a TINYINT type in MySQL. You also need to retain the "Required" value of FALSE.
A pre-requisite for this field is the existence of the necessary chemical terms columns it is derived from (at compile time). Also it is assumed they will be populated and not NULL and if created prior to a data import they will be populated. Several filter definitions already exist but the use of triggers provides an alternative option to avoid some repeat calculation and of course create new bespoke definitions. The Lipinski definition example here is taken from the following reference but has been slighty modified to include the value in each clause :
Reference : Chemoinformatics (Wiley), Page 607, Gasteiger J.
molecular weight <= 500 g / mol
a calculated log p <= 5
H-bond donors <= 5
H-bond acceptors <= 10
rotatable bond count <= 10
As such the following Chemical terms fields are selected for addition to the table and are populated on data import.
The Trigger can now be created as follows:
delimiter / CREATE TRIGGER structures_lipinksi5_instrg BEFORE INSERT ON structures FOR EACH ROW BEGIN SET NEW.lipinksi_5 = 0; IF (NEW.cd_molweight <= 500 AND NEW.logp <= 5 AND NEW.h_bond_donors <= 5 AND NEW.h_bond_acceptors <= 10 AND NEW.rotatable_bonds <= 10) THEN SET NEW.lipinksi_5 = 1; END IF; END;/
In some earlier versions of MySQL you are not allowed to have multiple triggers with the same firing clause and the way around this constraint is to merge your triggers into one. In this case we will merge the structures_key_trg with structures_lipinksi5_trg
delimiter / CREATE TRIGGER structures_lipinksi5_trg BEFORE INSERT ON structures FOR EACH ROW BEGIN UPDATE sequences set currval = currval + 1 WHERE sequence_name = 'STRUCTURES'; SET NEW.target_key = CONCAT ('mol-',(SELECT currval FROM sequences WHERE sequence_name = 'STRUCTURES')); SET NEW.lipinksi_5 = 0; IF (NEW.cd_molweight <= 500 AND NEW.logp <= 5 AND NEW.h_bond_donors <= 5 AND NEW.h_bond_acceptors <= 10 AND NEW.rotatable_bonds <= 10) THEN SET NEW.lipinksi_5 = 1; END IF; END;/
If the standardization rules for a given table are modified or a chemical structure edited then potentially the chemical terms might change also. Any index based on these columns will have no update mechanism defined. As such, one should also consider an after update trigger to re-calculate derived fields after a re-standardization event. The trigger can now be created as follows
delimiter / CREATE TRIGGER structures_lipinksi5_updtrg BEFORE UPDATE ON structures FOR EACH ROW BEGIN SET NEW.lipinksi_5 = 0; IF (NEW.cd_molweight <= 500 AND NEW.logp <= 5 AND NEW.h_bond_donors <= 5 AND NEW.h_bond_acceptors <= 10 AND NEW.rotatable_bonds <= 10) THEN SET NEW.lipinksi_5 = 1; END IF; END;/
The end result is a drug like filter and less time screening unnecessary molecules