Page tree

This documentation page collects the deprecated and removed methods of JChem PostgreSQL Cartridge.

Removals

Fingerprint generation

Fingerprints can be generated using the following function:

fingerprint(chemical_structure,size)

or

fingerprint(structure_column_name,size)

where size is the number of bits in the fingerprint bit string; it must be divisible by 64, e.g.: 512.

The generated fingerprints refer to the original, non-standardized molecules even if standardizer configuration is defined for the given molecule type column.

Example:

 

SELECT fingerprint('C1CCCCC1',512);

Smilarity search - old methods

For similarity searches with the old methods - which are supported before version 2.5 - you must have a column filled with fingerprints. These fingerprints correspond to molecules in another table (old method 1) or to molecules in the same table (old method 2). We strongly advise using the other table method (old method 1) because of the performance. The fingerprint data are usually small and PostgreSQL database engine can store them in memory.

Fingerprints can be generated as described above.

It is advised to name the fingerprint column as fp   - in order to make similarity search easily runnable from applications based on JChem PostgreSLQ Cartridge API.

The following statements should be used for running similarity search using old method 1:

CREATE TABLE molecule_table_name(structure_column_name MOLECULE('molecule_type_name'), id INTEGER);
CREATE INDEX molecule_table_idx ON molecule_table_name(id);
CREATE TABLE fingerprint_table_name AS
	SELECT fingerprint(structure_column_name,512) fp, id FROM molecule_table_name;
CREATE INDEX fingerprint_idx ON fingerprint_table_name(fp);
CREATE INDEX fingerprint_id_idx ON fingerprint_table_name(id);

SELECT * FROM fingerprint_table_name 
	WHERE tanimoto(fingerprint(query_structure,512),fp) operator similarity_value;
SELECT * FROM (SELECT ft.*, tanimoto(fingerprint(query_structure,512),fp) AS tanimoto FROM fingerprint_table_name AS ft)
	WHERE tanimoto operator similarity_value ORDER by tanimoto DESC;

where

operator can be <, <=, =, >, >=

similarity_value is a number between 0 and 1

fp is the recommended name of the fingerprint column

Example:

    CREATE TABLE moltable(mol MOLECULE('sample'), id INTEGER);
	CREATE INDEX moltable_idx ON moltable(id);

Insert molecules into moltable and/or create a trigger, then continue with the followings:

    CREATE TABLE fptable AS SELECT fingerprint(mol,512) fp, id FROM moltable;
	CREATE INDEX fptable_idx ON fptable(fp);
	CREATE INDEX fptable_id_idx ON fptable(id);
 
    SELECT * FROM fptable WHERE tanimoto(fingerprint('CCC',512),fp) > 0.9;
--to get the results in descending order by similarity
   SELECT t.id,t.tanimoto, moltable.mol FROM (SELECT fptable.*, tanimoto(fingerprint('CCC',512),fp) AS tanimoto FROM fptable) AS t, moltable
        WHERE t.id = moltable.id AND tanimoto > 0.9 ORDER BY tanimoto DESC;

 

The following statements should be used for running similarity search using old method 2:

CREATE TABLE table_name(structure_column_name MOLECULE('molecule_type_name'));
ALTER TABLE table_name ADD COLUMN fp BYTEA;
UPDATE table_name SET fp = fingerprint(structure_column_name,512);
SELECT * FROM table_name 
	WHERE tanimoto(fingerprint(query_structure,512),fp) operator similarity_value;

Example:

    CREATE TABLE simtable(mol MOLECULE('sample'));
    ALTER TABLE simtable ADD COLUMN fp BYTEA;
    UPDATE simtable SET fp=fingerprint(mol,512);
    SELECT * FROM simtable 
		WHERE tanimoto(fingerprint('CCC',512),fp) > 0.9;

Please note that fingerprint values present in fingerprint columns must be recalculated when molecules are updated.

Known Issue

Update of tables with more than about 1000 records might be very slow.

You can create a trigger to update the fingerprint column in the fingerprint table (fptable) when new records are inserted into the molecule table (moltable).

Example:

CREATE TABLE moltable(mol MOLECULE('sample'), id INTEGER);
CREATE TABLE fptable AS SELECT fingerprint(mol,512) fp, id FROM moltable;
 
CREATE OR REPLACE FUNCTION set_fingerprint()
  RETURNS trigger AS
$BODY$
BEGIN
 IF (TG_OP = 'DELETE') THEN
  DELETE FROM fptable WHERE id = OLD.id;
  RETURN OLD;
 END IF;
 IF (TG_OP = 'UPDATE') THEN
  IF NEW.mol is null THEN
   UPDATE fptable SET id = NEW.id, fp = null where id = OLD.id;
  END IF;
  UPDATE fptable SET id = NEW.id, fp = fingerprint(NEW.mol, 512) where id = OLD.id;
  RETURN NEW;
 END IF;
 IF (TG_OP = 'INSERT') THEN
  IF NEW.mol is null THEN
   INSERT INTO fptable (fp, id) VALUES (null,NEW.id);
  END IF;
   INSERT INTO fptable (fp, id) VALUES (fingerprint(NEW.mol, 512),NEW.id);
  RETURN NEW;
 END IF;
END;
$BODY$
LANGUAGE plpgsql;
  
DROP TRIGGER IF EXISTS tr_fingerprint ON moltable;
CREATE TRIGGER tr_fingerprint BEFORE INSERT OR UPDATE OR DELETE ON moltable
 FOR EACH ROW EXECUTE PROCEDURE set_fingerprint(); 

Limitations:

  • Query structures with query features (like list atoms, query atoms, query bonds, ... ) are not supported.

Relevance sorting by using relevance function

Available from version 1.8 and deprecated in version 2.6

As in standard SQL, the user can order his results using ORDER BY commands.

For ordering search results, JChem PostgreSQL function relevance(Molecule) is provided, which gives back a numeric type value based on the atom counts and further topological features of the molecule.

It is suggested that relevance values be stored in the table for further query. It is also suggested that an index is created on the relevance column and further queries return their results ordered by the relevance value. This facilitates the usage of  LIMIT <n>  conditions as the most relevant hits are at the beginning of the result set.  If the relevance column is created upon table creation or import  the addition of the chemical index should also occur after adding the relevance column.

ALTER TABLE <mytable> ADD COLUMN <relevance_column> INT;
UPDATE <mytable> SET <relevance_column> = relevance(mol)::int;
CREATE INDEX <relevance_index> on <mytable>(<relevance_column>);

SELECT mol FROM <mytable> WHERE 'query_structure' |<| mol ORDER BY <relevance_column> LIMIT <n>;

Example  (assuming table "test" has column "mol" of type Molecule):

ALTER TABLE test ADD COLUMN relev INT;
UPDATE TEST SET relev = relevance(mol)::int;
CREATE INDEX relev_idx on test(relev);

SELECT mol FROM test WHERE 'c1ccccc1' |<| mol ORDER BY relev LIMIT 100;

Other chemical features/measures may also be used for ordering, the chemterm function provides help for their definition.

You can create a trigger to update the relevance column when new records are inserted into the table.

Example:

CREATE TABLE test (mol Molecule('sample'));
ALTER TABLE test ADD COLUMN relev INT;

CREATE OR REPLACE FUNCTION set_relevance()
  RETURNS trigger AS
$BODY$
BEGIN
    IF NEW.mol is null THEN
      NEW.relev:=NULL;
    END IF;
    NEW.relev:=relevance(NEW.mol)::int;
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;
 
DROP TRIGGER IF EXISTS tr_relevance ON test;
CREATE TRIGGER tr_relevance BEFORE INSERT OR UPDATE ON test
 FOR EACH ROW EXECUTE PROCEDURE set_relevance();
  
UPDATE test SET relev=relevance(mol)::int;