Migration Guide

from JChem Oracle Cartridge to JChem PostgreSQL Cartridge

Here we describe a basic migration guide for transferring an Oracle table storing chemical structures indexed with jc_idxtype index of JChem Oracle Cartridge into PostgreSQL database and making it usable for JChem PostgreSQL Cartridge.

Presumptions

Oracle side

Your Oracle table to be migrated is named COMPOUND_STRUCTURES (in our example it is the compound table of the ChEMBL database).

It has the following columns:

MOLREGNO NUMBER (9,0)
MOLFILE CLOB
STANDARD_INCHI VARCHAR2(4000 BYTE)
STANDARD_INCHI_KEY VARCHAR2(27 BYTE)
CANONICAL_SMILES VARCHAR2(4000 BYTE)

The jc_idxtype index created was created on MOLFILE column.
This index has a parameter relating standardization:

std_conf:sql=select setting_val from
Chemaxon_settings where abbrev = ''STD_CONFIG''

This setting_val has the following value:

<StandardizerConfiguration>
<Actions>
<RemoveExplicitH ID="RemoveExplicitH"/>
<Aromatize ID="Aromatize" Type="general"/>
</Actions>
</StandardizerConfiguration>

PostgreSQL side

The database is PostgreSQL or Enterprise DB (EDB).
The Cartridge application server can run on the same or on a separate machine.

Migration steps

Oracle side

  1. Export the data from the Oracle table into csv file using Oracle SQL Developer (required minimum version number 18.4)

    1. Right click on the table to be exported, and select Export...

      images/download/attachments/9241532/m_1.jpg

    2. Export DDL is not needed, only Export Data to csv.
      Name the exported file as TableMigration.csv

      images/download/attachments/9241532/m_2.jpg

    3. Run the export

      images/download/attachments/9241532/m_3.jpg

  2. Copy file TableMigration.csv to machine of the PostgreSQL server into the /tmp/ folder. The user postgres must have read right on that file.

PostgreSQL side

  1. Install JChem PostgreSQL Cartridge

    1. Install the jchem-psql-db-server ( jchem-psql-edb-db-server) package on the machine with the PostgreSQL server (EDB server).

    2. Install the jchem-psql-app-server (jchem-psql-edb-app-server) package on the appropriate machine

      Alternatively the package jchem-psql (jchem-psql-edb) containing the full installation can be installed if the JChem Service will run on the same machine where the database server is running.

  2. Create a molecule type file according the standardization set at Oracle cartridge index creation

    1. Make a copy from the file /etc/chemaxon/types/sample.type and name it /etc/chemaxon/types/example.type

    2. In example.type file set
      typeID=2
      standardizerAction = aromatize:g..removeexplicitH

    3. Take care, the file must have the same privileges as sample.type has.

  3. Init jchem-psql service
    sudo service jchem-psql init

  4. Start jchem-psql service
    sudo service jchem-psql start

  5. Connect with psql to the database as user postgres. Step into the database where the cartridge was installed (where the CREATE EXTENSION steps were executed during the installation).

  6. Create table in PostgreSQL with molecule type column for the chemical structures

    psql:>
    CREATE TABLE COMPOUND_STRUCTURES
    ( MOLREGNO numeric,
    MOLFILE molecule('example'),
    STANDARD_INCHI VARCHAR(4000),
    STANDARD_INCHI_KEY VARCHAR(27),
    CANONICAL_SMILES VARCHAR(4000) );

    if the original table in Oracle contained CLOB type columns other than the molecular column then the corresponding column in PostgreSQL should have type TEXT.

  7. Import the data from the csv file into that table

    psql:> COPY COMPOUND_STRUCTURES (MOLREGNO,MOLFILE,STANDARD_INCHI,STANDARD_INCHI_KEY, CANONICAL_SMILES) FROM '/tmp/TableMigration.csv' DELIMITER ',' CSV HEADER;
  8. Create index on the chemical structure column

    psql:> CREATE INDEX jpc_idx1 on compound_structures USING sortedchemindex(molfile);
  9. Test chemical structure searching

    psql:> SELECT COUNT(*) FROM compound_structures WHERE 'Cc1ccccc1' |<| molfile;

PostgreSQL side if the chemical structures are already filled in a CLOB or TEXT column of a table

  1. Same as above

  2. Same as above

  3. Same as above

  4. Same as above

  5. Same as above

  6. Rename the column storing the chemical structures

    psql:> ALTER TABLE compound_structures RENAME molfile TO molfile_original;
  7. Add a new, molecule type column with example subtype to the table

    psql:> ALTER TABLE compound_structures ADD COLUMN molfile molecule('example');
  8. Copy the chemical structures by casting to molecule type to the newly created molecule type column

    psql:> UPDATE compound_structures SET molfile=molfile_original::molecule('example');
  9. Create index on the chemical structure column

    psql:> CREATE INDEX jpc_idx1 on compound_structures USING sortedchemindex(molfile);
  10. Test chemical structure searching

    psql:> SELECT COUNT(*) FROM compound_structures WHERE 'Cc1ccccc1' |<| molfile;
  11. Optional. Remove the column storing the original chemical structures if they won’t be needed anymore

    psql:> ALTER TABLE compound_structures DROP COLUMN molfile_original;

Alternatively,

instead of steps 6, 7, 8, and 11, change the type of the original CLOB or TEXT column to molecule type.

Be careful, the original chemical structures will be lost.

ALTER TABLE compound_structures ALTER COLUMN molfile TYPE molecule('example') USING molfile::molecule('sample');