Migration Guide

    from JChem Oracle Cartridge to JChem PostgreSQL Cartridge

    Here we describe a basic migration guide for transferring an Oracle table storing chemical structures indexed with jc_idxtype index of JChem Oracle Cartridge into PostgreSQL database and making it usable for JChem PostgreSQL Cartridge.

    Presumptions

    Oracle side

    Your Oracle table to be migrated is named COMPOUND_STRUCTURES (in our example it is the compound table of the ChEMBL database).

    It has the following columns:

    
    MOLREGNO NUMBER (9,0)
    MOLFILE CLOB
    STANDARD_INCHI VARCHAR2(4000 BYTE)
    STANDARD_INCHI_KEY VARCHAR2(27 BYTE)
    CANONICAL_SMILES VARCHAR2(4000 BYTE)

    The jc_idxtype index created was created on MOLFILE column.

    This index has a parameter relating standardization:

    
    std_conf:sql=select setting_val from
    Chemaxon_settings where abbrev = ''STD_CONFIG''

    This setting_val has the following value:

    
    <StandardizerConfiguration>
    <Actions>
    <RemoveExplicitH ID="RemoveExplicitH"/>
    <Aromatize ID="Aromatize" Type="general"/>
    </Actions>
    </StandardizerConfiguration>

    PostgreSQL side

    The database is PostgreSQL or Enterprise DB (EDB).

    The Cartridge application server can run on the same or on a separate machine.

    Migration steps

    Oracle side

    1. Export the data from the Oracle table into csv file using Oracle SQL Developer (required minimum version number 18.4)

      1. Right click on the table to be exported, and select Export...

        images/download/attachments/9241532/m_1.jpg
      2. Export DDL is not needed, only Export Data to csv.

        Name the exported file as TableMigration.csv

        images/download/attachments/9241532/m_2.jpg
      3. Run the export

        images/download/attachments/9241532/m_3.jpg
    2. Copy file TableMigration.csv to machine of the PostgreSQL server into the /tmp/ folder. The user postgres must have read right on that file.

    PostgreSQL side

    1. Install JChem PostgreSQL Cartridge

      1. Install the jchem-psql-db-server ( jchem-psql-edb-db-server) package on the machine with the PostgreSQL server (EDB server).

      2. Install the jchem-psql-app-server (jchem-psql-edb-app-server) package on the appropriate machine

        Alternatively the package jchem-psql (jchem-psql-edb) containing the full installation can be installed if the JChem Service will run on the same machine where the database server is running.

    2. Create a molecule type file according the standardization set at Oracle cartridge index creation

      1. Make a copy from the file /etc/chemaxon/types/sample.type and name it /etc/chemaxon/types/example.type

      2. In example.type file set

        typeID=2

        standardizerAction = aromatize:g..removeexplicitH

      3. Take care, the file must have the same privileges as sample.type has.

    3. Init jchem-psql service

      sudo service jchem-psql init

    4. Start jchem-psql service

      sudo service jchem-psql start

    5. Connect with psql to the database as user postgres. Step into the database where the cartridge was installed (where the CREATE EXTENSION steps were executed during the installation).

    6. Create table in PostgreSQL with molecule type column for the chemical structures

      
      psql:>
      CREATE TABLE COMPOUND_STRUCTURES
      ( MOLREGNO numeric,
      MOLFILE molecule('example'),
      STANDARD_INCHI VARCHAR(4000),
      STANDARD_INCHI_KEY VARCHAR(27),
      CANONICAL_SMILES VARCHAR(4000) );

      {warning} if the original table in Oracle contained CLOB type columns other than the molecular column then the corresponding column in PostgreSQL should have type TEXT.

    7. Import the data from the csv file into that table

      
      psql:> COPY COMPOUND_STRUCTURES (MOLREGNO,MOLFILE,STANDARD_INCHI,STANDARD_INCHI_KEY, CANONICAL_SMILES) FROM '/tmp/TableMigration.csv' DELIMITER ',' CSV HEADER;
    8. Create index on the chemical structure column

      
      psql:> CREATE INDEX jpc_idx1 on compound_structures USING sortedchemindex(molfile);
    9. Test chemical structure searching

      
      psql:> SELECT COUNT(*) FROM compound_structures WHERE 'Cc1ccccc1' |<| molfile;

    PostgreSQL side if the chemical structures are already filled in a CLOB or TEXT column of a table

    1. Same as above

    2. Same as above

    3. Same as above

    4. Same as above

    5. Same as above

    6. Rename the column storing the chemical structures

      
      psql:> ALTER TABLE compound_structures RENAME molfile TO molfile_original;
    7. Add a new, molecule type column with example subtype to the table

      
      psql:> ALTER TABLE compound_structures ADD COLUMN molfile molecule('example');
    8. Copy the chemical structures by casting to molecule type to the newly created molecule type column

      
      psql:> UPDATE compound_structures SET molfile=molfile_original::molecule('example');
    9. Create index on the chemical structure column

      
      psql:> CREATE INDEX jpc_idx1 on compound_structures USING sortedchemindex(molfile);
    10. Test chemical structure searching

      
      psql:> SELECT COUNT(*) FROM compound_structures WHERE 'Cc1ccccc1' |<| molfile;
    11. Optional. Remove the column storing the original chemical structures if they won’t be needed anymore

      
      psql:> ALTER TABLE compound_structures DROP COLUMN molfile_original;

      Alternatively,

    instead of steps 6, 7, 8, and 11, change the type of the original CLOB or TEXT column to molecule type.

    Be careful, the original chemical structures will be lost.

    
    ALTER TABLE compound_structures ALTER COLUMN molfile TYPE molecule('example') USING molfile::molecule('sample');

    Migration demo

    See the video demonstrating migration from JChem Oracle Cartridge to JChem PostgreSQL Cartridge.