Choral FAQ and Known Issues

    FAQ

    How to check/change the host Oracle calls back to

    
    SELECT * FROM <CHORAL_OWNER>.choral_settings; 

    Configuration

    How to check invalid structures

    1. Use the isvalidmolecule method. Note, there are some structures which are qualified as valid, but cannot be indexed by the cartridge.

    2. Check the logs in <choral_home>/logs. The logging level can be set in <choral_home>/conf/application.properties file.

      • INFO level log files contain only the error.
      • DEBUG level log files contain the the error and the structure as well.
    3. Check warnings in the CHORAL_LOG table (with permission to select the table). See details of logging configuration relating CHORAL_LOG table. Look for messages like Invalid structure with rowid <rowid>: <error>

    4. By this query on an indexed table:

      select t.* from <index_name>_tbl,<table_name> t where is_valid=0 and t.rowid=rid;

      note: this is an unofficial solution, it may be changed without notice between versions.

    5. By this query (preferably on an indexed table):

      select * from test where rowid in (
          select rowid from test
          minus
          select rowid from test where sample_search(mol,'*','SUBSTRUCTURE')=1
      );
    6. The error message will contain a long id as a label (e.g.): “label=> 1101010960388”
      Run the resolve_long_to_rowid.sql script which creates Java and Oracle functions in the database schema, then execute the followings:

    define table_name = '<TABLE_NAME>'; 
    define long_id = <LONG_ID>; 
    select * from &table_name where rowid = (select long_to_rowid('&table_name',&long_id) from dual); 

    How to check the progress of creating an index

    
    SELECT * FROM v$session_longops WHERE sofar<totalwork ORDER BY start_time DESC;

    How to check the execution plan of an sql statement

    EXPLAIN PLAN

    It is advised always to check your select sql statement by generating its explain plan.

    How to import SD file into Oracle database

    Prerequisites

    • sdf file (example.sdf)

    • Oracle DB access with table creation grants

    • sqlldr utility from Oracle Client installation

    Import steps

    The following example will import the example.sdf file records into the mol column. Note that the fields of the sdf records are not separated and imported into separate columns.

    1. Create table in the database (this table will be used to import the data into):

      
      CREATE TABLE testtable (
      id NUMBER GENERATED BY DEFAULT AS IDENTITY,
      mol CLOB NOT NULL);
    2. Create an sqlldr control file with the following content.

      
      sdfload.ctl:
      load data
      infile 'example.sdf' "str '$$$$\n'"
      into table testtable
      fields terminated by '$$$$'
      (mol char(10000))
    3. Load the sdf file from command line with sqlldr command (the command will ask you username and password to access the database). The sqlldr command should be executed in the directory where the example.sdf and sdfload.ctl are located.

      
      sqlldr CONTROL=sdfload.ctl LOG=/tmp/test.log BAD=/tmp/test.bad

    Result

    As a result of the previous operations you will have a table with primary keys in id column and the molecules with additional fields (which are not extracted) in the mol column.

    ORA-06598: insufficient INHERIT PRIVILEGES privilege

    ORA-06598: insufficient INHERIT PRIVILEGES privilege error can be received in case of Oracle 12.1 if statements using the operators, functions of Choral are issued by a dba_user (Sys or System).

    Workaroud recommended:

    
    grant inherit any privileges on user <dba_user> to <choral_owner>;

    Can I use JChem Choral in cloud environment?

    Yes, you can. In Amazon environment, JChem Choral can be installed in the following ways:

    1. install JChem Choral on an EC2 instance by connecting to an Amazon Oracle RDS See documentation
    2. use Amazon Fargate to install JChem Choral with Amazon Oracle RDS See documentation
    3. install an Oracle database on an EC2 instance and install JChem Choral on the same or on a separate EC2 instance


    JChem Choral is tested in Amazon environment. Other cloud provider's environment was not tested, but we see no reason why they shouldn't work.

    How to restore an index marked invalid or unusable?

    Restoring an index marked invalid or unusable if the index data is still intact can be executed in the following ways:

    The safest mode is to rebuild the index by executing

    ALTER INDEX <index_name> REBUILD;
                or
    DROP INDEX <index_name>;
    CREATE INDEX <index_name> ...;

    There is a faster way to make the index work again if some conditions apply

    • The index table in the database still exists and is intact and contains all the rows.

    • The row ids haven't changed in the database after the index being unusable. Be aware that moving data between databases may change the rowids and this condition could fail.

    • The Choral server is running and still contains all the rows (i.e. wasn't reinitialized).

    If all these conditions apply, you can execute:

    exec DBMS_APPLICATION_INFO.SET_MODULE('CHORAL_KEEP_INDEX_DATA', NULL);
    
    ALTER INDEX <index_name> REBUILD;
            or
    DROP INDEX <index_name>;
    CREATE INDEX <index_name> ...;
    
    exec DBMS_APPLICATION_INFO.SET_MODULE(NULL, NULL);

    The whole script should run for only a few seconds, because these steps don't recalculate the index data.

    How to solve ORA-29276: transfer timeout of index creation?

    This error usually comes when there are some extremely big, complex molecules in the column to be indexed.
    ORA-29276: transfer timeout of index creation means that a molecule batch with INDEXING_BATCH_SIZE set in CHORAL_SETTINGS table (default value 10000) was not indexed within the time limit set as HTTP_SESSION_TIMEOUT_SEC in CHORAL_SETTINGS table (default value 3600).
    Decreasing the INDEXING_BATCH_SIZE is recommended, for example to 1000. The optimal value strongly depends on the given dataset.
    Increasing HTTP_SESSION_TIMEOUT_SEC is not recommended.

    How to add custom standardizer actions?

    1. Implement the custom standardizer actions as described here.
    2. Place the created jar file in the BOOT-INF/lib/ folder inside choral service jar file. Use the following steps to achieve this: Choral service jar file is available here: <choral_installation_directory>/lib/choral-backend-server.jar. Place removereactionprop-6.0.0.jar file manually into the folder <choral_installation_directory>/BOOT-INF/lib/ (create the directory if it does not exist).

      cd <choral_installation_directory>
      jar -uvf0 lib/choral-backend-server.jar BOOT-INF/lib/removereactionprop-6.0.0.jar
    3. Create userdefinedstandardizers.xml file with the help of the Standardizer GUI or by editing this example xml:

      <?xml version="1.0" encoding="UTF-8"?>
      <configuration>
      <standardizer-action-list>
      <standardizer-action class="chemaxon.standardizer.actions.RemoveReactionPropAction" editor-class="" id="RemoveReactionPropAction" jar="file:C:/Users/abc/Downloads/removereactionprop-6.0.0.jar"/>
      </standardizer-action-list>
      </configuration>
    4. Place the userdefinedstandardizers.xml under the <user_home>/chemaxon (Windows) or <user_home>/.chemaxon (Linux) folder of the user who runs the choral service.
    5. Update/Create your type files: Use the custom implemented standardizer actions like standardizerAction = aromatize..removereactionproperties
    6. Run init-choral or update-choral.

    How to store the index data in the database instead of the file system?

    1. stop the choral service
    2. re-configure choral.conf: add com.chemaxon.jchem.psql.scheme=gcrdb and the other rows from this document, but instead of RDS jdbc url, user, password, apply your local data.
    3. restart choral service
    4. create choral indexes

    How to receive information about the choral server environment?

    Information about the choral server environment can be received on <host>:<port>/help/about.
    Where host is choral.init.hostname and port is server.port as configured in the application.properties file. Its default value is 8128.

    Known Issues

    Search result inconsistency

    Inconsistent search results might be received if during the run of the search some modifications (insert, update or delete) of stored chemical structures of the chemical structure table(s) affected by the search are committed by a different user than the searcher one.

    Order by inconsistency of uncommitted structures

    Uncommitted chemical structures are always at the end of the order by queries in index mode.

    Slow indexing

    Indexing can be slow due to not doing calculations in parallel in special cases. If the indexed table is not owned by the user who issued the CREATE INDEX command and the indexing user don’t have select role with grant option on the indexed table or doesn’t have the create view role and the cartridge owner doesn’t have select role on the indexed table.

    Order by inconsistency at similarity search in function mode

    If the Oracle Cost Based Optimizer chooses not to use the index at similarity search, ordering based on similarity relevance will be corrupted.

    Slow calculation of chemical terms

    Calculation of chemical terms for great number of structures might be slow.

    Cost estimation for query structures in CLOB

    Cost estimation does not work for query structures in CLOB.

    The following workarounds are suggested to convert the CLOB query to VARCHAR2 and run the search (then cost estimation will work):

    Workaround 1

    
    CREATE OR REPLACE FUNCTION my_search(query CLOB) RETURN NUMBER AS
        query_v VARCHAR2(32767);
        result NUMBER;
    BEGIN
        IF dbms_lob.getlength(query) <= 32767 THEN
            query_v := dbms_lob.substr(query, dbms_lob.getlength(query), 1);
            SELECT COUNT(*) INTO result FROM test WHERE general_search(mol,  query_v, 'SUBSTRUCTURE') = 1;
        ELSE     
            SELECT COUNT(*) INTO result FROM test WHERE general_search(mol,  query, 'SUBSTRUCTURE') = 1;
        END IF;
        RETURN result;
    END;
    /
    
    SELECT my_search(TO_CLOB('CC')) FROM DUAL;

    Workaround 2

    
    CREATE OR REPLACE FUNCTION my_search2(table_name VARCHAR2, query CLOB, strategy VARCHAR2) RETURN NUMBER AS
        query_v VARCHAR2(32767);
        result NUMBER;
    BEGIN
        IF (dbms_lob.getlength(query) <= 32767) then
            query_v := dbms_lob.substr(query, dbms_lob.getlength(query), 1);
            EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM ' || table_name || '  WHERE general_search(mol, ''' || query_v || ''', ''' || strategy || ''') = 1' INTO result;
        ELSE     
            EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM ' || table_name || '  WHERE general_search(mol, ''' || query || ''', ''' || strategy || ''') = 1' INTO result;
        END IF;
        RETURN result;
    END;
    /
    
    SELECT my_search2('test', TO_CLOB('C'), 'SUBSTRUCTURE') FROM DUAL;
    SELECT my_search2('test', TO_CLOB('C'), 'DUPLICATE') FROM DUAL;
    SELECT my_search2('test', (SELECT mol FROM test WHERE id = 4), 'SUBSTRUCTURE') FROM DUAL;

    Cost estimation can not be calibrated automatically yet

    Cost estimation automatically runs for a given environment using predefined values. It is not possible to run an automated calibration for it to adjust to the custom environment. However, cost estimation factors can be adjusted manually and thus the plan selected for chemical search by Oracle can be influenced. See how to make calibration settings manually.

    No more data to read from socket error

    After an apparently successful installation and index creation, the search processes can throw No more data to read from socket error.

    Ocassionally, the error appears also at the end of the installation process.

    At the moment, the only viable workaround is repeating the installation.

    False similarity search results in the case of molecule types with tautomer=GENERIC parameter

    In the case molecule types with tautomer=GENERIC parameter, similarity search gives false results. These is no workaround at the moment, please do not execute similarity search in structure columns having molecule type with tautomer=GENERIC parameter.

    From version 21.9.0 similarity search works correctly even in the case of molecule types with tautomer=GENERIC parameter.