Choral API Usage

This document describes how to use the API of JChem Choral.

Please be sure, the chemical knowledge behind this API is the same as the knowledge used by JChem Base. In the background of the chemical structure searches, the same molecular comparisons, transformations, calculations are executed as in the case of JChem Base. Both the Choral cartridge and the JChem Oracle cartridge use a subset of the functionality found in JChem Base. The differences between the two cartridges are collected here.

Create Table

Use standard Oracle methods.

The column type of chemical structures must be CLOB.

Insert Into Table

Use standard Oracle methods.

Filter out the invalid molecules from the table

The invalid, erroneous molecules must be deleted from the table before the CREATE INDEX statement is executed. The use of isvalidmolecule function is recommended for identifying the invalid molecules.

SELECT * FROM <table_name> WHERE isvalidmolecule(<structure_column_name>)=0;

Example

SELECT + FROM mytable WHERE isvalidmolecule(mol)=0;

{primary} From version 20.10, non-valid structures do not need to be filtered out because they also will be indexed. However, they won't hit any structure, including themselves.

Create domain index as table_owner

{warning} Prerequisite:

Execute privilege on <type>_idxtype

CREATE INDEX <index_name> ON <table_name>(<structure_column_name>) INDEXTYPE IS <type>_idxtype;

Where

<type> can be one of the type file names present in /data/types/ when the service was initialized

The column to be indexed must be CLOB (Varchar2 is handled as well, but not recommended).

Example

CREATE INDEX myidx ON mytable(mol) INDEXTYPE IS choral_owner.sample_idxtype;

{warning} Use short index names!

{warning} Don't create index with column names in apostrophes. It won't fail, but the index won't be used at search!
--don't use this: CREATE INDEX <index_name> ON <table_name>('<structure_column_name>') INDEXTYPE IS <type>_idxtype;

See possibilities for finding invalid structures.

Cache loading

The cache is loaded when the first search is executed, but it can be loaded intentionally at any time using the following statements:

call load_molecule_cache('INDEX_OWNER', 'INDEX_NAME');
call load_fingerprint_cache('INDEX_OWNER', 'INDEX_NAME');

Both statements are recommended to be executed if the necessary memory is available. Please, enter index_owner and index_name with capital letters.

In the case of low memory setup use only the load_fingerprint_cache function.

Execute chemical searches as searcher user

{warning} Prerequisites:

Execute privilege on <type>_search and <type>_relevance operators.

Select privilege on the table.

Substructure, duplicate, superstructure, full fragment searches

Search a query structure on a target table

SELECT [/*+ <hints> */] * FROM <table_owner>.<table_name> 
    WHERE <choral_owner>.<type>_search(<structure_column_name>, <query_structure>, <search_type>[,integer])=1 [ORDER BY <type>_relevance(integer)];

Search a query structure on a target structure

SELECT * FROM DUAL WHERE <choral_owner>.<type>_search(<target_structure>, <query_structure>, <search_type>)=1;

Where

<type> can be one of the type file names present in /data/types/ when the service was initialized

<Search_type> can be:

SUBSTRUCTURE

DUPLICATE

SUPERSTRUCTURE

FULLFRAGMENT (Available from version 20.15)

<hints> : see Adding Hints

Examples

SELECT * FROM table_owner.mytable WHERE choral_owner.sample_search(mol, 'CCC', 'SUBSTRUCTURE')=1;

SELECT /*+ DOMAIN_INDEX_SORT() */ * FROM table_owner.mytable WHERE choral_owner.sample_search(mol, 'CCC','SUBSTRUCTURE',1)=1 ORDER BY choral_owner.sample_relevance(1);

Search for a limited number of the most relevant hits

Example

SELECT * FROM (
    SELECT /*+ DOMAIN_INDEX_SORT() */ * FROM table_owner.mytable WHERE
    choral_owner.sample_search(mol, 'benzene', 'SUBSTRUCTURE',
    1) = 1 ORDER BY choral_owner.sample_relevance(1)
) WHERE rownum <= 10;

The integer 1 in the above example is needed because the <type>relevance operator is the ancillary operator of the <type>_search operator. The call of the ancillary operator must be paired with the original operator. Pairing means adding the same number parameter to each call. It doesn't matter what is this number, the point is to be the same.

Highlighting hits of substructure search

Example

SELECT sample_hit_highlight(2) FROM my_mol_table WHERE sample_search(mol, 'CCC', 'SUBSTRUCTURE', 2) = 1;

The integer 2 in the above example is needed because the <type>hit_highlight operator is the ancillary operator of the <type>_search operator. The call of the ancillary operator must be paired with the original operator. Pairing means adding the same number parameter to each call. It doesn't matter what is this number, the point is to be the same.

Full fragment search

From version 20.15, the following syntax is available:

Example

SELECT * FROM table_owner.mytable WHERE choral_owner.sample_search(mol, 'CCC', 'FULLFRAGMENT')=1;

In older versions than 20.15, only the following syntax works.

Full fragment search can be executed by transforming the query structure in a way that it matches only a full fragment of the target structure and by executing a substructure search on this modified query structure. The transformation adds 's*' query search property to all atoms.

Example

SELECT * FROM table_owner.mytable WHERE choral_owner.sample_search(mol, 'CCC','SUBSTRUCTURE','FULLFRAGMENT')=1;

Double bond stereo option: dbsmarkedonly

By default, CIS query matches only with CIS target and TRANS query matches only with TRANS target. If matching of CIS query with both CIS and TRANS targets or matching of TRANS query with both CIS and TRANS targets is aimed, then the query molecule has to be transformed.

The option dbsmarkedonly is provided to accomplish this transformation.

Example

SELECT * FROM table_owner.mytable WHERE choral_owner.sample_search(mol, 'C\C=C\C', 'SUBSTRUCTURE','DBSMARKEDONLY')=1;
SELECT * FROM table_owner.mytable WHERE choral_owner.sample_search(mol, 'C\C=C\C', 'SUBSTRUCTURE','DBSMARKEDONLY..FULLFRAGMENT')=1;

Ignore tetrahedral stereo

The tetrahedral stereo information stored on the query structure can be ignored using IGNORETETRAHEDRALSTEREO option.

Behavior until version 23.17.0: Tetrahedral stereo information stored in the query structure can be ignored during substructure and fullfragment search. The transformation removes the tetrahedral stereo information form the query structures.

Behavior from version 24.1.0: Tetrahedral stereo information stored in the query structure and in the target structures can be ignored during substructure, fullfragment and duplicate search.

SELECT * FROM table_owner.mytable WHERE choral_owner.sample_search(mol, 'C[C@H](n)C(O)=O', 'SUBSTRUCTURE','IGNORETETRAHEDRALSTEREO')=1;
SELECT * FROM table_owner.mytable WHERE choral_owner.sample_search(mol, 'C[C@H](n)C(O)=O', 'SUBSTRUCTURE','IGNORETETRAHEDRALSTEREO..FULLFRAGMENT')=1;

Ignore charge

The charge information stored on the query structure can be ignored using IGNORECHARGE option. Available from version 21.13.

Behavior until version 23.17.0: Charge information stored in the query structure can be ignored during substructure and fullfragment search. The transformation removes the charge form the query structures.

Behavior from version 24.1.0: Charge information stored in the query structure and in the target structures can be ignored during substructure, fullfragment and duplicate search.

SELECT * FROM table_owner.mytable WHERE choral_owner.sample_search(mol, '[NH3+]C1=CC=CC=C1', 'SUBSTRUCTURE','IGNORECHARGE')=1;
SELECT * FROM table_owner.mytable WHERE choral_owner.sample_search(mol, '[NH3+]C1=CC=CC=C1', 'SUBSTRUCTURE','IGNORETETRAHEDRALSTEREO..IGNORECHARGE')=1;

Ignore isotope

The isotope information stored on the query structure can be ignored using IGNOREISOTOPE option. Available from version 21.13.

Behavior until version 23.17.0: Isotope information stored in the query structure can be ignored during substructure and fullfragment search. The transformation removes the isotope form the query structures.

Behavior from version 24.1.0: Isotope information stored in the query structure and in the target structures can be ignored during substructure, fullfragment and duplicate search.

SELECT * FROM table_owner.mytable WHERE choral_owner.sample_search(mol, 'C1CC[13CH2]CC1', 'SUBSTRUCTURE','IGNOREISOTOPE')=1;
SELECT * FROM table_owner.mytable WHERE choral_owner.sample_search(mol, 'C1CC[13CH2]CC1', 'SUBSTRUCTURE','IGNORETETRAHEDRALSTEREO..IGNOREISOTOPE')=1;

Similarity search

SELECT * FROM <table_owner>.<table_name> WHERE <choral_owner>.<type>_search(<structure_column_name>, <query_structure>, 'SIMILARITY'[,integer])<operator><threshold_number> [ORDER BY <type>_relevance(integer)];

Where

<type> can be one of the type file names present in /data/types/ when the service was initialized

<operator> can be: <= or >=

<threshold_number> : number between 0 and 1

Examples

SELECT * FROM table_owner.mytable WHERE choral_owner.sample_search(mol, 'CCC','SIMILARITY')>=0.9;

select /*+ DOMAIN_INDEX_SORT() */ * FROM table_owner.mytable 
    WHERE choral_owner.sample_search(mol, 'CCC','SIMILARITY',1)>=0.9 
    ORDER BY choral_owner.sample_relevance(1);

Example for limit

SELECT * FROM (SELECT /*+ DOMAIN_INDEX_SORT() */ * FROM table_owner.mytable 
    WHERE choral_owner.sample_search(mol, 'CCC','SIMILARITY',1)>=0.9 
    ORDER BY choral_owner.sample_relevance(1)) WHERE rownum<=100;

Example for limit and offset

SELECT * FROM (SELECT /*+ DOMAIN_INDEX_SORT() */ rownum r, t.* FROM table_owner.mytable t 
                    WHERE choral_owner.sample_search(mol, 'CCC','SIMILARITY',1)>=0.9 
                    ORDER BY choral_owner.sample_relevance(1)) 
    WHERE r BETWEEN 101 AND 200 AND rownum <= 100;

Example for selecting similarity value as well

SELECT mol,choral_owner.sample_search(mol, 'CCC','SIMILARITY') FROM table_owner.mytable
    WHERE choral_owner.sample_search(mol, 'CCC','SIMILARITY')>=0.9;

Reaction search

The following search types are supported not only for molecules but for reactions as well.

substructure search
superstructure search
full fragment search
duplicate search

Reaction specific query features - like different positions of the reaction arrow - are taken into account. See examples here in Table 2.

Tautomer search

If you want to execute tautomer search in column storing the molecules, the molecule type of the column must have one of the following tautomer = tautomer mode.

tautomer = GENERIC tautomer mode

How generic tautomer search executes the query and target matching?

in duplicate and full fragment search the generic tautomer - representing all theoretically possible tautomers - of the query and the generic tautomer of the target is compared
in substructure search the query itself is matched with the generic tautomer of the target
in superstructure search the target itself is matched with the generic tautomer of the query

tautomer = CANONIC_GENERIC_HYBRID tautomer mode (deprecated in version 23.12)
tautomer = NORMAL_CANONIC_GENERIC_HYBRID tautomer mode (available from version 23.12)

It is a hybrid tautomer search mode. The query structure is compared to the generic tautomer of target at substructure search, while normal canonical tautomers are compared at duplicate search. In full fragment search from version 20.12 to 20.14 the generic tautomer of the target is used, while from version 20.15 normal canonical tautomers are compared.

tautomer = NORMAL_CANONIC_NORMAL_GENERIC_HYBRID tautomer mode (available from version 23.12)

It is a hybrid tautomer search mode. The query structure is compared to the normal generic tautomer of target at substructure search, while normal canonical tautomers are compared at duplicate and full fragment search. The normal generic tautomer represents a chemically more feasible set of tautomers than the generic tautomer which represents the combinatorically possible tautomers.
See details and differences between tautomers here.

Limitations:

SMARTS atoms and SMARTS bonds in the query structures are not supported.
The use of bond lists in query structures may slow down the search.
From version 22.21.0 the normal canonical tautomer form only for structures having max 100 heavyatoms is taken into account in tautomer search, by default. The limit value is configurable

NOT (substructure, duplicate, etc.) search conditions

There are no specific operators for NOT (substructure, duplicate, etc.) search conditions. The SQL language offers some techniques to implement it, and be careful, because in many cases the performance impact can be huge.

using NOT keyword: simple to write and maintain, but it doesn't use the index, so it usually doesn't perform well, and only advised if there are some other, very selective conditions in the query:

SELECT * FROM <table_owner>.<table_name> 
    WHERE NOT (<choral_owner>.<type>_search(<structure_column_name>, <query_structure>, <search_type>)=1);

using MINUS keyword: more difficult to write and maintain, but it uses the index, so it usually performs well, and is advised unless there are other, very selective conditions in the query:

SELECT * FROM <table_owner>.<table_name>
MINUS
SELECT * FROM <table_owner>.<table_name> 
    WHERE <choral_owner>.<type>_search(<structure_column_name>, <query_structure>, <search_type>)=1;

Interpreting a molecule string using a specific format

As molecule strings are ambiguous in some cases, it is possible to interpret the given molecule string according to the given molecule format by the following way:

Example

'CCC{smiles}'
'CCC{smarts}'

In the query structures, the molecule strings which can be interpreted ambiguously like 'CCC' are - by default - handled as SMILES.

Chemical terms

Function chemterm makes possible to calculate chemical terms.

chemterm('chemical_term', 'structure')

For calculating chemical terms that can be longer than 4000 characters, there is function chemterm_clob provided. Available from version 22.15.0

chemterm_clob('chemical_term', 'structure')

Example

SELECT choral_owner.chemterm('mass','CCC') FROM DUAL;

SELECT choral_owner.chemterm('formula("formulafrompseudo:true")', 'C* |$;Pol_p$|') FROM DUAL;

Add chemical term column to a table

Example

ALTER TABLE mytable ADD (mass number);
UPDATE TABLE mytable SET mass=choral_owner.chemterm('mass', mol)

Calculating chemical terms on the standardized structure

Chemical term is calculated on the input structure without standardization. Combine with the standardize method to calculate chemical terms on the standardized structure:

Standardize the structure before inserting into the database columns:

INSERT INTO <your_table> (mol, ... ) VALUES (sample_standardize(<your structure>), ... )

Store both the not standardized and the standardized structures:

INSERT INTO <your_table> (mol,standardized_mol ... ) VALUES (<your structure>, choral_owner.sample_standardize(<your structure>, ... )

Standardize on the fly before calling chemical terms:

select choral_owner.chemterm(<your_chemical_terms>, choral_owner.sample_standardize(<your_structure>));

Structure checkers and fixers

The chemterm function makes possible to execute structure checker and fixer operations.

SELECT chemterm('check(''checkerAction’')','molecule') FROM DUAL;
SELECT chemterm('fix(''checkerAction->fixerAction'')','molecule') FROM DUAL;

Example:

SELECT chemterm('check(''explicith..doublebondstereoerror'')','[H]C(C)C=C |w:3.2|') FROM DUAL;
SELECT chemterm('fix(''missingatommap->mapmolecule'')','CCO') FROM DUAL;

See examples in Chemical Terms/Structure checkers documentation page. The available checkers and and their respective fixers are found here.

See the use of custom structure checkers and fixers described for JChem PostgreSQL Cartridge. The only difference for JChem Choral is the configuration file to be used in place of /etc/default/jchem-psql.

Set -Dchemaxon.structurechecker.factory.ExternalCheckerFixerConfiguration=/<dir>/externalcheckersfixers.xmlvalue in [install dir]/run-choral.vmoptions and [install dir]/choral-service.vmoptions files.

Standardize

(Available since version 19.19.)

Calculating the standardized structure

The standardization steps are determined by the molecule type definition. The returned value is a the standardized structure in MRV format

<choral_owner>.<type>_standardize('molecule')

where

molecule = a Molecule string

<type> can be one of the type file names present in /data/types/ when the service was initialized

Example:

SELECT choral_owner.sample_standardize('C1=CC=CC=C1');

Molconvert

Conversion to molecule formats

The use of Chemaxon's MolConverter is supported with some limitations:

molconvert('structure','format')

where

structure = a Molecule in any of the following formats

format = mrv, mol, rgf, sdf, rdf, csmol, csrgf, cssdf, csrdf, cml, smiles, cxsmiles, abbrevgroup, sybyl, mol2, pdb, xyz, inchi, or name; the output options of the file formats can also be set

Example

SELECT choral_owner.molconvert('CC', 'mrv') FROM DUAL;
SELECT choral_owner.molconvert('CC', 'mrv:H') FROM DUAL

Conversion to base64 encoded binary formats (image)

Molecules can be converted to binary image formats (png, jpeg, msbmp, pov, svg, emf, tiff, eps) or other binary formats (pdf) in Base64 encoded form.

Example

SELECT choral_owner.molconvert('CC','base64:png') FROM DUAL;

Highlight

Highlight function for comparing two structures

Available from version 21.3.0.

The highlight function compares a query structure with a target structure and highlights the bonds and atoms of the target structure matching with the query structure.

<type>_highlight('target_structure','query_structure')
---- works with default alignment_mode and default color

<type>_highlight('target_structure','query_structure','alignment_mode','color')

<type>_highlight('target_structure','query_structure','alignment_mode','color','search_option')

<type>_highlight('target_structure','query_structure','alignment_mode','color','search_option','format')
---- available from version 21.4.0

Examples:

SELECT sample_highlight('CCOCC','COC') from dual;  --- works with alignment_mode OFF and default color blue

SELECT sample_highlight('CCOCC','COC','rotate','green') from dual;

SELECT sample_highlight('CCOCC','COC','rotate','green','DBSMARKEDONLY') from dual;

SELECT sample_highlight('CCOCC','COC','rotate','green','DBSMARKEDONLY','base64:png') from dual; 
SELECT sample_highlight('CCOCC','COC','rotate','green','DBSMARKEDONLY','mrv') from dual;

Three alignment modes can be applied:

off [default]

The hit structure's position is the same as that of the target structure.
rotate

The hit structure is rotated till its part corresponding to the query gets the same position as the query structure has.
partial_clean

The hit structure's position is partially aligned to the query structure.

The expected color of the highlighted atoms and bonds can also be set.

Available colors: blue [default], black, cyan, dark_gray, gray, green, light_gray, magenta, orange, pink, red, white, yellow

Search_option can be DBSMARKEDONLY [default] and/or IGNORETETRAHEDRALSTEREO

If there is no hit, the output contains the target structure without coloring.

The format can be any format listed above at Molconvert function including the base64 encoded image formats. Default format is MRV.

Hit_highlight operator for highlighting hits of database search

Available from version 21.4.0.

See examples here.

isvalidmolecule

The function isvalidmolecule defines whether a given chemiical structure string represents a valid molecule or not.

isvalidmolecule('structure')

where

molecule = a Molecule string or a CLOB type column name storing chemical structures

The output is

1 : valid molecule

0 : invalid molecule

Example

SELECT choral_owner.isvalidmolecule('CC') FROM DUAL;

Cost estimation

Cost estimation is automatically set by the installer.

If you want to switch it off, run the sql file found in choral/sql/ folder.:

choral_disassoc_stats.sql

Performance Tuning

By default the index creation process running at the JChem application server is multithreaded, trying to exploit the available processor resources on the server. This is true for index mode searches as well to provide maximum performance.

Below we gathered some steps that can improve the speed of select statements containing chemical search.

Choral service location

The Choral service performing the chemical searches can be installed on the same server as the Oracle server or can be installed on a separate server. If it is installed on a separate server then actions involving heavy network traffic can be affected significantly.

Some examples:

Chemical index creation
Substructure searches resulting very large amount of hits, e.g.
```
SELECT COUNT(*) FROM my_big_table WHERE sample_search(mol, 'c1ccccc1', 'SUBSTRUCTURE')=1;
```
Benzene ring is very frequent in organic structures thus in a table with several million structures there can be several million hits.
Substructure searches requiring functional execution, e.g.
```
SELECT COUNT(*) FROM my_big_table WHERE sample_search(mol, 'c1ccccc1', 'SUBSTRUCTURE')=1 AND id < 100;
```
In this example it is better to retrieve the first hundred IDs and check the substructure condition for each of them separately. But this will result nearly hundred calls to the service, which will be slower if it is on a separate server.

However, if the Oracle Server is heavily used for other than chemical search then it may be better to put the service on a separate machine with a good network connection because high CPU usage of the Oracle Service can slow down the highly parallelized chemical search.

Memory setup

Chemical search can reach its optimal performance if all data required for search are located in the service cache. However, it is possible that there is not enough memory available on the server to store everything in the memory, in this case we would advise to use the low memory setup, which favors storing fingerprints in the cache to storing structural representation of molecules. Fingerprints are needed in each search type, for each chemical search, having the structures available in the memory is usually required only for searches with many hits. There is a memory calculator available, which can be used to get the optimal environment set up in your case.

Notices to memory calculator

Type files to modify are located in folder data/types/ and not in the folder specified by the calculator page.

The calculated Xmx parameter must be copied into files choral-service.vmoptions and run-choral.vmoptions (not into the file specified on the calculator page).

All other properties calculated by the Xmx calculator must be copied into the configuration file config/choral.conf (not into the file specified on the calculator page).

The infix '.runtime' must be omitted from keys presented on the calculator page as the example below shows.

com.chemaxon.jchem.psql.label.cachePolicy=LRU
com.chemaxon.jchem.psql.label.cachedObjectCount=1000000
com.chemaxon.jchem.psql.molecule.cachePolicy=LRU
com.chemaxon.jchem.psql.molecule.cachedObjectCount=1000000
com.chemaxon.jchem.psql.fingerprint.cachedObjectCount=1320000

Gather statistics

Cost estimation of chemical searches is automatically set up at the installation. However, statistics should be gathered for the table that we are going to search to help Oracle optimizer:

call dbms_stats.gather_table_stats(ownname => '<PLAIN USER>', tabname => 'TABLE_NAME', estimate_percent => 100);

Alternatively, statistics can be gathered for the whole schema:

call dbms_stats.gather_schema_stats(ownname => '<PLAIN USER>', estimate_percent => 100);

Limit the size of the result set

Most of the chemical searches will be really fast if the above two steps are taken, it should take no more than some milliseconds for several million structures. But if the number of hits is very high then it can take longer. In this case it is worth limiting the number of expected results because hits are returned in the order of their relevance and thus the most relevant hits will be returned anyway. In a real time environment the user will not be able to process hundreds of thousands of hits, thus limiting it in order to improve performance is beneficial.

Example of retrieving the hundred most relevant substructure search results:

SELECT * FROM (SELECT /*+ DOMAIN_INDEX_SORT() */ * FROM table_owner.mytable 
    WHERE choral_owner.sample_search(mol, 'CCC','SUBSTRUCTURE',1)=1 
    ORDER BY choral_owner.sample_relevance(1)) WHERE rownum<=100;

Adding hints

The speed of the searches can reach its full potential by adding hints to the searches. The hint DOMAIN_INDEX_SORT() is critical to set:

DOMAIN_INDEX_SORT()

Important at chemical searches if relevance ordering is used. Doesn’t affect the search speed otherwise.

Technical background: informs the planner that the search results are already given in the order of relevance so there is no need to fetch all the values and sort it afterwards.

The speed of the searches can also be increased by switching DYNAMIC_SAMPLING off.

Dynamic sampling can be switched off by adding the relevant hint to the SQL query:

SELECT /*+ DOMAIN_INDEX_SORT() DYNAMIC_SAMPLING(mytable 0) */ * FROM table_owner.mytable where choral_user.sample_search(mol, ’CCC’,’SUBSTRUCTURE’,1)=1 ORDER BY choral_user.sample_relevance(1);

or by switching it off before running the SQL query:

alter session set optimizer_dynamic_sampling=0;
...
SELECT /*+ DOMAIN_INDEX_SORT() */ * FROM table_owner.mytable where choral_user.sample_search(mol, ’CCC’,’SUBSTRUCTURE’,1)=1 ORDER BY choral_user.sample_relevance(1);

Avoid planning for only chemical search

Cost estimation by the Oracle Query Planner may cost much time compared to the whole search time in some cases of substructure search without other filter conditions, especially in the limited search case. In this case when only substructure search is performed in the select statement it is always better to use the domain index to perform the search then to do a full table scan. Thus, query planning is unnecessary. If a hint is given to the optimizer to use the index cost estimation will still be performed but it can be avoided with a workaround.

Here is a code example to avoid planning by introducing a dummy function and wrapping the query structure into calling this dummy function:

CREATE OR REPLACE FUNCTION avoid_planning(v varchar2) RETURN varchar2 IS
BEGIN
  RETURN v;
END;
/

SELECT * FROM test WHERE sample_search(mol,avoid_planning('C'),'SUBSTRUCTURE') = 1;

Planning with CLOB queries

Unfortunately the cost estimation of the searches are not calculated if the query is a value in CLOB format. This could make the database use a worse plan with worse performance. This behavior is the consequence of the Oracle database logic which doesn't treat CLOB values as constants. E. g. no cost estimation here, if the :query parameter is a CLOB:

SELECT * FROM test WHERE sample_search(mol,:query,'SUBSTRUCTURE') = 1;

We have two workarounds to do this:

The simplest way is to set the query in a shorter molecule format if possible, that fits into the 4000 character length limit of VARCHAR2. In this case you can set the query is VARCHAR2 therefore the cost estimation is calculated.
If the query is a column of a table and the table contains exactly one row, then the cost is calculated even if the query is in CLOB format. Here is an example how to do that:

a. create a temporary table to store the query structure:
```
CREATE GLOBAL TEMPORARY query_table(query clob);
```
b. Execute the following statements at every query:

b1. Fill up the query table with the only one row containing the query structure:
```
DELETE FROM query_table;
INSERT INTO query_table values (<your query>);
```
b2. Force the database to re-plan the final query. It might be unnecessary and there are many ways to force that depending on the setup of the database. One option is:
```
exec dbms_stats.delete_index_stats(<index_schema_name>, <index_name>);
```
b3.Execute the query by joining the query table:
```
SELECT * FROM test,query_table WHERE sample_search(mol,query,'SUBSTRUCTURE') = 1;
```

Tuning cost factors for Oracle Query Planner

Choral cartridge provides functions to the Oracle Query Planner to enable estimating the cost of the chemical searches. Currently cost estimation can not be calibrated automatically to the given environment but the factors influencing its behavior can be changed manually.

There are six values in the CHORAL_SETTINGS table of the Choral owner. By updating these values the costs of chemical searches can be influenced.

INDEX_SCREEN_FACTOR

Factor for the cost of fingerprint screening performed using the Choral domain index. Increasing it mildly increases the cost of index usage for substructure and duplicate search.

INDEX_ABAS_FACTOR

Factor for the cost of atom by atom search performed using the Choral domain index. Increasing it significantly increases the cost of index usage for substructure search queries with many hits.

INDEX_SIMILARITY_FACTOR

Factor for the cost of similarity search performed using the Choral domain index. Increasing it increases the cost of index usage for similarity search.

FUNCTION_SCREEN_FACTOR

FUNCTION_ABAS_FACTOR

FUNCTION_SIMILARITY_FACTOR

These are the same factors respectively for functional execution.

These values can be updated to fit the current environment for example by measuring time required for the same query by executing it with index usage or functional execution and setting cost factor values to switch from one to the other at the right moment.

For example, there is a given value for the limit on the ID below where functional and index execution take roughly the same time:

SELECT COUNT(*) FROM my_big_table WHERE sample_search(mol, 'c1ccccc1Cl', 'SUBSTRUCTURE')=1 AND id<=<limit for id>;

To find this ID value, one can run the same query with hint for index or functional execution with different ID values and thus get to the value where their time match:

SELECT /*+ INDEX(my_big_table my_big_table_chemical_index) */ COUNT(*) FROM my_big_table 
    WHERE sample_search(mol, 'c1ccccc1Cl', 'SUBSTRUCTURE')=1 AND id<=<limit for id>;

SELECT /*+ NO_INDEX(my_big_table my_big_table_chemical_index) */ COUNT(*) FROM my_big_table 
    WHERE sample_search(mol, 'c1ccccc1Cl', 'SUBSTRUCTURE')=1 AND id<=<limit for id>;

Let us say that this value is N. When this value is found then the above described factors can be increased or decreased so that Oracle Query Planner would change the execution plan for the query

SELECT COUNT(*) FROM my_big_table WHERE sample_search(mol, 'c1ccccc1Cl', 'SUBSTRUCTURE')=1 AND id<=<limit for id>;

around the value N. The execution plan chosen by the planner can be checked by executing

EXPLAIN PLAN FOR SELECT COUNT(*) FROM my_big_table WHERE sample_search(mol, 'c1ccccc1Cl', 'SUBSTRUCTURE')=1 AND id<=<limit for id>;
SELECT * FROM table(dbms_xplan.DISPLAY);