Second Generation Search Engine

This document describes the new features, configuration options and functional features of the second generation search engine implemented in the following products: JChem Choral, JChem PostgreSQL Cartridge and JChem Microservices DB.

New features

Relevance ordering

Substructure search hits are returned in order of relevance (similarity) between the hit structure and the query structure.

Hit as you draw

The most relevant hit structures are returned almost immediately as the query structure is modified.

Configuration

Technical parameters

The Xmx size (maximum Java heap size), cache sizes and other parameters must be set before starting the servers.
The following page helps to calculate approximate values for these configuration parameters:

JChem Engines cache and memory calculator

Business rules

The business rules that govern how chemical structures are interpreted are defined in molecule types.
These rules cover the following:

  • standardizer actions to be executed on the structures
  • tautomer search mode
  • assumption of stereo interpretation mode

Molecule type

The molecule types must be set before initializing the new servers.

Where to define molecule type(s)

  • JChem Choral: /data/tapes/ files
  • JChem PostgreSQL Cartridge: /etc/chemaxon/types/ files
  • JChem Microservices DB: jws-config/common-config/application.properties file or jws-db/config/application.properties file

Application mode of molecule type(s)

  • JChem Choral: as index type and as search type
  • JChem PostgreSQL Cartridge: as column type
  • JChem Microservices DB: as table property

Standardizer rules

Standardizer actions can be defined in two forms:

  • standardizer action string
  • standardizer file

Tautomer mode

For a detailed description of

  • all tautomers, represented by what we call the generic tautomer,
  • normal canonical tautomers,
  • normal all tautomers, represented by what we call the normal generic tautomer,

see the Tautomerization and tautomer models of Chemaxon documentation.

Four tautomer search modes are provided:

  • OFF
    Tautomers are not taken into account during the search
  • GENERIC

    How does generic tautomer search match the query and the target?

    • in duplicate and full fragment search, the generic tautomer of the query is compared with the generic tautomer of the target (the generic tautomer represents all theoretically possible tautomers)
    • in substructure search, the query itself is matched with the generic tautomer of the target
    • in superstructure search, the target itself is matched with the generic tautomer of the query

    How are the stereo features handled within the tautomer region represented by the generic tautomer?

    • In versions prior to 25.1.0, the stereo features are protected within the tautomer region
    • From version 25.1.0, the stereo features are not protected within the tautomer region

    Matching examples (structure images in images/download/attachments/1806759/):

    • Query taut11.png, target taut14.png: no match in versions prior to 25.1.0, match from version 25.1.0
    • Query taut12.png, target taut13.png: no match in versions prior to 25.1.0, match from version 25.1.0

  • CANONIC_GENERIC_HYBRID (deprecated in 23.12), NORMAL_CANONIC_GENERIC_HYBRID (from version 23.12)

    This is a hybrid tautomer search mode. In substructure search, the query structure is compared to the generic tautomer of the target, while in duplicate search the normal canonical tautomers are compared. In full fragment search, the generic tautomer of the target is used from version 20.12 to 20.14, while from version 20.15 the normal canonical tautomers are compared.

  • NORMAL_CANONIC_NORMAL_GENERIC_HYBRID (from version 23.12)

    This is a hybrid tautomer search mode. In substructure search, the query structure is compared to the normal generic tautomer of the target, while in duplicate and full fragment search the normal canonical tautomers are compared.
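
The matching rules of the four modes can be summarized as a lookup table. The following Python sketch is only an illustration: the names COMPARED and compared_forms are hypothetical and not part of any Chemaxon API. It encodes only the combinations described above, treats the deprecated CANONIC_GENERIC_HYBRID name as an alias of NORMAL_CANONIC_GENERIC_HYBRID, and uses the full fragment behavior from version 20.15 for that mode.

```python
# Illustrative lookup only; all names here are hypothetical, not a Chemaxon API.
AS_DRAWN = "structure as drawn"
GENERIC = "generic tautomer"
NORMAL_GENERIC = "normal generic tautomer"
NORMAL_CANONICAL = "normal canonical tautomer"

# mode -> search type -> (query form, target form) that are compared
COMPARED = {
    "OFF": {
        "DUPLICATE": (AS_DRAWN, AS_DRAWN),
        "FULLFRAGMENT": (AS_DRAWN, AS_DRAWN),
        "SUBSTRUCTURE": (AS_DRAWN, AS_DRAWN),
        "SUPERSTRUCTURE": (AS_DRAWN, AS_DRAWN),
    },
    "GENERIC": {
        "DUPLICATE": (GENERIC, GENERIC),
        "FULLFRAGMENT": (GENERIC, GENERIC),
        "SUBSTRUCTURE": (AS_DRAWN, GENERIC),
        "SUPERSTRUCTURE": (GENERIC, AS_DRAWN),
    },
    "NORMAL_CANONIC_GENERIC_HYBRID": {
        "DUPLICATE": (NORMAL_CANONICAL, NORMAL_CANONICAL),
        "FULLFRAGMENT": (NORMAL_CANONICAL, NORMAL_CANONICAL),  # from version 20.15
        "SUBSTRUCTURE": (AS_DRAWN, GENERIC),
    },
    "NORMAL_CANONIC_NORMAL_GENERIC_HYBRID": {
        "DUPLICATE": (NORMAL_CANONICAL, NORMAL_CANONICAL),
        "FULLFRAGMENT": (NORMAL_CANONICAL, NORMAL_CANONICAL),
        "SUBSTRUCTURE": (AS_DRAWN, NORMAL_GENERIC),
    },
}

# Name deprecated in 23.12, listed together with its replacement in the text
ALIASES = {"CANONIC_GENERIC_HYBRID": "NORMAL_CANONIC_GENERIC_HYBRID"}


def compared_forms(mode, search_type):
    """Return the (query form, target form) pair compared for this combination."""
    mode = ALIASES.get(mode, mode)
    try:
        return COMPARED[mode][search_type]
    except KeyError:
        raise ValueError(f"combination not described: {mode} / {search_type}")


print(compared_forms("GENERIC", "SUBSTRUCTURE"))
```

Combinations that the text does not describe, such as superstructure search in the hybrid modes, raise a ValueError instead of guessing a behavior.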

[Table of structure images. Columns: Query, Target, Tautomer mode OFF, Tautomer mode GENERIC, Tautomer mode NORMAL_CANONIC_GENERIC_HYBRID]

See the substructure search examples below:

[Table of structure images. Columns: Query, Target, Tautomer mode OFF, Tautomer mode GENERIC, Tautomer mode NORMAL_CANONIC_GENERIC_HYBRID, Tautomer mode NORMAL_CANONIC_NORMAL_GENERIC_HYBRID]

Stereo assumption

By default, all stereo molecules are regarded as molecules with absolute stereo configuration, regardless of whether the chiral flag is present.

If only molecules with the chiral flag should be handled as absolute (and molecules without the chiral flag as relative), set stereoAssumption = RELATIVE in the molecule type definition.

[Table of structure images. Columns: Query, Target, Stereo assumption ABSOLUTE, Stereo assumption RELATIVE]
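
The rule above can be written as a small decision function. This is a minimal Python sketch of the described rule only; the function name effective_stereo and its return values are hypothetical and do not correspond to any product API.

```python
def effective_stereo(has_chiral_flag, stereo_assumption="ABSOLUTE"):
    """Return how a stereo molecule is interpreted under the given assumption.

    Hypothetical helper that mirrors the rule described in the text.
    """
    if stereo_assumption == "ABSOLUTE":
        # Default: the chiral flag is irrelevant, everything is absolute
        return "absolute"
    if stereo_assumption == "RELATIVE":
        # Only molecules carrying the chiral flag are treated as absolute
        return "absolute" if has_chiral_flag else "relative"
    raise ValueError(f"unknown stereo assumption: {stereo_assumption}")


for flag in (True, False):
    for assumption in ("ABSOLUTE", "RELATIVE"):
        print(flag, assumption, effective_stereo(flag, assumption))
```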

Functional features

Search options

Option names

  • JChem Choral: ignoretetrahedralstereo, ignorecharge, ignoreisotope
  • JChem PostgreSQL Cartridge: ignoretetrahedralstereo, ignorecharge, ignoreisotope
  • JChem Microservices DB: stereoSearchIgnoreTetrahedralStereo, ignoreCharge, ignoreIsotope

By default, the specified tetrahedral stereo configuration, charge and isotope values must match in the hit structures.

From version 24.1.0, the ignoretetrahedralstereo, ignorecharge and ignoreisotope options can be applied in DUPLICATE search as well, in addition to SUBSTRUCTURE and FULLFRAGMENT search. The specified tetrahedral stereo, charge and isotope properties are then ignored not only on the query structures, as was the case up to version 23.17.0, but on the target structures as well.
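
The version dependence described above can be expressed as a simple check. The following Python sketch is illustrative only: parse_version, ignore_options_apply and sides_ignored are hypothetical helper names, and the sketch covers only the three search types and the version boundary mentioned in the text.

```python
def parse_version(text):
    """Turn a version string such as '24.1.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def ignore_options_apply(search_type, version):
    """Can the ignore* options be used with this search type in this version?"""
    if search_type in ("SUBSTRUCTURE", "FULLFRAGMENT"):
        return True
    if search_type == "DUPLICATE":
        # Duplicate search accepts the options from version 24.1.0 only
        return parse_version(version) >= (24, 1, 0)
    raise ValueError(f"search type not described: {search_type}")


def sides_ignored(version):
    """On which structures are the ignored properties dropped?"""
    if parse_version(version) >= (24, 1, 0):
        return ("query", "target")
    return ("query",)


print(ignore_options_apply("DUPLICATE", "23.17.0"))
print(ignore_options_apply("DUPLICATE", "24.1.0"))
print(sides_ignored("24.1.0"))
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "23.17.0" would sort after "23.2.0" incorrectly.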

To ignore the tetrahedral stereo configuration specified in the query structures and in the target structures, use the ignoretetrahedralstereo option.

[Table of structure images. Columns: Query, Target, without ignoretetrahedralstereo (default), with ignoretetrahedralstereo]

Stereo search on marked double bond only

By default, the double bond stereo configuration of all the double bonds of the hit structures must be the same as that of the query structures. See first examples below.

The dbsmarkedonly search option makes it possible to check the E/Z configuration of only those double bonds that are marked.

[Table of structure images. Columns: Query, Target, without dbsmarkedonly (default), with dbsmarkedonly]

Option names

  • JChem Choral: dbsmarkedonly
  • JChem PostgreSQL Cartridge: dbsmarkedonly
  • JChem Microservices DB: stereoSearchOnMarkedDoubleBondOnly
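
Because the same search option has a different name in JChem Microservices DB, client code that targets several products may want a single translation table. The Python sketch below collects the option names listed in this document; the dictionary OPTION_NAMES and the function option_name are hypothetical helpers, not part of any product.

```python
# Option names as listed in this document, keyed by the Choral/JPC spelling.
# The mapping structure itself is a hypothetical convenience.
OPTION_NAMES = {
    "JChem Choral": {
        "ignoretetrahedralstereo": "ignoretetrahedralstereo",
        "ignorecharge": "ignorecharge",
        "ignoreisotope": "ignoreisotope",
        "dbsmarkedonly": "dbsmarkedonly",
    },
    "JChem PostgreSQL Cartridge": {
        "ignoretetrahedralstereo": "ignoretetrahedralstereo",
        "ignorecharge": "ignorecharge",
        "ignoreisotope": "ignoreisotope",
        "dbsmarkedonly": "dbsmarkedonly",
    },
    "JChem Microservices DB": {
        "ignoretetrahedralstereo": "stereoSearchIgnoreTetrahedralStereo",
        "ignorecharge": "ignoreCharge",
        "ignoreisotope": "ignoreIsotope",
        "dbsmarkedonly": "stereoSearchOnMarkedDoubleBondOnly",
    },
}


def option_name(product, option):
    """Translate a Choral/JPC style option name to the given product's name."""
    try:
        return OPTION_NAMES[product][option]
    except KeyError:
        raise ValueError(f"unknown product or option: {product} / {option}")


print(option_name("JChem Microservices DB", "dbsmarkedonly"))
```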

Axial stereo information

In the products based on the second generation search engine, axial stereo information is taken into account in duplicate search. In JChem Base and JChem Oracle Cartridge, which use the first generation search engine, axial stereo information is not taken into account by default.

Duplicate search results

[Table of structure images. Columns: Query, Target, Second generation search engine, First generation search engine (default), First generation search engine with ignoreAxialStereo=false]

Homology groups in query structures

The handling of homology groups in query structures changed in version 24.3.0.
In versions prior to 24.3.0, homology groups are not translated; for example, an ethyl group in the target is not matched to an alkyl homology group in the query.
From version 24.3.0 they are translated, which results in relevant hits in substructure and full fragment searches.

Example

[Table of structure images. Columns: Query, Target, Before version 24.3.0, From version 24.3.0]

In JChem Oracle Cartridge (JOC) and JChem Base, which use the first generation search engine, the strictness of duplicate search can be relaxed with search options that ignore individual features such as tetrahedral stereo, double bond stereo, charge or isotope.

JOC example

SELECT * FROM test where jc_compare(mol, 'C\C=C\C', 't:d ignoreDoubleBondStereo:y') = 1;
[Table of structure images. Columns: Target, Query, JOC/JCB duplicate search with ignoreDoubleBondStereo, JPC/Choral duplicate search with dbsmarkedonly. The JPC/Choral column is N/A in both rows.]

From version 24.1.0, the second generation search engine products also provide the ignoreTetrahedralStereo, ignoreCharge and ignoreIsotope options for loosening the strictness of duplicate search.
To loosen the strictness regarding double bond stereo configuration, the recommended solution is to apply full fragment search in place of duplicate search and to restrict the fragment count of the hits to the fragment count of the query structure.
In full fragment search, the dbsmarkedonly option has to be applied:

JPC example

SELECT * FROM test WHERE query_transform('C\C=C\C', 'dbsmarkedonly') |<=| mol AND chemterm('fragmentCount', mol)::smallint = 1;

Choral example

SELECT * FROM test WHERE sample_Search (mol,'C\C=C\C','FULLFRAGMENT', 'DBSMARKEDONLY')=1 AND chemterm('fragmentCount', mol)=1;
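
When the query is not fixed, the JPC statement above can be generated from the query SMILES. The following Python sketch only reassembles the statement shown in the JPC example; the helper names are hypothetical, and counting fragments by splitting the SMILES on dots is a simplifying assumption that ignores ring closures spanning a dot.

```python
def fragment_count(smiles):
    """Approximate the fragment count of a SMILES string.

    Simplification: counts dot-separated parts only.
    """
    return smiles.count(".") + 1


def sql_literal(text):
    """Quote a string as an SQL literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def jpc_relaxed_duplicate_sql(smiles, table="test", column="mol"):
    """Build the full fragment + dbsmarkedonly query from the JPC example.

    Table and column names are inserted as-is and must come from trusted code.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE query_transform({sql_literal(smiles)}, 'dbsmarkedonly') |<=| {column} "
        f"AND chemterm('fragmentCount', {column})::smallint = {fragment_count(smiles)};"
    )


print(jpc_relaxed_duplicate_sql(r"C\C=C\C"))
```

In application code, passing the query structure as a bound parameter through the database driver is generally preferable to building the statement as a string.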

Note that this solution is not exactly the same as duplicate search with the ignoreDoubleBondStereo option in JOC; it only mimics its behavior.

Duplicate search checks and matches all features of the two structures.
With the ignoreDoubleBondStereo option turned on, the double bond stereo feature is ignored.
Full fragment search, which is used here to mimic this behavior, can be considered a substructure search restricted by heavy atom count: the heavy atom count of the matching fragment must equal the heavy atom count of the query.
Because it is still a substructure search, atom and bond properties are not matched exactly. For example, an uncharged query atom can match a charged target atom.
As a result, you may find charged atoms among the hits, as in the following example:

[Table of structure images. Columns: Target, Query, JOC/JCB duplicate search with ignoreDoubleBondStereo, JPC/Choral full fragment search with dbsmarkedonly]
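
The difference in charge handling can be illustrated with a toy atom comparison. This Python sketch is a deliberately simplified model and not how the search engine is implemented: the Atom type and both functions are hypothetical, and only the element and charge of a single atom pair are considered.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Toy atom: element symbol and formal charge."""
    element: str
    charge: int = 0


def duplicate_atom_match(query, target):
    """Duplicate-style comparison: every property must be identical."""
    return query == target


def substructure_atom_match(query, target):
    """Substructure-style comparison.

    A charge specified on the query must match, but an uncharged query
    atom may match a target atom with any charge.
    """
    if query.element != target.element:
        return False
    return query.charge == 0 or query.charge == target.charge


query_n = Atom("N")
target_n_plus = Atom("N", +1)
print(duplicate_atom_match(query_n, target_n_plus))
print(substructure_atom_match(query_n, target_n_plus))
```

In this model the uncharged nitrogen query is rejected by the duplicate-style comparison but accepted by the substructure-style one, which is why charged structures can appear among full fragment hits.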

Hit highlight

The highlight function compares a query structure with a target structure and highlights the atoms and bonds of the target structure that match the query structure. Both the alignment mode and the highlight color can be set.
Three alignment modes are available:

  • off

    The hit structure's position on the screen is the same as that of the target structure.

  • rotate

    The hit structure is rotated until the part corresponding to the query has the same position as the query structure.

  • partial clean

    The hit structure's position on the screen is partially aligned to the query structure.

[Table of structure images. Columns: Query, Target, Alignment off, Alignment rotate, Alignment partial clean]

Option names

  • JChem Choral: function highlight, operator hit_highlight
  • JChem PostgreSQL Cartridge: function highlight
  • JChem Microservices DB: /rest-v1/db/highlight