DB Web Services

{info} DB Web Services provide methods for storing and searching chemical structures in a persistent database, currently in H2 and PostgreSQL databases. There are methods provided for:

creating / deleting tables,

inserting / deleting / modifying structures and data in the tables,

executing duplicate, substructure, full fragment and similarity searches.

This documentation describes installation, administration and usage of DB Web Services.

DB Web Services application is available in two modes:

As part of a microservices system
As standalone web application

Microservices system mode

In microservices system mode, the DB Web Services runs together with the Config, Discovery and Gateway services. These three services are mandatory, and optionally other services can also be part of the system. All configuration must be done in the Config service.

The default configuration applies to the microservices system mode.

The web application runs on host <server-host> and listens on port <gateway-server-port>.

Standalone web application mode

In standalone web application mode, the DB Web Services runs alone, without the Config, Discovery and Gateway services (however, the installer installs them as well).

The default configuration must be changed for standalone web application mode:

Set eureka.client.enabled=false in the application.properties file.

Up to version 22.2.0, set spring.cloud.config.failFast=false, spring.cloud.config.enabled=false and spring.cloud.config.uri= (empty value) in the bootstrap.properties file.

From version 22.9.0, set spring.cloud.config.enabled=false and comment out the spring.config.import= line in the application.properties file.

All configuration must be done in the DB module.

The web application runs on host <server-host> and listens on port <server-port>.

Download

See here.

System requirements

See here

Installation

See here.

Module is installed into folder: jws/jws-db

Licenses

See here.

Logging

See here.

Configuration

Default configuration:

application.properties	description
server.port=8062
logging.file.name=../logs/jws-db.log
spring.config.import=configserver:${CONFIG_SERVER_URI:http\://localhost\:8888}?fail-fast=true&max-attempts=100&max-interval=60000&multiplier=1.2&initial-interval=3000	Added in version 22.6.0.
eureka.client.enabled=true	set `eureka.client.enabled=false` to switch to standalone DB Web Services application mode
initOnStart=AUTO	initOnStart can be:
	`INIT`: the existing database is deleted, and a new empty one is created
	`AUTO`: the existing database is started; if no database exists, a new empty one is created
	`OPEN`: the existing database is started; if no database exists, an error is thrown
	`OPEN_OR_INIT_ONLY`: if `OPEN` is unsuccessful, only an `INIT` step is performed, but the service is not started
updateMode=EXIT	updateMode takes effect only if the version number has changed. updateMode can be:
	`EXIT`: exit if the version has changed
	`DROP`: drop old data if the version has changed
	`REINDEX`: keep old data and reindex it so that it works with the new version
	`FORCE_REINDEX`: keep old data and reindex it regardless of version change
search.wallTimeLimitSeconds=3600
com.chemaxon.zetor.settings.scheme=${CXN_SCHEME:GCRDB}	Possible values:
	GCRDB - only 1 DB instance can run
	CRDB - used for HA solution, multiple DB instances can run. Hazelcast provides shared cache for instances.
com.chemaxon.zetor.settings.indexDir=${CXN_STRUCTURE_DATA_DIR:./data/chemical-data/store}	Stores database files - used only in case of H2 database!
com.chemaxon.zetor.settings.gcrdb.isSingleTable=${CXN_DB_LOGIC_SINGLE_TABLE:true}
com.chemaxon.zetor.settings.gcrdb.singleTableName=${CXN_DB_TABLE_NAME:engine_data}
com.chemaxon.zetor.settings.gcrdb.sqlBuilderProvider=${CXN_DB_DIALECT:H2}
com.chemaxon.zetor.settings.gcrdb.jdbcUrl=${CXN_DB_JDBC_URL:jdbc:h2:nio:${com.chemaxon.zetor.settings.indexDir}/db;COMPRESS=true}
com.chemaxon.zetor.settings.gcrdb.user=${CXN_DB_JDBC_USER:user}
com.chemaxon.zetor.settings.gcrdb.password=${CXN_DB_JDBC_PASSWORD:password}
com.chemaxon.zetor.settings.gcrdb.allowBatchUpdates=${CXN_DB_LOGIC_BATCH_UPDATE:true}
com.chemaxon.zetor.settings.forcePurge=true
com.chemaxon.webservices.db.import_export.dir=${CXN_DB_IMPORT_EXPORT_DIR:data/export}
com.chemaxon.webservices.db.import_export.importBatchSize=${CXN_DB_IMPORT_EXPORT_BATCH_SIZE:5000}
com.chemaxon.zetor.types[0].version = 1
com.chemaxon.zetor.types[0].typeName = sample
com.chemaxon.zetor.types[0].typeId = 1
com.chemaxon.zetor.types[0].tautomerHandlingMode = OFF
com.chemaxon.zetor.types[0].stereoAssumption=ABSOLUTE
com.chemaxon.zetor.types[0].standardizerConfig = aromatize
com.chemaxon.zetor.types[1].version = 1
com.chemaxon.zetor.types[1].typeName = taumol
com.chemaxon.zetor.types[1].typeId = 2
com.chemaxon.zetor.types[1].tautomerHandlingMode = GENERIC
com.chemaxon.zetor.types[1].stereoAssumption=ABSOLUTE
com.chemaxon.zetor.types[1].standardizerConfig = aromatize
	Here the molecule types are defined. You can delete, modify or add new molecule types.
	Important:
	Indexing of the types array must always be sequential from 0: 0, 1, 2, ...
	`version` must be 1
	`typeName` must be unique
	`typeId` must be a unique integer
	`tautomerHandlingMode` can be
		`OFF`
		`GENERIC`
		`CANONIC_GENERIC_HYBRID` (available from version 20.10, deprecated from version 23.12)
		`NORMAL_CANONIC_GENERIC_HYBRID` (available from version 23.12)
		`NORMAL_CANONIC_NORMAL_GENERIC_HYBRID` (available from version 23.12)
	`stereoAssumption` can be `ABSOLUTE` or `RELATIVE` (available from version 20.12)
	`standardizerConfig` can be made of action strings, for example `standardizerConfig=aromatize:b..removeExplicitH`
	From version 21.19 the standardizer configuration can also be specified as `com.chemaxon.zetor.types[n].standardizerFile`, but use strictly only one of them: `standardizerConfig` OR `standardizerFile`
	All changes take effect only if `initOnStart` is set to `INIT` and the application is restarted. Take care, the existing database will be deleted.
	From version 22.21.0 the parameter `com.chemaxon.zetor.types[n].canonicalTautomerHeavyAtomLimit` (default value 100) is available. It limits the size of molecules whose tautomer form is taken into account in duplicate and full fragment search when `tautomerHandlingMode` is `CANONIC_GENERIC_HYBRID`.
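
The rules for the molecule type definitions can be checked before restarting the service. The following is a minimal sketch, not part of DB Web Services: the function names parse_types and validate_types are hypothetical, and the parser only understands plain key=value lines, which is an assumption about how the file is written.

```python
import re
from collections import defaultdict

# Matches keys such as "com.chemaxon.zetor.types[0].typeName".
TYPE_KEY = re.compile(r"^com\.chemaxon\.zetor\.types\[(\d+)\]\.(\w+)$")


def parse_types(properties_text):
    """Collect com.chemaxon.zetor.types[n].* entries into {index: {field: value}}."""
    types = defaultdict(dict)
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        match = TYPE_KEY.match(key)
        if match:
            types[int(match.group(1))][match.group(2)] = value
    return dict(types)


def validate_types(types):
    """Return a list of violations of the documented rules (empty list means OK)."""
    problems = []
    indexes = sorted(types)
    if indexes != list(range(len(indexes))):
        problems.append(f"indexes must be sequential from 0, got {indexes}")
    names = [entry.get("typeName") for entry in types.values()]
    ids = [entry.get("typeId") for entry in types.values()]
    if len(set(names)) != len(names):
        problems.append("typeName values must be unique")
    if len(set(ids)) != len(ids):
        problems.append("typeId values must be unique")
    for index in indexes:
        entry = types[index]
        if entry.get("version") != "1":
            problems.append(f"types[{index}].version must be 1")
        # Simplification of this sketch: only non-negative decimal integers pass.
        if not (entry.get("typeId") or "").isdigit():
            problems.append(f"types[{index}].typeId must be an integer")
        if "standardizerConfig" in entry and "standardizerFile" in entry:
            problems.append(
                f"types[{index}] sets both standardizerConfig and standardizerFile"
            )
    return problems
```

Calling validate_types(parse_types(text)) on the content of application.properties returns an empty list when none of the checked rules is violated.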

Performance can be tuned through cache-related settings. Please use our JChem Engines cache and memory calculator page (take into account that superstructure search is not available) to calculate the appropriate settings.

All properties calculated by the calculator must be copied into the application.properties file (not into the file specified on the calculator page), for example:

com.chemaxon.zetor.settings.label.cachePolicy=DISABLED
com.chemaxon.zetor.settings.label.cachedObjectCount=0
com.chemaxon.zetor.settings.molecule.cachePolicy=DISABLED
com.chemaxon.zetor.settings.molecule.cachedObjectCount=0
com.chemaxon.zetor.settings.fingerprint.cachedObjectCount=1320000

The infix *.runtime* in keys was used in version 19.10 and earlier:

com.chemaxon.zetor.settings.runtime.label.cachePolicy=DISABLED
com.chemaxon.zetor.settings.runtime.label.cachedObjectCount=0
com.chemaxon.zetor.settings.runtime.molecule.cachePolicy=DISABLED
com.chemaxon.zetor.settings.runtime.molecule.cachedObjectCount=0
com.chemaxon.zetor.settings.runtime.fingerprint.cachedObjectCount=1320000
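
When an old configuration file is carried over to a newer version, the keys with the runtime infix have to be renamed. A minimal sketch of that renaming, assuming the only difference is the ".runtime." segment shown above (the helper name drop_runtime_infix is hypothetical):

```python
OLD_PREFIX = "com.chemaxon.zetor.settings.runtime."
NEW_PREFIX = "com.chemaxon.zetor.settings."


def drop_runtime_infix(properties_text):
    """Rewrite keys that still use the '.runtime.' infix to the current key form."""
    migrated = []
    for line in properties_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(OLD_PREFIX):
            # Keep everything after the old prefix, including "=value".
            line = NEW_PREFIX + stripped[len(OLD_PREFIX):]
        migrated.append(line)
    return "\n".join(migrated)
```

Lines that do not start with the old prefix, including comments, are passed through unchanged.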

bootstrap.properties

spring.cloud.config.failFast=true #Removed in version 22.6.0.
spring.cloud.config.uri=${CONFIG_SERVER_URI:http\://localhost\:8888/} #Removed in version 22.6.0.
spring.cloud.config.retry.initialInterval=3000 #Removed in version 22.6.0.
spring.cloud.config.retry.multiplier=1.2 #Removed in version 22.6.0.
spring.cloud.config.retry.maxInterval=60000 #Removed in version 22.6.0.
spring.cloud.config.retry.maxAttempts=100 #Removed in version 22.6.0.

For more configuration options, see the Spring documentation page.

Search logging

Debug-level search logging can be enabled in the files that configure the JVM options:

jws-db-service.vmoptions
run-jws-db.vmoptions

by adding the line

-Djchem.debug=true

or by setting

logging.level.com.chemaxon.jchem=DEBUG

in the application.properties file.

High Availability (HA)

HA and load balancing are provided for DB Web Services, the only stateful web service in JChem Microservices.

Running more instances of the db service ensures HA and load balancing.

In HA mode, Hazelcast is used for distributed caching of the data.

Optionally, every node in the system can have its own cache ('near' cache). By default, near cache is switched on.

Features when near cache is switched on:

quicker structure search
increased memory usage
after a data update, the system is temporarily inconsistent; it can take a few seconds to become consistent again.

Requirements

HA mode requires an installed PostgreSQL database, with the user and the database already created.

Configure HA mode

HA mode must be configured in the JWS Config module: /jws-config/common-config/application.properties.
com.chemaxon.zetor.settings.scheme must be set to CRDB!

Example configuration:

/jws-config/common-config/application.properties

com.chemaxon.zetor.settings.indexDir=data/chemical-data/store
com.chemaxon.zetor.settings.scheme=crdb
com.chemaxon.zetor.settings.forcePurge=true
com.chemaxon.zetor.settings.crdb.sqlBuilderProvider=POSTGRESQL
com.chemaxon.zetor.settings.crdb.jdbcUrl=jdbc:postgresql://localhost:5432/zetor
com.chemaxon.zetor.settings.crdb.user=chemaxon
com.chemaxon.zetor.settings.crdb.password=chemaxon
com.chemaxon.zetor.additional.scheme=crdb
com.chemaxon.zetor.additional.indexDir=data/extra-data/
com.chemaxon.zetor.additional.crdb.sqlBuilderProvider=POSTGRESQL
com.chemaxon.zetor.additional.crdb.jdbcUrl=jdbc:postgresql://localhost:5432/zetor
com.chemaxon.zetor.additional.crdb.user=chemaxon
com.chemaxon.zetor.additional.crdb.password=chemaxon

Load balanced example

A load-balanced example application is available on GitHub.

Running the server

Prerequisites in case of microservices system mode:

Config service is running
Discovery service is running
Gateway service is running

Run the service in command line in folder jws/jws-db/ :

jws-db-service.exe --install
jws-db-service.exe --start (on Windows in administrator's terminal)
jws-db-service start (on Linux)

run-jws-db.exe (on Windows)
run-jws-db (on Linux)

API Documentation

Find and try out the API on the Swagger UI.

Mode	URL of Swagger UI	default URL of Swagger UI
microservices system	<serverhost>:<gateway-port>/jwsdb/API/	localhost:8080/jwsdb/API/
standalone web application mode	<serverhost>:<server-port>/API/	localhost:8062/API/
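
The two URL patterns in the table can be expressed as a small helper. This is only a sketch: the function name swagger_ui_url and the mode labels "microservices" and "standalone" are hypothetical, while the paths and default ports are the ones listed in the table.

```python
def swagger_ui_url(mode, host="localhost", port=None):
    """Build the Swagger UI address for 'microservices' or 'standalone' mode."""
    if mode == "microservices":
        # Requests go through the Gateway service; 8080 is the default in the table.
        return f"{host}:{port or 8080}/jwsdb/API/"
    if mode == "standalone":
        # The service is addressed directly; 8062 is the default server.port.
        return f"{host}:{port or 8062}/API/"
    raise ValueError(f"unknown mode: {mode!r}")
```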

Demo site

For detailed description check out the JWS DB demo site:

https://jchem-microservices.chemaxon.com/jwsdb/api/index.html

Usage

The guidelines and examples on the Demo site, and the Swagger UI API documentation of your installed module, show the methods and syntax for accessing the essential chemical search functionality of JChem Base.

Molecule type information

DB Web Services provides a method for getting the available molecule types.

Every table has a molecule type: a descriptor used by the search engine that contains information about how structures are handled during search. The application has two very simple built-in types: sample (search with aromatization) and taumol (tautomer search). See the type definitions in the application.properties file.

Store and search molecules and non-chemical data

Table operations

Structure Insert/Delete methods

Duplicate search methods

Substructure search methods

Similarity search methods

Search on additional data

DB Web Services provide an additional data filtering option for POST requests on the endpoints below. Additional data filtering is executed after the chemical filter, so search performance is better if the query molecule is well defined and narrows the result set.

/rest-v1/db/additional/{tableName}/fullfragment
/rest-v1/db/additional/{tableName}/similarity
/rest-v1/db/additional/{tableName}/substructure

Possible filtering options in additionalDataCondition attribute

Text

Text additional data can be filtered with the following operators:

exact - The additional data value is exactly the same as the filter text
contains - The additional data value contains the provided filter text
notExact - The filtered additional data field is defined on the molecule and its value differs from the filter text

{
  "field": "name",
  "operator": "contains",
  "value": "acid"
}

Number

Numeric additional data can be filtered with the following operators: >, >=, <, <=, =, !=

{
  "field": "mass",
  "operator": ">=",
  "value": 300
}
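
Leaf conditions like the two above can be generated programmatically, which avoids typos in operator names. A minimal sketch, assuming the field/operator/value layout shown in the examples; the helper names text_condition and number_condition are hypothetical and not part of the API.

```python
import json

TEXT_OPERATORS = {"exact", "contains", "notExact"}
NUMBER_OPERATORS = {">", ">=", "<", "<=", "=", "!="}


def text_condition(field, operator, value):
    """Build a text filter; reject operators that the text filter does not list."""
    if operator not in TEXT_OPERATORS:
        raise ValueError(f"unsupported text operator: {operator!r}")
    return {"field": field, "operator": operator, "value": str(value)}


def number_condition(field, operator, value):
    """Build a numeric filter; reject operators that the number filter does not list."""
    if operator not in NUMBER_OPERATORS:
        raise ValueError(f"unsupported number operator: {operator!r}")
    return {"field": field, "operator": operator, "value": value}


def to_json(condition):
    """Serialize a condition for use as the additionalDataCondition attribute."""
    return json.dumps(condition, indent=2)
```

For example, to_json(number_condition("mass", ">=", 300)) produces the same JSON object as the Number example.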

Complex

Multiple filters can be combined with the `and` and `or` logical operators:

{
  "operator": "or",
  "conditions": [
    {
      "operator": "and",
      "conditions": [
        {
          "field": "mass",
          "operator": ">",
          "value": 100
        },
        {
          "field": "mass",
          "operator": "<=",
          "value": 200
        }
      ]
    },
    {
      "field": "name",
      "operator": "contains",
      "value": "methane"
    }
  ]
}
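
To make the semantics of nested conditions concrete, the sketch below evaluates a condition against the additional data of a single record on the client side. It is an illustration only, not the server's implementation: the function name matches is hypothetical, and treating a record without the field as a non-match for every leaf operator is an assumption (the description above states it only for notExact).

```python
import operator

NUMBER_OPS = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt,
    "<=": operator.le, "=": operator.eq, "!=": operator.ne,
}


def matches(condition, record):
    """Evaluate an additionalDataCondition-shaped dict against one record (a dict)."""
    op = condition["operator"]
    if op in ("and", "or"):
        # Logical nodes carry a list of child conditions instead of field/value.
        results = (matches(child, record) for child in condition["conditions"])
        return all(results) if op == "and" else any(results)
    field, expected = condition["field"], condition["value"]
    if field not in record:
        # Assumption of this sketch: a missing field never matches.
        return False
    actual = record[field]
    if op == "exact":
        return actual == expected
    if op == "contains":
        return expected in actual
    if op == "notExact":
        return actual != expected
    if op in NUMBER_OPS:
        return NUMBER_OPS[op](actual, expected)
    raise ValueError(f"unsupported operator: {op!r}")
```

With the Complex example, a record with mass 150 matches through the `and` branch, and a record named "methane" matches through the second branch regardless of its mass.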

{primary} Timeout From version 19.25 the default timeout (spring.cloud.gateway.httpclient.response-timeout) is taken into account during indexing. If an indexing job has already started when the timeout occurs, indexing continues in the background, and the error response informs the user about it: "this process has started already will finish in the background". If the process was only waiting for other indexing processes to finish, the error message contains "this process was cancelled" instead. The default timeout is 25000 milliseconds. The process is stopped slightly before the timeout expires (by default 300 milliseconds earlier; this can be set with the zetor.indexing.timeoutDifferenceInMilliseconds property) so that the response can still be delivered correctly.
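
The arithmetic behind the timeout and the two error messages can be sketched on the client side as follows. The function names are hypothetical; the default values and message fragments are the ones quoted above, and matching on message fragments is an assumption about how a client might tell the two cases apart.

```python
DEFAULT_RESPONSE_TIMEOUT_MS = 25000   # spring.cloud.gateway.httpclient.response-timeout
DEFAULT_TIMEOUT_DIFFERENCE_MS = 300   # zetor.indexing.timeoutDifferenceInMilliseconds


def indexing_budget_ms(response_timeout_ms=DEFAULT_RESPONSE_TIMEOUT_MS,
                       difference_ms=DEFAULT_TIMEOUT_DIFFERENCE_MS):
    """Time available to an indexing request before the service stops waiting."""
    return response_timeout_ms - difference_ms


def indexing_continues_in_background(error_message):
    """Return True/False for the two documented messages, None for anything else."""
    if "will finish in the background" in error_message:
        return True
    if "was cancelled" in error_message:
        return False
    return None
```

With the defaults, the budget is 25000 - 300 = 24700 milliseconds.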

Data recovery

The data stored in tables (collections) is kept in the file system under the jws-db/data/ folder. The content of that folder must be backed up. Furthermore, the application.properties file(s) should also be saved.
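
A backup along these lines could be scripted as shown below. This is a sketch under assumptions: the function name backup_jws_db is hypothetical, application.properties is assumed to sit directly in the module folder, and the copy is assumed to be taken while the service is stopped so that the files are in a consistent state.

```python
import shutil
from pathlib import Path


def backup_jws_db(module_dir, backup_dir, label):
    """Zip <module_dir>/data and copy application.properties into backup_dir."""
    module_dir, backup_dir = Path(module_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # The archive keeps the "data/" prefix so it can be unpacked into the module folder.
    archive = shutil.make_archive(
        str(backup_dir / f"jws-db-data-{label}"), "zip",
        root_dir=str(module_dir), base_dir="data",
    )
    copied = None
    properties = module_dir / "application.properties"
    if properties.is_file():
        copied = shutil.copy2(properties, backup_dir / f"application-{label}.properties")
    return Path(archive), (Path(copied) if copied else None)
```

A call such as backup_jws_db("jws/jws-db", "backups", "2024-01-31") would write jws-db-data-2024-01-31.zip and application-2024-01-31.properties into the backups folder; the label is free text.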

Import/export tables using S3 bucket

It is possible to use S3 buckets to import / export database tables. This feature relies on Spring AWS support. Our default settings differ from Spring's defaults so that users who do not use this feature can run their services without AWS issues. The current defaults are the following:

cloud:
  aws:
    region:
      auto: false
      static: eu-central-1
    stack:
      auto: false

cloud.aws.region.auto decides whether the region should be discovered automatically. If you are running inside AWS infrastructure, set this to true.

cloud.aws.region.static must be set if the region cannot be determined automatically (which the previous setting controls). Spring's default is us-west-1.

cloud.aws.stack.auto controls whether AWS stack services are found automatically by reading standard AWS resources. If you are running your instances on AWS, enable this feature by setting it to true.

Please be aware that the following settings do not use Spring Relaxed Binding rules; they must be set exactly as written here in the properties/yaml file:

Use cloud.aws.credentials.useDefaultAwsCredentialsChain=true if you want to use the default AWS chain to discover your credentials. The default chain searches the following sources in this exact order (and uses the first hit):

reads environment variables
reads JVM system properties
reads profile specific credentials (default profile is called default)
- To use IAM role based credentials (with WebIdentityTokenCredentialsProvider), you have to add aws-java-sdk-sts-1.12.261.jar to the classpath manually (since it is not part of the AWS-Spring). To do this, copy the jar file into the {jws_home}/extra-libs/ directory.
uses EC2 credential provider if available

If you are using any other method to determine the credentials, this should be switched to false, which is the default value.
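
The "first hit wins" behaviour of such a chain can be illustrated with a generic sketch. It does not call the AWS SDK: both function names are hypothetical, and the sources are plain mappings standing in for environment variables, JVM system properties and so on.

```python
def first_available_credentials(providers):
    """Return (source name, credentials) from the first provider that yields a result."""
    for name, provider in providers:
        credentials = provider()
        if credentials is not None:
            return name, credentials
    raise LookupError("no provider in the chain supplied credentials")


def from_mapping(mapping, access_key_name, secret_key_name):
    """Build a provider that reads both keys from a mapping such as os.environ."""
    def provider():
        access = mapping.get(access_key_name)
        secret = mapping.get(secret_key_name)
        # Both values must be present and non-empty to count as a hit.
        return (access, secret) if access and secret else None
    return provider
```

Passing a list that starts with from_mapping(os.environ, "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY") mirrors the first step of the list above; later sources are consulted only when earlier ones return nothing.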

Use cloud.aws.credentials.accessKey and cloud.aws.credentials.secretKey to manually set a static access key and secret key value.
Use cloud.aws.credentials.profilePath to specify your custom profile path URL and use cloud.aws.credentials.profileName to use a different profile from AWS credentials file.

If there are problems obtaining the proper credentials, increase the logging level to get more information:

logging.level.com.amazonaws.request=DEBUG
logging.level.com.amazonaws.auth=DEBUG

You can set up S3 import/export with the following settings:

# Specifies whether to use FILE based import/export or S3 based import/export. Default is: `FILE`
com.chemaxon.webservices.db.import_export.dbExportStrategy=S3

# Specifies which S3 bucket to use. Default value: s3://export-bucket/ . The URL should follow the S3 URL scheme, like in the default value.
com.chemaxon.webservices.db.import_export.s3BuckeBasetUrl=s3://export-bucket/
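
Before starting the service, the bucket URL can be checked against the S3 URL scheme. A minimal sketch using only the Python standard library; the function name split_s3_url is hypothetical, and interpreting the path part as a key prefix is an assumption.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/prefix/' into (bucket, prefix); raise ValueError otherwise."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    # The leading slash of the path is not part of the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")
```

For the default value s3://export-bucket/ this yields the bucket name export-bucket and an empty prefix.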

AWS Fargate setup

When running DB Web Services on AWS Fargate, it is recommended to use a persistent data storage (e.g. Amazon RDS) and if upgrading in REINDEX mode, the export/import strategy should be set to S3:

# Specifies whether to use FILE based import/export or S3 based import/export. Default is: FILE
com.chemaxon.webservices.db.import_export.dbExportStrategy=S3

# Specifies which S3 bucket to use. Default value: s3://export-bucket/ . The URL should follow the S3 URL scheme, like in the default value.
com.chemaxon.webservices.db.import_export.s3BuckeBasetUrl=s3://export-bucket/

Because in REINDEX mode the service reimports the tables at startup, the timeout and startPeriod of the AWS Fargate service should be set so that they allow enough time for the reimport. (AWS API Reference - ECS HealthCheck)

The above import/export settings for S3 should also be set if using /rest-v1/db/additional/{tableName}/importFromFile/{fileName} and /rest-v1/db/additional/{tableName}/exportToFile endpoints on AWS Fargate.

Setting up backend suitable for AWS Fargate

DB Web Services use a backend where index data is stored through a JDBC connection. By default this JDBC connection is a file-based H2 connection. On AWS Fargate, persisting data on the file system of the container is not advised, because the data is lost when the container is deleted. Therefore we suggest using a database outside the AWS Fargate container.

An example of configuring a PostgreSQL Amazon RDS instance:

com.chemaxon.zetor.settings.gcrdb.sqlBuilderProvider=POSTGRESQL
com.chemaxon.zetor.settings.gcrdb.jdbcUrl=jdbc:postgresql://<URL_OF_DB>:<PORT>/<DB_NAME>
com.chemaxon.zetor.settings.gcrdb.user=<USER>
com.chemaxon.zetor.settings.gcrdb.password=<PASSWORD>
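
In a container setup the same values are often supplied as environment variables rather than edited into the file. The sketch below assumes that the default application.properties placeholders shown in the Configuration section (CXN_DB_DIALECT, CXN_DB_JDBC_URL, CXN_DB_JDBC_USER, CXN_DB_JDBC_PASSWORD) are still in place; the function name postgres_environment and the host name in the usage note are hypothetical.

```python
def postgres_environment(host, port, database, user, password):
    """Build environment variables that point the gcrdb backend at PostgreSQL."""
    return {
        "CXN_DB_DIALECT": "POSTGRESQL",
        "CXN_DB_JDBC_URL": f"jdbc:postgresql://{host}:{port}/{database}",
        "CXN_DB_JDBC_USER": user,
        "CXN_DB_JDBC_PASSWORD": password,
    }
```

For example, postgres_environment("db.example.internal", 5432, "zetor", "chemaxon", "secret") yields a JDBC URL of jdbc:postgresql://db.example.internal:5432/zetor. In practice the password would come from a secret store rather than from source code.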

JSON Converter

Converts different molecular string representations into JSON format and molecular JSON representations to string.