JChem Microservices FAQ and Known Issues

FAQ

Is there a maximum number of concurrent requests that the API endpoints can handle?

Yes, there is. The exact number depends on how the service is deployed.

If a specific service (such as the calculations service) is used in standalone web application mode, we expose it through Tomcat without any alteration. By default, Tomcat allows 200 concurrent (worker) threads; this can be changed with the server.tomcat.threads.max property. Be aware that Tomcat also limits the number of accepted connections, to 8192 by default; this can be changed with the server.tomcat.max-connections property. Both limits are further constrained by the OS settings and the available memory, since a process cannot spawn an unlimited number of threads.
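
As an illustration, both limits could be raised in application.properties as sketched below. The property names are the ones mentioned above; the values are arbitrary examples, not recommendations, and should be sized against the memory and OS limits of the host.

```properties
# Maximum number of Tomcat worker threads (default: 200)
server.tomcat.threads.max=400
# Maximum number of connections Tomcat accepts (default: 8192)
server.tomcat.max-connections=10000
```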

If the service is used in microservices system mode, our gateway service has a Hystrix circuit breaker installed, which allows only 10 running threads and 90 waiting requests. These values can be configured; for details, see our documentation or the Hystrix documentation. It is generally better to increase the number of waiting requests than the number of active ones, since too many active tasks can cause throttling. If you have to scale, it is better to scale out the number of executing nodes and load balance the requests.

What is the suggested number of items per request for endpoints that take an array as input?

If you run a service in production for many users, you are better off with many small requests than with a few huge ones. If the service is used by very few people (or even a single person), a few huge requests with many structures can have a performance benefit, but beyond a certain size the gain becomes negligible. The ideal number of structures is also influenced by the requested method, by the "size" of the structures, and even by the size of the structure representation (that is, the chemical format). As a general rule of thumb, a request should not take longer than 1 second; 1-2000 molecules can be a good number for that, but it depends. If a request takes more than 1 second, any error becomes more costly, and the long request also limits the number of concurrent requests. For a one-second request the total communication cost is already less than 1% of the whole process, so making requests larger than that brings no meaningful gain.

What is the storage backend behind JChem Microservices DB?

It is configurable. The default configuration uses an H2 database, but you can change it to PostgreSQL as well. See configuration details here.
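
A minimal sketch of what a PostgreSQL setup might look like in application.properties, assuming that the com.chemaxon.zetor.settings.gcrdb.* connection properties used for the H2 defaults also take the PostgreSQL connection data. The host, port, database name and credentials are placeholders; the linked configuration page remains the authoritative source for the property list.

```properties
# Assumption: the gcrdb connection properties are reused for PostgreSQL
# Host, port, database name and credentials below are placeholders
com.chemaxon.zetor.settings.gcrdb.jdbcUrl=jdbc:postgresql://db.example.com:5432/jchem
com.chemaxon.zetor.settings.gcrdb.user=jchem_user
com.chemaxon.zetor.settings.gcrdb.password=change-me
```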

How to access the H2 DB backend behind JChem Microservices DB?

If your JChem Microservices DB configuration uses the H2 backend, you can configure the service to expose the embedded H2 database console. Add the following lines to the application.properties file:

spring.h2.console.enabled=true
spring.h2.console.path=/h2
spring.h2.console.settings.web-allow-others=true

After restarting the service, the database console is available on port 8062 of localhost: http://localhost:8062/h2

The login parameters are configured in the application.properties file. The default values are:

Name	Property name	Default value
JDBC URL:	com.chemaxon.zetor.settings.gcrdb.jdbcUrl	jdbc:h2:nio:./data/chemical-data/store/db
User Name:	com.chemaxon.zetor.settings.gcrdb.user	user
Password:	com.chemaxon.zetor.settings.gcrdb.password	password
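
Written out as application.properties lines, the defaults in the table above would read as follows. This is only a restatement of the table, not an additional setting.

```properties
# Default connection settings of the embedded H2 backend (values from the table)
com.chemaxon.zetor.settings.gcrdb.jdbcUrl=jdbc:h2:nio:./data/chemical-data/store/db
com.chemaxon.zetor.settings.gcrdb.user=user
com.chemaxon.zetor.settings.gcrdb.password=password
```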

How to add authentication to JChem Microservices?

It is supported through external beans, which allow new logic to be introduced (filter, endpoint, logging, health check, etc.). An example is available here.

Spring configuration changes from version 22.6.0

Starting with version 22.6.0, the Spring Cloud configuration settings have been moved from bootstrap.properties to application.properties (because of a Spring upgrade).

Properties deleted from bootstrap.properties:

spring.cloud.config.failFast=true
spring.cloud.config.uri=${CONFIG_SERVER_URI:http\://localhost\:8888/}
spring.cloud.config.retry.initialInterval=3000
spring.cloud.config.retry.multiplier=1.2
spring.cloud.config.retry.maxInterval=60000
spring.cloud.config.retry.maxAttempts=100

Property added to application.properties:

spring.config.import=configserver:${CONFIG_SERVER_URI:http\://localhost\:8888}?fail-fast=true&max-attempts=100&max-interval=60000&multiplier=1.2&initial-interval=3000

Note that when run as a standalone application, the spring.config.import configuration should be removed or commented out and spring.cloud.config.enabled should be set to false:

#spring.config.import=optional:configserver:${CONFIG_SERVER_URI:http\://localhost\:8888}?fail-fast=true&max-attempts=100&max-interval=60000&multiplier=1.2&initial-interval=3000
spring.cloud.config.enabled=false

Which endpoint to use for inserting structures?

There are four endpoints provided for inserting structures.

/rest-v1/db/additional/upload: Structures can be inserted from any non-binary chemical file format, such as sdf, mrv or smiles. The table name must be specified. The input format can be specified; if it is not, it is recognised automatically. ID values are taken from the uploaded file if it specifies them; otherwise they are autogenerated. The failed and the successful IDs are returned in the response.

/rest-v1/db/additional/{tableName}/batchInsert: Structures can be inserted in json format, together with their IDs (optional) and their additional data (optional). The input format can be specified; if it is not, it is recognised automatically. The failed and the successful IDs are returned in the response.

/rest-v1/db/additional/{tableName}/{id}: One structure with the given ID can be inserted, or overwritten if it already exists. The input format can be specified; if it is not, it is recognised automatically.

/rest-v1/db/additional/{tableName}/importFromFile/{fileName}: Imports all data from the specified file into the specified table. The file can be a .json file or a .zip archive whose first item is the .json file. This endpoint can be used to re-import the content of a table previously exported by the DB web services into a new table.

What is called a table in JChem Microservices DB?

In JChem Microservices DB we use relational databases (H2, PostgreSQL) as backend storage. The molecules and their additional data are stored there, not in the traditional relational way but more like in a key-value store, where the values are the molecules and their additional data. These data can be searched exclusively through the REST API of JChem Microservices DB, not through the SQL API of the database.

What is scheme in the parameter com.chemaxon.zetor.settings.scheme?

The word scheme in this parameter does not refer to a database schema; it is only an internal name we use for the type of the storage backend.

HTTPS or HTTP? How to configure SSL?

SSL can be configured as described in the Spring documentation.
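
For orientation, a typical Spring Boot SSL setup in application.properties looks roughly like the sketch below. The server.ssl.* names are standard Spring Boot properties; the keystore path, password and alias are placeholders, and whether a given JChem Microservices module needs further settings is not covered here.

```properties
# Enable HTTPS on the embedded server (standard Spring Boot properties)
server.ssl.enabled=true
# Placeholder keystore location, type and credentials
server.ssl.key-store=file:/path/to/keystore.p12
server.ssl.key-store-type=PKCS12
server.ssl.key-store-password=change-me
server.ssl.key-alias=jws
```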

How much Xmx should be given to the services?

The Spring Boot application itself needs at least 32 MB.

The default Xmx parameters can be seen in the .vmoptions files. These values were chosen for normal usage, not for extra large data sets or very complex chemical structures:

1 GB for the Calculations
4 GB for the DB
1 GB for the Reactor
1 GB for the StructureChecker
256 MB for the IO
256 MB for the Structure Manipulation

Furthermore, the JChem Engines cache and memory calculator page helps to estimate the cache and memory needs of the DB Web Services based on the number of molecules and on further parameters and options.

Further rules of thumb for hardware selection:

a higher number of cores increases the number of parallel users that can be served
a faster CPU increases the throughput
more memory can increase the throughput, but beyond a certain amount you also have to scale other settings with it

How to define a specific JRE to be used instead of the one defined in JAVA_HOME or in JDK_HOME?

As described in the Readme file in the <jws_home>/jre/ directory, you have to put the JRE into that directory.

How to import data from csv files containing structures in multiline format?

See the required steps to be executed when importing a csv file.

Can I use JChem Microservices in a cloud environment?

Yes, you can. In the Amazon environment, JChem Microservices can be installed in the following ways:

install JChem Microservices on EC2, and use an Amazon PostgreSQL RDS as backend store for the DB module
install JChem Microservices with Amazon Fargate and use an Amazon PostgreSQL RDS as backend store for the DB module. See documentation.
install JChem Microservices with Amazon Fargate and use an EC2 instance with a PostgreSQL database as backend store for the DB module
install JChem Microservices on EC2, and configure the DB module to use the H2 backend
install JChem Microservices on EC2 and the PostgreSQL backend database on the same or on a separate EC2 instance

JChem Microservices is tested in the Amazon environment. The environments of other cloud providers have not been tested, but we see no reason why they should not work.

Upgrade requirements relating to versions 22.6.0, 21.4.7 and 21.15.3

An H2 update was required in JChem Microservices because of the CVE-2021-42392 vulnerability. When upgrading to JChem Microservices version 22.6.0, Helium.7 (21.4.7) or Iodine.3 (21.15.3), the following instructions must be followed.

Files of the new H2 version are incompatible with the old version, so you need to use the export and import json options to upgrade. The export and import json options must also be used if the backend store is PostgreSQL rather than H2.
The default connection string in application.properties changed: COMPRESS=TRUE was already turned on, and the MAX_COMPACT_TIME=10000;DEFRAG_ALWAYS=TRUE settings were added (when the service is stopped, H2 tries to compact the database for a maximum of 10 seconds).
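
As a sketch, an H2 connection string carrying these settings could look like the line below. It assumes that the settings are appended to the default JDBC URL in the usual H2 form of semicolon-separated KEY=VALUE pairs; the exact default string shipped with a given version may differ, so compare it with your own application.properties.

```properties
# Assumed shape of the connection string; H2 options are separated by semicolons
com.chemaxon.zetor.settings.gcrdb.jdbcUrl=jdbc:h2:nio:./data/chemical-data/store/db;COMPRESS=TRUE;MAX_COMPACT_TIME=10000;DEFRAG_ALWAYS=TRUE
```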

Known Issues

MarvinJS cannot be used with JChem Microservices backend

Since JChem Microservices separates the services by functionality, the endpoints needed by the MarvinJS frontend are provided by different modules. The frontend has not yet been adapted to these changes, so until then we suggest using the Marvin JS Web Services backend.

False similarity search results in the case of molecule types with tautomerHandlingMode=GENERIC parameter

In the case of molecule types with the tautomerHandlingMode=GENERIC parameter, similarity search gives false results. There is no workaround at the moment; please do not execute similarity searches in tables whose molecule type has the com.chemaxon.zetor.types[n].tautomerHandlingMode=GENERIC parameter specified in the application.properties file.

From version 21.9.0 similarity search works correctly even in the case of molecule types with tautomerHandlingMode=GENERIC parameter.

Additional parameters do not work in gcrdb scheme in versions 21.2.0 and 21.3.0

If the default scheme gcrdb is set in the application.properties file, the additional parameters taken from the JChem Engines cache and memory calculator, such as com.chemaxon.zetor.settings.molecule.cachedObjectCount, are not taken into account. If you want to set these additional parameters, please set com.chemaxon.zetor.settings.scheme=mapdb
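
The workaround could look like this in application.properties. The cachedObjectCount value is purely illustrative; take the real number from the cache and memory calculator.

```properties
# Switch the storage scheme so that the additional parameters are taken into account
com.chemaxon.zetor.settings.scheme=mapdb
# Illustrative value only; use the number suggested by the calculator
com.chemaxon.zetor.settings.molecule.cachedObjectCount=100000
```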

What to do if Hazelcast instances do not find each other?

If the storage space backend is set to CRDB in the service config file, multiple services are installed, and the Hazelcast instances are set to use the same cluster name, the cluster names are overridden with different names for the different instances. Therefore the instances may not find each other. As a workaround, setting the system property cxnSingleClusterOverride=false solves the issue.