JChem Microservices FAQ and Known Issues

    FAQ

    Is there a maximum number of concurrent requests that the API endpoints can handle?

    Yes, there is, but the exact number depends on how the service is deployed.

    If a specific service (such as the Calculations service) is used in standalone web application mode, we expose it through Tomcat without any alteration. By default, Tomcat allows 200 concurrent worker threads; this can be changed with the server.tomcat.threads.max property. Tomcat also limits the number of accepted connections, to 8192 by default, which can be changed with the server.tomcat.max-connections property. The achievable concurrency is also influenced by the OS settings and the available memory, since a process cannot spawn an unlimited number of threads.
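
    The two Tomcat limits named above could be raised in application.properties along the following lines. This is only a sketch: the property names are the ones mentioned in this answer, while the values are arbitrary placeholders rather than recommendations, and suitable numbers depend on the OS limits and memory of the host.

    ```properties
    # Maximum number of Tomcat worker threads (default: 200). Placeholder value.
    server.tomcat.threads.max=400
    # Maximum number of connections Tomcat accepts at a time (default: 8192). Placeholder value.
    server.tomcat.max-connections=10000
    ```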

    If the service is used in microservices system mode, our gateway service has a Hystrix circuit breaker installed, which allows only 10 running threads and 90 waiting requests. These limits can be configured; for details, see our documentation or the Hystrix documentation. It is generally better to increase the number of waiting requests rather than the number of active ones, since too many active tasks can cause throttling. If you need to scale further, it is better to scale out the number of executing nodes and load-balance the requests.
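
    As a rough illustration, the standard Hystrix thread pool properties that correspond to these two kinds of limits are shown below. It is an assumption that the gateway reads these exact keys from its configuration, and the values are placeholders that keep the active threads at 10 while doubling the waiting queue; check the gateway documentation before relying on them.

    ```properties
    # Number of threads that execute requests concurrently (Hystrix default: 10).
    hystrix.threadpool.default.coreSize=10
    # Maximum size of the queue of waiting requests. Placeholder value.
    hystrix.threadpool.default.maxQueueSize=180
    # Queue length at which new requests are rejected, even if maxQueueSize has not been reached.
    hystrix.threadpool.default.queueSizeRejectionThreshold=180
    ```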

    What is the suggested number of items per request for endpoints that take an array as an input?

    If you run a service in production for many users, many small requests work better than a few huge ones. If the service is used by very few users (or even a single one), a few huge requests with many structures can bring a performance benefit, although beyond a certain size the gain becomes negligible. The ideal number of structures per request also depends on the requested method, the size of the structures, and even the size of the structure representation (the chemical format). As a rule of thumb, a request should not take longer than 1 second; 1-2000 molecules per request can be a good number for that, but it depends on the factors above. If a request takes more than 1 second, any error becomes more costly and the number of concurrent requests is limited. For a one-second request, the communication cost is already less than 1% of the whole process, so making requests larger than that brings no meaningful gain.

    What is the storage backend behind JChem Microservices DB?

    It is configurable. The default configuration uses an H2 database, but you can change it to PostgreSQL. See configuration details here.

    Accessing the H2 DB backend behind JChem Microservices DB

    If your JChem Microservices DB configuration uses the H2 backend, you can configure the service to expose the embedded H2 database console. Add the following lines to the application.properties file:

    spring.h2.console.enabled=true
    spring.h2.console.path=/h2
    spring.h2.console.settings.web-allow-others=true

    After restarting the service, the database console is available on port 8062 of localhost: http://localhost:8062/h2

    The login parameters for the console are configured in the application.properties file. The default values are:

    Name        Property name                               Default value
    JDBC URL    com.chemaxon.zetor.settings.gcrdb.jdbcUrl   jdbc:h2:nio:./data/chemical-data/store/db
    User Name   com.chemaxon.zetor.settings.gcrdb.user      user
    Password    com.chemaxon.zetor.settings.gcrdb.password  password

    How to add authentication to JChem Microservices?

    Authentication is supported through external beans, which let you introduce new logic (filter, endpoint, logging, health check, etc.). An example is available here.

    Spring configuration changes from version 21.19.0

    Starting with version 21.19.0, the Spring Cloud configuration settings have moved from bootstrap.properties to application.properties (because of the upgrade of Spring Boot to 2.4.11).

    Properties deleted from bootstrap.properties:

    spring.cloud.config.failFast=true
    spring.cloud.config.uri=${CONFIG_SERVER_URI:http\://localhost\:8888/}
    spring.cloud.config.retry.initialInterval=3000
    spring.cloud.config.retry.multiplier=1.2
    spring.cloud.config.retry.maxInterval=60000
    spring.cloud.config.retry.maxAttempts=100

    Property added to application.properties:

    spring.config.import=configserver:${CONFIG_SERVER_URI:http\://localhost\:8888}?fail-fast=true&max-attempts=100&max-interval=60000&multiplier=1.2&initial-interval=3000

    You can read more about these settings here: https://docs.spring.io/spring-cloud-config/docs/3.0.5/reference/html/#_config_client_retry_with_spring_config_import

    Note that when a service runs as a standalone application, this configuration should be turned off or set to optional:

    spring.config.import=optional:configserver:${CONFIG_SERVER_URI:http\://localhost\:8888}?fail-fast=true&max-attempts=100&max-interval=60000&multiplier=1.2&initial-interval=3000
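
    The "turned off" alternative is not spelled out above. With the Spring Cloud Config client it would typically be done with the property below, used in place of the spring.config.import line. This is an assumption based on standard Spring Cloud Config behaviour, not something confirmed here for JChem Microservices specifically.

    ```properties
    # Disables the Spring Cloud Config client, so no config server is contacted at startup.
    spring.cloud.config.enabled=false
    ```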

    Which endpoint to use for inserting structures?

    There are four endpoints provided for inserting structures.

    /rest-v1/db/additional/upload Inserts structures from any non-binary chemical file format, such as sdf, mrv, or smiles. The table name must be specified. The input format can be specified; if it is not, it is recognised automatically. ID values are taken from the uploaded file if it specifies them; otherwise they are generated automatically. The response returns the failed and the successful IDs.

    /rest-v1/db/additional/{tableName}/batchInsert Inserts structures given in JSON format, optionally together with their IDs and their additional data. The input format can be specified; if it is not, it is recognised automatically. The response returns the failed and the successful IDs.

    /rest-v1/db/additional/{tableName}/{id} Inserts one structure with the given ID, or overwrites it if it already exists. The input format can be specified; if it is not, it is recognised automatically.

    /rest-v1/db/additional/{tableName}/importFromFile/{fileName} Imports all data from the specified file into the specified table. The file can be a .json file, or a .zip archive whose first item is the .json file. This endpoint can be used to re-import the content of a table previously exported by the DB web services into a new table.

    What is called a table in JChem Microservices DB?

    In JChem Microservices DB we use relational databases (H2, PostgreSQL) as backend storage. The molecules and their additional data are stored there, though not in the traditional relational way; the storage works more like a key-value store where the values are the molecules and their additional data. These data can be searched only through the REST API of JChem Microservices DB, not through the SQL API of the database.

    What is scheme in the parameter com.chemaxon.zetor.settings.scheme?

    The word scheme in this parameter does not refer to a database schema; it is only an internal name we use for the type of the storage backend.

    HTTPS or HTTP? How to configure SSL?

    SSL can be configured as described in the Spring documentation.
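
    For orientation, a typical Spring Boot SSL setup in application.properties looks roughly like the sketch below. The property names are standard Spring Boot ones; the port, keystore path, password, and alias are placeholders invented for illustration and must be replaced with your own values.

    ```properties
    # Port to serve HTTPS on (placeholder).
    server.port=8443
    server.ssl.enabled=true
    # Keystore holding the server certificate (placeholder path, type, password, and alias).
    server.ssl.key-store=file:/path/to/keystore.p12
    server.ssl.key-store-type=PKCS12
    server.ssl.key-store-password=changeit
    server.ssl.key-alias=myserver
    ```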

    How much Xmx should be given to the services?

    The Spring Boot application itself needs at least 32 MB.

    The default Xmx values can be seen in the .vmoptions files. These values are intended for normal usage, not for extra-large data sets or very complex chemical structures:

    • 1 GB for the Calculations
    • 4 GB for the DB
    • 1 GB for the Reactor
    • 1 GB for the StructureChecker
    • 256 MB for the IO
    • 256 MB for the Structure Manipulation

    Furthermore, the JChem Engines cache and memory calculator page helps you estimate the cache and memory needs of the DB web services based on the number of molecules and on further parameters and options.

    Further rules of thumb for hardware selection:

    • the number of cores increases the number of parallel users that can be served
    • the speed of the CPU increases the throughput
    • memory can increase the throughput, but beyond a certain amount you also have to scale other settings with it

    How to define a specific JRE to be used instead of the one defined in JAVA_HOME or in JDK_HOME?

    As described in the Readme file in the <jws_home>/jre/ directory, you have to put the JRE into that directory.

    Known Issues

    MarvinJS cannot be used with JChem Microservices backend

    Since JChem Microservices separates the services by functionality, the endpoints needed by the MarvinJS frontend are provided by different modules. The frontend has not yet been adapted to this change, so until it is, we suggest using the Marvin JS Web Services backend.

    False similarity search results in the case of molecule types with tautomerHandlingMode=GENERIC parameter

    In the case of molecule types with the tautomerHandlingMode=GENERIC parameter, similarity search gives false results. There is no workaround at the moment; please do not execute similarity search in a table whose molecule type has the com.chemaxon.zetor.types[n].tautomerHandlingMode=GENERIC parameter specified in the application.properties file.

    From version 21.9.0 similarity search works correctly even in the case of molecule types with tautomerHandlingMode=GENERIC parameter.

    Additional parameters do not work in gcrdb scheme in versions 21.2.0 and 21.3.0

    If the default scheme gcrdb is set in the application.properties file, the additional parameters taken from the JChem Engines cache and memory calculator, such as com.chemaxon.zetor.settings.molecule.cachedObjectCount, are not taken into account. If you want to set these additional parameters, please set com.chemaxon.zetor.settings.scheme=mapdb.

    Upgrade requirements for versions 22.3.0, 21.4.7 and 21.15.3

    An H2 update was required in JChem Microservices because of the CVE-2021-42392 vulnerability. When upgrading to JChem Microservices version 22.3.0, Helium.7 (21.4.7), or Iodine.3 (21.15.3), the following instructions must be followed.

    Files of the new H2 version are incompatible with the old version, so you need to use the JSON export and import options to upgrade. The default connection string in application.properties has also changed: COMPRESS=TRUE was already turned on, and the MAX_COMPACT_TIME=10000;DEFRAG_ALWAYS=TRUE settings were added (when the service is stopped, H2 tries to compact the database for a maximum of 10 seconds).
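
    Put together, a connection string with these settings might look like the sketch below. It assumes the base JDBC URL is the default listed in the FAQ (jdbc:h2:nio:./data/chemical-data/store/db) and that the settings are appended in this order; the string actually shipped in your application.properties may differ, so treat this as an illustration only.

    ```properties
    # Assumed base URL plus the H2 settings named above, separated by semicolons.
    com.chemaxon.zetor.settings.gcrdb.jdbcUrl=jdbc:h2:nio:./data/chemical-data/store/db;COMPRESS=TRUE;MAX_COMPACT_TIME=10000;DEFRAG_ALWAYS=TRUE
    ```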