Architecture¶
Overview¶
The DSClient is designed to receive and store pre-formatted registration data, delivered as messages from a Chemaxon Compound Registration (CompReg) system.
Messages can be received either by stateless HTTP polling (recommended) or through an ActiveMQ queue.
Messages are automatically triggered by the following four actions:
- Registering a new compound
- Amending an existing compound — editing the compound data
- Updating the layout of a compound — changing the compound itself
- Deleting a compound
ActiveMQ Push Messaging¶
CompReg can directly push registration messages to an ActiveMQ queue, REGISTRATION.DS, which can be consumed by the DSClient.
ActiveMQ messaging requires a dedicated broker that communicates via the OpenWire protocol, typically on port 61616, with optional authentication.
ActiveMQ authentication
Anonymous access to the ActiveMQ broker should only be used within a secure container-based network.
For all other scenarios, a JMS broker username/password should be configured.
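Before configuring the DSClient for ActiveMQ, it can help to confirm that the broker port is reachable at all. The following is a minimal sketch using only the Python standard library; the function name `broker_reachable` is hypothetical, and the check only opens a TCP connection. It does not speak OpenWire and does not verify credentials.

```python
import socket


def broker_reachable(host: str, port: int = 61616, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    61616 is the typical OpenWire port mentioned above; pass another
    value if the broker is configured differently.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `broker_reachable("broker.example.internal")` (a placeholder hostname) returns `False` if the port is closed or filtered.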
HTTP polling¶
The DSClient can also poll CompReg directly over HTTP for broadcast messages.
HTTP polling is the simpler, more direct approach: it requires fewer components and less configuration, offers better error handling, keeps audit trails in both CompReg and the DSClient, and gives fine-grained control over throughput.
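The polling idea can be sketched as a loop that drains the pending messages one page at a time and then waits before checking again. This is an illustration under stated assumptions, not the DSClient implementation: `fetch_page` is a hypothetical callable standing in for the HTTP request to CompReg, and the message shape (a dictionary with an `id` key) is assumed. The page size and wait interval play the role of the pagination limit and polling frequency settings.

```python
import time
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def poll_once(
    fetch_page: Callable[[int, int], List[Message]],
    handle: Callable[[Message], None],
    last_id: int,
    limit: int,
) -> int:
    """Drain all pending messages after last_id, one page at a time.

    fetch_page(after_id, limit) is a hypothetical stand-in for the HTTP
    request; it returns at most `limit` messages with id > after_id.
    Returns the id of the last message handled.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    while True:
        page = fetch_page(last_id, limit)
        for message in page:
            handle(message)
            last_id = int(message["id"])
        if len(page) < limit:  # a short page means the backlog is drained
            return last_id


def poll_forever(fetch_page, handle, limit: int, frequency_seconds: float) -> None:
    """Poll on a fixed interval; throughput is bounded by limit and frequency."""
    last_id = 0
    while True:
        last_id = poll_once(fetch_page, handle, last_id, limit)
        time.sleep(frequency_seconds)
```

Because the client decides when to ask and how much to ask for, throughput is controlled entirely on the consumer side, which is what makes this approach easy to tune.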
CompReg Downstream API Documentation¶
The CompReg system utilizes a REST API for accessing registration data.
The REST API can be used independently of the push mechanisms.
The DSClient only uses the /downstream/messages/ endpoint for retrieving data from CompReg.
However, the additional downstream endpoints can be used for manual testing/data integrity verification.
Database¶
The DSClient currently supports PostgreSQL, Oracle, MySQL and SQLite databases.
The downstream database schema is a snapshot of the CompReg registration tree structure, organized in a non-normalized format that mirrors the CompReg database architecture.
Key Relationships¶
The schema follows this relationship hierarchy:
flowchart LR
STRUCTURE --> PARENT --> VERSION --> PREPARATION
Terminology Difference
In the CompReg system, the lowest level of the registration hierarchy is called a lot. In the DSClient downstream database schema, this same entity is a preparation.
flowchart LR
STRUCTURE --> PARENT --> VERSION --> LOT
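To make the hierarchy concrete, it can be modelled as nested records. The sketch below is purely illustrative: the class fields (`structure_id`, `parent_id`, and so on) are hypothetical names rather than actual column names, and the one-to-many relationships between levels are an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Preparation:  # called a "lot" in CompReg
    preparation_id: str


@dataclass
class Version:
    version_id: str
    preparations: List[Preparation] = field(default_factory=list)


@dataclass
class Parent:
    parent_id: str
    versions: List[Version] = field(default_factory=list)


@dataclass
class Structure:
    structure_id: str
    parents: List[Parent] = field(default_factory=list)


def count_preparations(structure: Structure) -> int:
    """Walk STRUCTURE -> PARENT -> VERSION -> PREPARATION and count the leaves."""
    return sum(
        len(version.preparations)
        for parent in structure.parents
        for version in parent.versions
    )
```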
Schema Overview¶
The database schema consists of the following main table groups (see the Database Schema documentation for a detailed description):
Core Structure Tables¶
- structure - Stores all parent and salt/solvate modifier structures in JChem format
- structure_ul - Structure unique identifiers
- structure_markush - Markush structure data
- structure_markush_ul - Markush structure unique identifiers
- structure_markush_md_mscr - Markush structure metadata
Structure Storage
The structure table stores all parent and salt/solvate modifier structures (separately) in JChem structure table format. The structure files are stored in the cd_structure CLOB column and can be searched using JChem libraries.
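For a quick look at the raw contents of the structure table without JChem, the cd_structure column can be read as plain text. The sketch below uses an in-memory SQLite stand-in: the two-column table, the `cd_id` key column, and the placeholder value are assumptions for illustration only, and the real table has more columns. Chemical searching still requires the JChem libraries.

```python
import sqlite3


def read_structures(conn: sqlite3.Connection, limit: int = 10):
    """Return (cd_id, cd_structure) rows from the structure table."""
    cursor = conn.execute(
        "SELECT cd_id, cd_structure FROM structure ORDER BY cd_id LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


# Stand-in database for illustration only; not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE structure (cd_id INTEGER PRIMARY KEY, cd_structure TEXT)")
# "CCO" is a placeholder; the actual stored structure format may differ.
conn.execute("INSERT INTO structure (cd_structure) VALUES (?)", ("CCO",))
rows = read_structures(conn)
```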
Registration Tree Tables¶
- parent - Parent structure information
- version - Version structure information
- preparation - Preparation (lot) information
- preparation_project - Preparation-project relationships
Supporting Tables¶
- saltsolvate - Salt and solvate information
- version_saltsolvate - Version-salt relationships
- additional_data - Additional data fields
- jchemproperties - JChem properties for structures
- jchemproperties_cr - JChem properties for CompReg structures
System Tables¶
- http_message_job - HTTP polling job tracking
- http_failed_message - Failed message tracking
- flyway_schema_history - Database migration history
Schema Modifications
The database schema is statically defined. Any changes to database architecture, table or column names, data types, or workflows will require code changes and a new build of the custom DSClient.
Detailed Schema Documentation¶
For detailed information about each table, including column definitions, relationships, and data types, see the Database Schema documentation.
Configuration¶
Settings are stored in flat key-value configuration files:
- CompReg: registry.properties
- DSClient: registry.dsclient.properties
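A flat key-value file of this kind is easy to inspect with a few lines of code. The sketch below assumes plain `key=value` lines with `#` or `!` comments; it is a simplified reader and does not handle other properties-file features such as `:` separators, line continuations, or escape sequences. The values in `SAMPLE` are placeholders, not recommended settings.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#!":
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # ignore lines without a key=value pair
        settings[key.strip()] = value.strip()
    return settings


SAMPLE = """
# Placeholder values for illustration only
RegDSClientCommunicationType=HTTP
RegDsClientHttpPaginationLimit=100
RegDsClientHttpPollingFrequencySeconds=30
RegDSClientStrictConsistency=false
"""

settings = parse_properties(SAMPLE)
```

All values come back as strings, so numeric and boolean settings need converting by the caller, for example `int(settings["RegDsClientHttpPaginationLimit"])`.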
Configuration Variables¶
| Variable | Description |
|---|---|
| RegDBType | Database type (e.g., PostgreSQL) |
| RegDBDriver | JDBC driver class (e.g., org.postgresql.Driver) |
| RegDBUrl | Database connection URL |
| RegDBUser | Database username |
| RegDBPass | Database password |
| RegDBMaxActive | Maximum number of active database connections |
| RegDBValidationQuery | SQL query to validate connections (e.g., SELECT 1) |
| RegDownstreamMode | Downstream mode (e.g., Database) |
| RegDownstreamPublishEnabled | Enable/disable downstream publishing (true/false) |
| RegDownstreamFusedImageFormat | Format for fused structure images (e.g., mol:V3) |
| CHEMAXON_LICENSE_URL | URL to the Chemaxon license server |
| Variable | Description |
|---|---|
| REGISTRYCXN_DSCLIENT_HOME | Home directory for DSClient |
| RegDSDBType | Downstream database type (e.g., PostgreSQL) |
| RegDSDBDriver | JDBC driver class for downstream database |
| RegDSDBUrl | Downstream database connection URL |
| RegDSDBUser | Downstream database username |
| RegDSDBPass | Downstream database password |
| RegDSDBMaxActive | Maximum number of active downstream database connections |
| RegDSDBValidationQuery | SQL query to validate downstream connections |
| RegDSClientCommunicationType | Communication type (HTTP or JMS) |
| RegDsClientHttpCompRegHost | CompReg host URL (for HTTP communication) |
| RegDsClientHttpCompRegClientId | Client ID for HTTP authentication (must match created client) |
| RegDsClientHttpCompRegClientSecret | Client secret for HTTP authentication (must match created secret) |
| RegDsClientHttpCompRegUser | CompReg user for HTTP authentication |
| RegDsClientHttpPaginationLimit | Number of records per page for HTTP polling |
| RegDsClientHttpPollingFrequencySeconds | Polling frequency in seconds for HTTP communication |
| RegDSClientStrictConsistency | Enables strict consistency mode (default: false). When true, any processing error causes a rollback and stops processing entirely, and messages are validated to arrive in consecutive order with no missing IDs. When false, failed messages are skipped and saved to the failed message table. |
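The difference between the two RegDSClientStrictConsistency modes can be sketched as follows. This is an illustration of the behavior described above, not the DSClient code: the function and exception names are hypothetical, messages are assumed to be dictionaries with an integer `id`, and the `failed` list stands in for the failed message table.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


class ConsistencyError(RuntimeError):
    """Raised in strict mode on a processing error or a gap in message ids."""


def process_batch(
    messages: List[Message],
    handle: Callable[[Message], None],
    last_id: int,
    strict: bool,
    failed: List[Message],
) -> int:
    """Process messages in order and return the id of the last one consumed."""
    for message in messages:
        message_id = int(message["id"])
        if strict and message_id != last_id + 1:
            raise ConsistencyError(f"expected id {last_id + 1}, got {message_id}")
        try:
            handle(message)
        except Exception as error:
            if strict:
                # Strict mode: stop entirely; the caller rolls back its transaction.
                raise ConsistencyError(f"message {message_id} failed") from error
            # Lenient mode: record the failure and move on.
            failed.append(message)
        last_id = message_id
    return last_id
```

In lenient mode a bad message costs one entry in the failure record and processing continues; in strict mode the first problem halts the batch so that nothing is stored out of order.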