The general registration process is as follows. The data needed for chemical registration process (structure, external IDs, salt/solvate info, etc.) are submitted to the Compound Registration. Everything goes through a comprehensive checking process shown in figure Overview 1, which i ncludes validation, standardization, structure checking (and appropriate auto fixing) and structure matching steps.
The first step is a customizable standardization, which is followed by structure checker steps.
Some of the checkers serve only as quality checks. The quality checkers are defined for the whole database and contain error checkers optionally combined with auto-fixers. It is possible to define on source level by a system switcher if the quality checks should run or not.
Other checkers are selectable structure fixers (e.g. to change bond types or enhanced stereo information according to the business logic). This set can be separately defined for every source.
In case of any checker alert that cannot be fixed or is not set to be fixed automatically, the submission falls into the Staging area.
|Figure Overview 1. Process of the compound registration|
If no issue was found, then a Registration ID is generated. The ID generation process is based on matches of the incoming structure with already registered structures. After the proper treatment of the structure matches, the new compound is registered into the 3-level hierarchy of the registry.
Parent - The neutralized (if possible), non-isotopic structure, without any salt/solvate info. Compound Registration ID will be generated on this level.
Version - The isotopic/charged version, including the salt/solvate info as well. A different isotopic form and/or different salt/solvate info is considered as a new version (the specification of allowed multiplicities is needed). The role of a version is to group the preparations that share the very same chemical structure of a certain parent – including salts, charges and isotopes.
Preparation - The preparation related info, like notebook reference number or lot reference ID.
During the registration, if the parent is unique, then unique corporate registration IDs (PCN, CN and LN) are generated. If only the version was unique, then only CN and LN are generated, and if even the version structure is a duplicate, then only a new preparation with generated LN will be inserted under the same version.
If any issues found during the autoregistration process, the compound falls into the staging area with the appropriate error status. Users with corresponding privileges (e.g. registrars) can pick up failed submissions, fix them either one by one or fix them in bulk correction mode if multiple submissions can be fixed by changing the system switcher and structure checker/fixer settings. After the molecular structure of a submission is drawn or corrected, the changes can be saved and – based on user/role settings - another person (e.g. a registrar) can review and register it.
During the manual registration process (Advanced Registration mode or from the Staging area) the validation, standardization and structure checker/ fixer steps are applied on the structure to be registered, then the matches are displayed (exact, stereochemical and tautomer matches), and the user can accept a matching registered structure or can register its structure as it is drawn. A kind of mock registration is also available when using the "Find" option, which takes the compound through all registration related conversions, and displays any match to compounds already in the registry.
The structures are registered into a "tautomer" JChem table, which makes finding a tautomer match very fast. If a compound is found to be the tautomer of another one already registered, the system can force the new structure to fall into the staging area – depending on the configuration settings – where the chemist or other privileged user can pick it up and register it as a new lot of the existing structure or register it as it is drawn. A registered structure can always by changed by a privileged user. Stereochemical matches can be registered in a similar way, by either accepting an available structure or register it as it is drawn. It is defined in the configuration settings of each source (where source means the origin of the structure that has to be registered, e.g. the ELNB or the Bulk registration), if it is enabled to automatically register tautomer or stereochemical matches, or force them to the staging area.
Preparations can be directly registered under a specified version. In this case, the entered molecule structure is neglected (if available), and only the preparation (lot) related info is registered as a new preparation under the version.
If a difference between two compounds cannot be described in their molecular structure, the Chemically Significant Text (CST) field can be used to provide structural data, which forces the structure to be unique. It is very useful in case of registering compounds with no structural information (yet), or if the privileged user wants to register the same structure under a different registration ID.
ChemAxon Compound Registration does support registration of compounds without structure and without CST. These structure types are referred as "No structure". In this case new parent IDs will be allocated for each registered lot that has no structure and no CST, but there is always an option to register a preparation directly under a version specified by its CN.
ChemAxon Compound Registration does support four types of multi-component compounds. These are referred as alternates, mixtures, formulations and polymers. Alternate means that the structure is one of the several drawn structures, but it is not known which one. Mixture composition is represented by %-ranges, while formulations have exact composition information. Polymers are generated structures based on configurable reaction "rules" that are set between the monomers.
Compound Registration also supports Markush structures. "Small" Markush libraries can be registered successfully, as all the other additional chemical databases using JChem can certainly support Markush structures. The list of supported Markush features include R-groups, link nodes, atom lists, position variation, repeating units with repetition ranges, homology groups.
Successfully registered compounds are stored in the registry, and a downstream service can transfer them to other corporate database(s), incorporating additional data as required. As it is already described above, it is possible to have a corporate database, where the compound registry related data is stored in dedicated tables (read-only for a normal user), while additional data is stored in other data tables.
Privileged users can amend (modify) any compound or its related data in the registry using the Details page of the Compound Registration web client. All amendment steps are audited, and the complete history of changes is available for each entity, from any level of the tree.
External IDs or alternative names can be stored in additional data tables that can be specified on each (parent, version and lot) level.
User management can use the company's central LDAP/AD system (if exists), or its own integrated user management. In both cases the roles can be customized in order to access only specific parts of the registry and to do only limited actions (e.g. a specific user group cannot register from the staging area, or cannot create a new version of the same tree).