Design Hub developer guide - import plugins
Import plugins push or pull data (compounds with metadata) into Design Hub from any external source, such as a message queue or a REST API.
NodeJS module API
An import plugin exports the following properties:
| Name | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Unique identifier of the plugin, used by Design Hub for identification and internal communication. If multiple plugins use the same identifier, the last one to be loaded overrides the others. |
| label | string | yes | Human-readable name of the plugin. Design Hub uses it in GUI elements related to this plugin: as the menu entry that enables the plugin, and as the title of the panel displaying the results. |
| domains | array of strings | yes | List of domains where this plugin may be used when authentication is enabled in Design Hub. Use * to allow any domain. |
| init | function | yes | Receives the domain-specific context described later in this document. Use this method to initialize connections and/or start scheduled jobs for the plugin. (!) In a multi-domain setup, init is called separately for each domain, each time with its own context. |
| getSettings | async function | no | Including this property marks the plugin as a manual import plugin that can be triggered from the UI. The settings dialog shown to the user is built from the return value of getSettings(). Takes precedence over settings. |
| settings | array of objects | no | Including this property marks the plugin as a manual import plugin that can be triggered from the UI. The settings dialog shown to the user is built from the exported settings. Ignored if getSettings is defined. |
| runImport | async function | yes | Triggered in case of manual import, and gets the configuration in this |
| cannotProcess | async function | | The import service calls this with the IDs of the records whose processing failed. |
| onConfigurationChanged | function | no | A callback function that Design Hub calls during initialization and whenever an administrator updates the Secrets of the system. Arguments: config (Object), an object with a secrets attribute containing the key-value pairs of secrets from the Admin interface. |
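The exported object can be sketched as follows. This is a minimal illustration, not the official skeleton: the plugin name, label and the handling inside each method are hypothetical. Because the table does not state whether the context arrives as an argument or as this, init uses context = this as a fallback (an assumption). Contexts are kept per domain because init runs once per domain, and cannotProcess is assumed to receive an array of IDs.

```javascript
// Hypothetical per-domain store: init() runs once per domain in a multi-domain setup.
const contexts = new Map();
// Latest secrets passed to onConfigurationChanged (hypothetical usage).
let secrets = {};

const plugin = {
  name: "example-import", // hypothetical identifier; must be unique across plugins
  label: "Example import", // shown in the Design Hub GUI
  domains: ["*"], // allow any domain

  // Assumption: the context is the first argument; fall back to `this` otherwise.
  init(context = this) {
    // Keep each domain's context separately so domains do not overwrite each other.
    contexts.set(context.domain, context);
    context.logger.info(`example-import initialized for ${context.domain}`);
  },

  // Regular (non-arrow) method, so any `this` value supplied by the caller stays available.
  async runImport() {
    // Fetch data from the external source and pass RawRecord objects to storeRawData.
  },

  // Assumption: ids is an array of record identifiers.
  async cannotProcess(ids) {
    console.warn(`Records that could not be processed: ${ids.join(", ")}`);
  },

  onConfigurationChanged(config) {
    // config.secrets holds the key-value pairs from the Admin interface.
    secrets = (config && config.secrets) || {};
  },
};

module.exports = plugin;
```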
Domain specific import context properties:
| Name | Type | Description |
|---|---|---|
| domain | string | Domain of this context |
| logger | object | Context-specific logger with the typical debug, info, warn and error logging methods |
| schedule | object | Job scheduler |
| storeRawData | function(RawRecord[]) | Callback provided by the plugin API to store the compounds immediately in the database. Returns the number of stored records. |
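Since storeRawData returns the number of stored records, a plugin that submits records in several calls can sum those counts. The helper below is a hypothetical sketch: splitting records into batches is a client-side choice of this example (unrelated to the importProcessBatchSize option), and await is used so it works whether storeRawData returns a number or a Promise of one.

```javascript
/**
 * Hypothetical helper: pass records to context.storeRawData in fixed-size
 * batches and return the total number of stored records.
 */
async function storeInBatches(context, records, batchSize = 100) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let stored = 0;
  for (let i = 0; i < records.length; i += batchSize) {
    const batch = records.slice(i, i + batchSize);
    // await handles both a plain number and a Promise result.
    stored += await context.storeRawData(batch);
    context.logger.debug(`Stored batch ${i / batchSize + 1} (${batch.length} records)`);
  }
  return stored;
}
```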
RawRecord
The storeRawData callback function accepts a list of RawRecord objects. Each object holds all information on a compound that is to be inserted or updated.
The table below lists all the accepted attributes of such a record:
| Name | Type | Required | Description |
|---|---|---|---|
| external_id | string | Yes | A unique import record identifier. |
| substance_id | string | No | Physical substance identifier of the compound. |
| virtual_id | string | No | A unique virtual identifier of the compound that was not generated by Design Hub. |
| source | string | Yes | Chemical structure of the compound in any file format recognized by the Chemaxon IO system. |
| owner_username | string | Yes | Username or identifier as obtained from the identity provider. |
| generate_virtual_id | boolean | No | Deprecated. |
| project_id | number | One of project_id and project_key required | Design Hub internal project identifier. |
| project_key | string | One of project_id and project_key required | Design Hub external project identifier (acquired when fetching projects from the company plugin). |
| hypothesis_title | string | No | Title of the hypothesis in which the compound is stored. |
| designset_title | string | No | Title of the design set in which the compound is stored. |
| status_id | number | No | Design Hub internal status identifier. One of status_id or status_label is required when visibility is set to 1. |
| status_label | string | No | Status label. One of status_id or status_label is required when visibility is set to 1. |
| source_system | string | No | Label of the compound source. This attribute is published for storage plugins. |
| visibility | number | No | Private/shared visibility flag: 0 for private, 1 for public. Default is 1. |
| raw_data | Object | No | Compound properties (predicted or experimental data). Object keys matching the name of compoundFields are extracted and used to update the value of "Additional Fields"; the rest are stored as "Imported data". All data previously imported from this import source is replaced. |
| add_tags | string[] | No | Tags to be added to the compound records. |
| allow_user_provisioning | boolean | No | When set to false, records with an unknown owner_username fail. When set to true, such records succeed and new deactivated user accounts are created. |
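Before calling storeRawData, a plugin may want to catch obvious mistakes locally. The hypothetical validateRawRecord helper below checks only the rules stated in the table above (required fields, project_id/project_key, status when shared, visibility values, add_tags type). Design Hub still performs its own processing and may reject records for other reasons.

```javascript
/**
 * Hypothetical pre-flight check of a RawRecord against the attribute table.
 * Returns a list of problems; an empty list means no issues were detected locally.
 */
function validateRawRecord(record) {
  const problems = [];
  // Required string attributes.
  for (const key of ["external_id", "source", "owner_username"]) {
    if (typeof record[key] !== "string" || record[key] === "") {
      problems.push(`${key} is required and must be a non-empty string`);
    }
  }
  if (record.project_id === undefined && record.project_key === undefined) {
    problems.push("one of project_id or project_key is required");
  }
  // Visibility defaults to 1 (shared) when omitted.
  const visibility = record.visibility === undefined ? 1 : record.visibility;
  if (visibility !== 0 && visibility !== 1) {
    problems.push("visibility must be 0 (private) or 1 (public)");
  }
  if (visibility === 1 && record.status_id === undefined && record.status_label === undefined) {
    problems.push("status_id or status_label is required when visibility is 1");
  }
  if (record.add_tags !== undefined &&
      !(Array.isArray(record.add_tags) && record.add_tags.every((t) => typeof t === "string"))) {
    problems.push("add_tags must be an array of strings");
  }
  return problems;
}
```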
Compound Properties
The raw_data attribute of a RawRecord is a simple object that accepts string, number, multi-row number and modified number values.
For modified number values, the following modifiers are accepted: <<, <, <=, =, *, ~, >=, >, >>.
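When source data arrives as text such as "<=5.2", a plugin has to split the modifier from the number. The parseModifiedValue helper below is a hypothetical sketch using the modifier list above. It tries two-character modifiers first so that "<=" is not read as "<". How the resulting parts are encoded inside raw_data is not specified in this section and is left out.

```javascript
// Accepted modifiers from the list above, sorted longest-first (stable sort)
// so "<<", "<=", ">=" and ">>" are tried before "<" and ">".
const MODIFIERS = ["<<", "<", "<=", "=", "*", "~", ">=", ">", ">>"]
  .sort((a, b) => b.length - a.length);

/**
 * Hypothetical helper: split text like "<=5.2" into { modifier, value }.
 * Returns null if the text does not start with an accepted modifier followed by a number.
 */
function parseModifiedValue(text) {
  const trimmed = String(text).trim();
  const modifier = MODIFIERS.find((m) => trimmed.startsWith(m));
  if (modifier === undefined) return null;
  const rest = trimmed.slice(modifier.length).trim();
  const value = Number(rest);
  if (rest === "" || !Number.isFinite(value)) return null;
  return { modifier, value };
}
```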
Import sources
The application organizes calculated and imported data under unique keys with the following attributes:
- the type of source: realtime plugins (GUI > Spreadsheet views > Data drawer > Add); imported (NodeJS import plugins and the REST API endpoint /api/import/rawdata); and user uploaded (GUI > New > Upload from file...)
- the name of the source: the name attribute of plugins; "rest-api"; and user file upload
- the serialized form of the settings used to obtain the data: the settings value for realtime and import plugins; none for others
- the column label provided
Keep in mind that when an import request updates existing compounds, all raw_data previously provided by the same source (i.e. type + name + settings) is replaced.
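The replacement rule can be pictured with a small in-memory model for a single compound. This is an illustration only, not Design Hub's implementation: sourceKey and ImportedData are hypothetical, the "import" type string is made up, and JSON.stringify stands in for whatever settings serialization the application really uses. It shows that resubmitting with the same type, name and settings replaces earlier values, while different settings produce a separate entry.

```javascript
// Hypothetical key built from source type, source name and serialized settings.
// JSON.stringify is a stand-in; the real serialization is not documented here.
function sourceKey(type, name, settings) {
  return JSON.stringify([type, name, settings === undefined ? null : settings]);
}

// Imported data of one compound, grouped by source key (illustrative model).
class ImportedData {
  constructor() {
    this.bySource = new Map();
  }

  store(type, name, settings, rawData) {
    // Replace, do not merge: keys sent earlier by the same source are dropped.
    this.bySource.set(sourceKey(type, name, settings), { ...rawData });
  }

  get(type, name, settings) {
    return this.bySource.get(sourceKey(type, name, settings));
  }
}
```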
Input processing
The following general processing steps are taken on each input record:
- based on source, a static molecule image is generated
- based on source, an MRV formatted molecule representation is created
- if status_label was provided, resolve it and confirm that its status_id exists
- if project_key was provided, resolve it and confirm that its project_id exists
- identify the compound author based on owner_username
- if hypothesis_title and designset_title were provided, check or create the target hypothesis and design set
  - for existing hypotheses and design sets, the owner must have write permission
- identify matching records in the content database based on virtual_id or substance_id
  - the complete matching strategy fast tracks updates using previously used external_id values
  - based on the provided project_id, new records are inserted, but all matching records with the same virtual_id or substance_id have their attributes updated to ensure data consistency
- insert or update content
- enable chemical search on the compounds
If any of these steps fails, the record is marked as FAILED and the cause of the error is stored.
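To build intuition for the matching and insert/update step, here is a deliberately simplified in-memory model. It is not Design Hub's implementation: applyRecord and the row shape are hypothetical, project_key is assumed to be already resolved to project_id, and only raw_data and source are copied on update. Every stored row matching the record by external_id, virtual_id or substance_id is updated, and a new row is inserted only if no match exists in the target project.

```javascript
/**
 * Hypothetical model of the matching step.
 * existing: array of rows { external_id, virtual_id, substance_id, project_id, ... }
 * Returns how many rows were updated and the inserted row (or null).
 */
function applyRecord(existing, record) {
  const matchesRecord = (row) =>
    row.external_id === record.external_id ||
    (record.virtual_id !== undefined && row.virtual_id === record.virtual_id) ||
    (record.substance_id !== undefined && row.substance_id === record.substance_id);

  const matches = existing.filter(matchesRecord);
  // Update every match, including copies in other projects, to keep data consistent.
  for (const row of matches) {
    Object.assign(row, { raw_data: record.raw_data, source: record.source });
  }
  // Insert only when the target project has no matching row yet.
  let inserted = null;
  if (!matches.some((row) => row.project_id === record.project_id)) {
    inserted = { ...record };
    existing.push(inserted);
  }
  return { updated: matches.length, inserted };
}
```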
Examples
Load generated, private compound:
Creates a new private compound with complete project, hypothesis and design set grouping. Repeated submissions of the same RawRecord are deduplicated based on the external_id, so no duplicate records are inserted, but data updates are still possible.
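A RawRecord matching this description might look like the sketch below. All values are hypothetical; only the attribute names come from the RawRecord table. Reusing the same external_id on every resubmission is what allows the deduplication described above.

```javascript
// Hypothetical RawRecord for a generated, private compound (all values illustrative).
const privateCompound = {
  external_id: "generator-run-42/cmpd-7", // stable ID from the generator, reused on resubmission
  source: "OC(=O)c1ccccc1",               // structure as SMILES
  owner_username: "alice",
  project_key: "P1",                      // project_id would work as well
  hypothesis_title: "Acid series",        // checked or created during processing
  designset_title: "Generator run 42",    // checked or created during processing
  visibility: 0,                          // private, so no status is required
};
```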
Load externally designed, shared compounds:
Creates a new shared compound in project "P1" with the given virtual ID. Assuming that DC000001 is globally unique, this value also works as a valid external_id.
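One possible RawRecord for this case is sketched below. "P1" and "DC000001" come from the description; the structure, owner, status label and source_system values are hypothetical, and treating "P1" as a project_key is an assumption. Since the compound is shared (visibility 1), a status must be provided.

```javascript
// Hypothetical RawRecord for an externally designed, shared compound.
// "P1" and "DC000001" come from the example; other values are illustrative.
const sharedCompound = {
  external_id: "DC000001",      // globally unique, so it doubles as the import identifier
  virtual_id: "DC000001",
  source: "CC(=O)Nc1ccc(O)cc1", // structure as SMILES
  owner_username: "bob",
  project_key: "P1",            // assumption: "P1" is the project's external key
  visibility: 1,                // shared
  status_label: "Designed",     // hypothetical label; required because visibility is 1
  source_system: "Partner CRO", // hypothetical source label
};
```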
Synchronize experimental data on real compounds:
Adds CXN008 to project "P2". Since the source and assay data relate to asset CXN008, the best external_id is CXN008 itself. If this compound is also relevant to other projects (e.g. P1), all copies are updated, and this record ensures that a copy exists in P2 as well.
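A RawRecord for this synchronization could look like the following sketch. "CXN008" and "P2" come from the description; the structure, owner, status label and assay values are hypothetical placeholders, and "P2" is assumed to be a project_key. Using the asset ID as both external_id and substance_id lets repeated syncs hit the same record and reach copies in other projects.

```javascript
// Hypothetical RawRecord that synchronizes assay data for the real compound CXN008.
// "CXN008" and "P2" come from the example; other values are illustrative.
const assaySync = {
  external_id: "CXN008",        // the asset ID itself, so repeated syncs hit the same record
  substance_id: "CXN008",       // matches existing copies in other projects (e.g. P1)
  source: "c1ccc2[nH]ccc2c1",   // structure as SMILES (placeholder)
  owner_username: "assay-sync", // hypothetical service account
  project_key: "P2",            // assumption: "P2" is the project's external key
  visibility: 1,
  status_label: "Synthesized",  // hypothetical label
  raw_data: {
    "Assay A IC50 (nM)": 120,   // hypothetical measurement
    "Assay A comment": "single run",
  },
};
```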
For further combinations, feel free to reach out to technical support with your use-case and available input data.
Configuration
All ingested RawRecord objects are stored in the database and then processed. For manually initiated imports (GUI, REST API), processing is immediate; for automatic imports, processing runs as a scheduled task. After successful processing, the content becomes available to users.
The processing step can be controlled using the following options. For further details, see the Configuration Guide.
- importProcessBatchSize
- importSchedulerPlan
Plugin skeleton
Below are two skeleton files, one for a manual and one for an automatic import plugin, implementing the API methods. The code includes TypeScript definitions for all parameters and expected results, so that editors such as Visual Studio Code can assist with static code analysis and adherence to the specification.
skeleton-manual.import.js
skeleton-automatic.import.js