Design Hub developer guide - import plugins

Import plugins can be used to push or pull data (compounds with metadata) into Design Hub from any external source (message queue, REST API, etc.).

NodeJS module API

An import plugin exports the following properties:

Name	Type	Required	Description
`name`	string	yes	Unique identifier of the plugin, used by Design Hub for identification and internal communication. If multiple plugins use the same identifier, the last one to be loaded overrides the others.
`label`	string	yes	Human readable name of the plugin, used by Design Hub to display GUI elements related to this plugin: as menu entry in the menu to enable the plugin, as title of the panel displaying the results.
`domains`	array of strings	yes	List of domains where this plugin may be used, when authentication is enabled in Design Hub. Use `*` to allow any domain.
`init`	function	yes	`this` is a domain specific context, described later in this doc. This method can be used to initialize connections and/or start scheduled jobs for the plugin. (!) In a multi-domain setup, `init` will be called for each domain separately with its own context.
`getSettings`	async function	no	Including this property marks the plugin as a manual import plugin that can be triggered from the UI; a settings dialog is displayed to the user based on the return value of `getSettings()`. Takes precedence over `settings`.
`settings`	array of objects	no	Including this property marks the plugin as a manual import plugin that can be triggered from the UI; a settings dialog is displayed to the user based on the exported settings. This property is ignored if `getSettings` is defined.
`runImport`	async function	yes	Triggered for manual imports; receives the configuration in `this`.
`cannotProcess`	async function		The import service calls this with the IDs of the records whose processing failed.
`onConfigurationChanged`	function	no	A callback function that Design Hub calls during initialization and whenever an administrator updates the Secrets of the system. Arguments: `config (Object)` An object with `secrets` attribute containing the key-value pairs of secrets from the Admin interface
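
As an illustration of the `settings` property, here is a minimal sketch of a manual import plugin that exports a static settings list. Only attributes listed in the `PluginSettings` type of the skeleton files are used (`label`, `type`, `values`, `default`, `min`, `max`). The plugin name, the labels and the values are hypothetical, and the exact shape in which the user's choices arrive in `this.settings` is not specified in this guide, so the sketch does not read individual settings.

```javascript
"use strict";

// Hypothetical settings; every attribute used here comes from the PluginSettings typedef.
const settings = [
  { label: "Assay panel", type: "enum", values: ["Kinase panel", "ADME panel"], default: "Kinase panel" },
  { label: "Maximum number of records", type: "number", min: 1, max: 1000, default: 100 },
  { label: "Target project", type: "project" }
];

const plugin = {
  name: "example-manual-import", // hypothetical identifier
  label: "Example manual import",
  domains: ["*"],
  init() {},
  settings,
  async runImport() {
    // `this.settings` carries the values chosen in the dialog (shape assumed, not documented here).
    const records = []; // build RawRecord objects from the external source here
    const successCount = await this.storeRawData(records);
    return { successCount };
  }
};

// In a real plugin file this object would be exported: module.exports = plugin;
```

Exporting `getSettings` instead would allow the list to be computed per user at the time the dialog is opened.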

Domain specific import context properties:

Name	Type	Description
`domain`	string	Domain of this context
`logger`	object	Context specific logger with the typical `debug`, `info`, `warn`, `error` logging methods
`schedule`	object	Job scheduler
`storeRawData`	`function(RawRecord[])`	Callback provided by the plugin API to store compounds in the database immediately. Returns the number of stored records.
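
Because `init` is called once per domain, a plugin that needs the logger or `storeRawData` later has to remember them per domain. The following is a minimal sketch of one way to do that, assuming an in-memory `Map` is sufficient; `fakeContext` is a stand-in written for this example and is not part of the Design Hub API.

```javascript
"use strict";

// One entry per domain, filled in by init().
const contexts = new Map();

function init() {
  // `this` is the domain specific context supplied by Design Hub.
  contexts.set(this.domain, {
    logger: this.logger,
    storeRawData: this.storeRawData
  });
  this.logger.info("import plugin initialized for domain", this.domain);
}

// Local demonstration only: a stand-in for the context Design Hub would provide.
function fakeContext(domain) {
  return {
    domain,
    logger: { debug() {}, info() {}, warn() {}, error() {} },
    storeRawData: async (records) => records.length
  };
}

init.call(fakeContext("domain-a"));
init.call(fakeContext("domain-b"));
console.log([...contexts.keys()]); // prints [ 'domain-a', 'domain-b' ]
```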

RawRecord

The storeRawData callback function accepts a list of RawRecord objects. The purpose of this object is to hold all information on a compound that may be inserted, or updated.

The table below lists all the accepted attributes of such a record:

Name	Type	Required	Description
`external_id`	string	Yes	A unique import record identifier.
`substance_id`	string	No	Physical substance identifier of the compound.
`virtual_id`	string	No	A non-Design Hub generated, unique virtual identifier of the compound.
`source`	string	Yes	Chemical structure of the compound in any file format recognized by Chemaxon IO system.
`owner_username`	string	Yes	Username or identifier as obtained from the identity provider.
`generate_virtual_id`	boolean	No	Deprecated
`project_id`	number	One of project_id and project_key required	Design Hub internal project identifier
`project_key`	string	One of project_id and project_key required	Design Hub external project identifier (acquired when fetching projects from `company` plugin)
`hypothesis_title`	string	No	Title of the hypothesis in which the compound is stored.
`designset_title`	string	No	Title of the design set in which the compound is stored.
`status_id`	number	No	Design Hub internal status identifier. One of `status_id` or `status_label` required when `visibility` is set to `1`.
`status_label`	string	No	Status label. One of `status_id` or `status_label` required when `visibility` is set to `1`.
`source_system`	string	No	Label of the compound source. This attribute will be published for `storage` plugins.
`visibility`	number	No	Private/shared visibility flag. For private use 0, for public use 1. Default is 1
`raw_data`	Object	No	Compound properties (predicted or experimental data). Object keys matching the name of `compoundFields` will be extracted and used to update the value of "Additional Fields", while the rest are stored as "Imported data". All previously imported data is replaced for this import source.
`add_tags`	string[]	No	Tags to be added to the compound records.
`allow_user_provisioning`	boolean	No	When set to false, then unknown `owner_username` records will fail. When set to true, then records with unknown `owner_username` values will succeed and create new deactivated user accounts.

Compound Properties

The raw_data attribute of a RawRecord is a simple object whose values can be strings, numbers, multi-row numbers (arrays of numbers) and numbers with a modifier. See the example below:

{
   "external_id": "CHEMBL25",
   "substance_id": "CHEMBL25",
   ...
   "raw_data": {
      "My Compound Field": 2,
      "Toxicity Assessment": "Safe",
      "Purity %": 99,
      "COX-1 IC50 uM": {
         "value": 4.45,
         "modifier": ">"
      },
      "Inhibition of EGFR at 10 uM %": [96.0, 92.0, 97.7, 100.0]
   }
}

For values with modifiers, the following value modifiers are accepted: <<, <, <=, =, *, ~, >=, >, >>.
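
A plugin can check its raw_data values before submitting them. The sketch below covers the four value kinds and the modifier list described above; the helper names are hypothetical, and the requirement that numbers be finite is an assumption of this sketch rather than a documented rule.

```javascript
"use strict";

// Modifiers accepted for values of the form { value, modifier }.
const MODIFIERS = new Set(["<<", "<", "<=", "=", "*", "~", ">=", ">", ">>"]);

function isFiniteNumber(v) {
  // Assumption: NaN and Infinity are treated as invalid.
  return typeof v === "number" && Number.isFinite(v);
}

function isValidRawDataValue(value) {
  if (typeof value === "string" || isFiniteNumber(value)) return true;
  // Multi-row numbers: an array containing only numbers.
  if (Array.isArray(value)) return value.every(isFiniteNumber);
  // Number with modifier.
  if (value !== null && typeof value === "object") {
    return isFiniteNumber(value.value) && MODIFIERS.has(value.modifier);
  }
  return false;
}

// Returns the keys of rawData whose values do not match any accepted kind.
function findInvalidKeys(rawData) {
  return Object.keys(rawData).filter((key) => !isValidRawDataValue(rawData[key]));
}

const invalid = findInvalidKeys({
  "Purity %": 99,
  "COX-1 IC50 uM": { value: 4.45, modifier: ">" },
  "Broken modifier": { value: 1, modifier: "!=" }
});
console.log(invalid); // prints [ 'Broken modifier' ]
```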

Import sources

The application organizes calculated and imported data under unique keys with the following attributes:

the type of source: realtime plugins (GUI > Spreadsheet views > Data drawer > Add); imported (NodeJS import plugins and the REST API endpoint /api/import/rawdata); and user uploaded (GUI > New > Upload from file...)
the name of the source: name attribute of plugins; "rest-api"; and user file upload
the serialized form of settings used to obtain the data: settings value for realtime and import plugins; none for others
column label provided

Keep in mind that when an import request updates existing data, all raw_data previously provided for this source (i.e. type + name + settings) is replaced.
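
Since an update replaces everything previously imported for the source, a plugin that receives partial updates has to resend the complete set of properties each time. A minimal sketch, assuming the plugin keeps the last submitted raw_data per external_id in memory (a real plugin would more likely re-read the full data from its source system); `lastSent` and `withCompleteRawData` are hypothetical names.

```javascript
"use strict";

// external_id -> raw_data most recently submitted by this plugin.
const lastSent = new Map();

// Returns a copy of the record whose raw_data also contains the earlier values,
// so that the replacement on the Design Hub side does not drop them.
function withCompleteRawData(record) {
  const previous = lastSent.get(record.external_id) || {};
  const merged = { ...previous, ...record.raw_data };
  lastSent.set(record.external_id, merged);
  return { ...record, raw_data: merged };
}

const first = withCompleteRawData({ external_id: "DC000001", raw_data: { "Purity %": 99 } });
const second = withCompleteRawData({ external_id: "DC000001", raw_data: { "Predicted pIC50": 5.2 } });
console.log(Object.keys(second.raw_data).length); // prints 2
```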

Input processing

The following general processing steps are taken on each input record:

based on source, a static molecule image is generated
based on source, an MRV formatted molecule representation is created
if status_label was provided, resolve and confirm its status_id exists
if project_key was provided, resolve and confirm its project_id exists
identify compound author based on owner_username
if hypothesis_title and designset_title were provided, check or create the target hypothesis and design set.
- for existing hypotheses and design sets, the owner must have write permission
identify matching records in the content database based on virtual_id or substance_id
- the complete matching strategy fast tracks updates using previously used external_id values
- based on the provided project_id, new records are inserted, but all matching records with the virtual_id or substance_id have their attributes updated to ensure data consistency
insert or update content
enable chemical search on the compounds

If any of these steps fails, the record is marked as FAILED and the cause of the error is stored.
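
Some failures can be caught in the plugin before the records are submitted. The sketch below checks only the rules stated in the RawRecord table (required attributes, project reference, status for shared compounds); it does not reproduce the server-side processing, and `validateRawRecord` is a hypothetical helper name.

```javascript
"use strict";

// Returns a list of human readable problems; an empty list means the record passes these checks.
function validateRawRecord(record) {
  const errors = [];
  for (const key of ["external_id", "source", "owner_username"]) {
    if (typeof record[key] !== "string" || record[key].length === 0) {
      errors.push(`${key} is required`);
    }
  }
  if (record.project_id === undefined && record.project_key === undefined) {
    errors.push("one of project_id and project_key is required");
  }
  // visibility defaults to 1 (shared), which requires a status.
  const visibility = record.visibility === undefined ? 1 : record.visibility;
  if (visibility === 1 && record.status_id === undefined && record.status_label === undefined) {
    errors.push("one of status_id or status_label is required when visibility is 1");
  }
  return errors;
}

const problems = validateRawRecord({
  external_id: "DC000001",
  source: "c1ccccc1",
  owner_username: "id@company.com",
  project_key: "P1"
});
console.log(problems.length); // prints 1 (no status given for a shared compound)
```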

Examples

Load generated, private compound:

[{
  "external_id": "gen-ai-modelversion-executiondate-output1",
  "source": "c1ccccc1",
  "owner_username": "id@company.com",
  "project_key": "P1",
  "hypothesis_title": "Binding hypothesis X",
  "designset_title": "Route 1",
  "visibility": 0,
  "raw_data": {
    "Confidence": 0.93
  }
}]

Creates a new private compound with complete project, hypothesis and design set grouping. Repeated submissions of the same RawRecord are deduplicated based on the external_id, so no duplicate records are inserted, but data updates are still possible.

Load externally designed, shared compounds:

[{
  "external_id": "DC000001",
  "project_key": "P1",
  "virtual_id": "DC000001",
  "source": "c1ccccc1",
  "status_label": "Draft",
  "owner_username": "id@company.com",
  "raw_data": {
    "Predicted pIC50": 5.2
  }
}]

Creates a new shared compound in project "P1" with the given virtual ID. Assuming that DC000001 is globally unique, this value also works as a valid external_id.

Synchronize experimental data on real compounds:

[{
  "external_id": "CXN008",
  "project_key": "P2",
  "substance_id": "CXN008",
  "source": "c1ccccc1",
  "status_label": "Synthesis Completed",
  "owner_username": "id@company.com",
  "raw_data": {
    "Assay Vendor Target IC50 (nM)": {
      "value": 5432,
      "modifier": "="
    },
    "Inhibition value (%)": [4.18, 6.022, 8.314]
  }
}]

Adds CXN008 to project "P2". Since the source and assay data relate to asset CXN008, its best external_id is also CXN008. This way, if this compound is relevant to multiple projects (e.g. P1), all copies would be updated and this record makes sure a copy exists in P2 as well.

For further combinations, feel free to reach out to technical support with your use-case and available input data.

Performance tuning

All ingested RawRecord objects are stored in the database and then processed. For manually initiated imports (GUI, REST API), processing is immediate. For automatic imports, processing runs as scheduled tasks. After successful processing, the content appears for users.

The processing step can be controlled using the following options. For further details, see the Configuration Guide.

importProcessBatchSize
importSchedulerPlan

Plugin skeleton

Below, you can find two skeleton files, one for a manual and one for an automatic import plugin, implementing the API methods. The code includes TypeScript-style type definitions (as JSDoc annotations) for all parameters and expected results, so that editors like Visual Studio Code can assist with static code analysis and adherence to the specifications.

skeleton-manual.import.js

//@ts-check
"use strict";

const dhutils = require("@chemaxon/dh-utils");

/**
 *
 * @typedef {Object} RawRecord
 * @prop {string} external_id
 * @prop {string} [substance_id]
 * @prop {string} [virtual_id]
 * @prop {string} source - chemical structure
 * @prop {string} owner_username
 * @prop {number} [project_id] Internal DH project identifier
 * @prop {string} [project_key]
 * @prop {number} [status_id] Internal DH status identifier
 * @prop {string} [status_label]
 * @prop {string[]} [add_tags]
 * @prop {string} [hypothesis_title]
 * @prop {string} [designset_title]
 * @prop {number} [visibility]
 * @prop {string} [source_system]
 * @prop {boolean} [allow_user_provisioning]
 * @prop {{[key: string]: string|number|number[]|NumberWithModifier}} raw_data - compound properties (assay data)
 * @deprecated @prop {boolean} [generate_virtual_id]
 *
 * @typedef {Object} NumberWithModifier
 * @prop {number} value
 * @prop {string} modifier
 *
 * @typedef PluginSettings
 * @prop {string} label
 * @prop {'boolean'|'number'|'enum'|'multienum'|'project'|'text'|'objectenum'|'objectmultienum'} type
 * @prop {string[]|number[]|{id: string, label: string, category?: string}[]} [values]
 * @prop {string|number|boolean} [default]
 * @prop {number} [min]
 * @prop {number} [max]
 *
 * @typedef ImportInitContext
 * @prop {string} domain
 * @prop {Logger} logger
 * @prop {import("node-schedule").schedule} schedule
 *
 * @typedef Logger
 * @prop {function(...any): void} debug
 * @prop {function(...any): void} info
 * @prop {function(...any): void} warn
 * @prop {function(...any): void} error
 *
 * @typedef {function(RawRecord[]): Promise<number>} StoreCallback
 *
 * @typedef GetSettingsContext
 * @prop {User} user
 *
 * @typedef RunImportContext
 * @prop {User} user
 * @prop {PluginConfiguration} settings
 * @prop {string} domain
 * @prop {StoreCallback} storeRawData
 *
 * @typedef User
 * @prop {string} userName
 * @prop {any} tokens OIDC TokenSet
 *
 * @typedef {any} PluginConfiguration Project is DH internal project ID
 *
 * @typedef ConfigurationValues
 * @prop {{[key: string]: string}} secrets
 */

/**
 * @this {ImportInitContext}
 */
function init() {
  //store the logger instance
}

/**
 * @this {GetSettingsContext}
 * @returns {Promise<PluginSettings[]>}
 */
async function getSettings() {
  console.log("plugin-name getSettings", this.user);
  return [];
}

/**
 * @this {RunImportContext}
 * @returns {Promise<{ successCount: number }>}
 */
async function runImport() {
  console.log("user is requesting data with settings", this.user, this.settings);

  //obtain data
  //transform data to records
  /** @type {RawRecord[]} */
  const records = [];

  //submit records to DH API
  const successCount = await this.storeRawData(records);

  return { successCount };
}

/**
 * @this {CannotProcessContext}
 * @param {string[]} externalIds
 */
async function cannotProcess(externalIds) {
  console.log("Cannot import IDs", externalIds);
}

/**
 * Store and use values provided by Admin interface's Secret manager
 * @param {ConfigurationValues} config
 */
function onConfigurationChanged(config) {
  console.log("plugin-name configuration", config.secrets);
}

module.exports = {
  name: "manual-plugin-name",
  label: "Plugin Label",
  init: init,
  runImport: runImport,
  getSettings: getSettings,
  cannotProcess: cannotProcess,
  domains: ["*"],
  onConfigurationChanged: onConfigurationChanged
};
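
The `onConfigurationChanged` and `cannotProcess` stubs above only log their input. As a rough sketch of how they might be filled in, the code below keeps the latest secrets and the failed record IDs in module-level variables. The secret key `EXAMPLE_API_TOKEN` and the helper `getSecret` are hypothetical; what a plugin does with failed IDs (retry, report to the source system) depends on the integration.

```javascript
"use strict";

let secrets = {};
const failedIds = new Set();

// Called by Design Hub during initialization and whenever an administrator updates the Secrets.
function onConfigurationChanged(config) {
  secrets = { ...config.secrets };
}

// Called by the import service with the external IDs of records that failed processing.
async function cannotProcess(externalIds) {
  for (const id of externalIds) {
    failedIds.add(id);
  }
}

// Hypothetical helper: fail early when a secret the plugin depends on is not configured.
function getSecret(key) {
  if (!(key in secrets)) {
    throw new Error(`Missing secret: ${key}`);
  }
  return secrets[key];
}

onConfigurationChanged({ secrets: { EXAMPLE_API_TOKEN: "abc" } });
console.log(getSecret("EXAMPLE_API_TOKEN")); // prints abc
```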

skeleton-automatic.import.js

//@ts-check
"use strict";

const dhutils = require("@chemaxon/dh-utils");

/**
 *
 * @typedef {Object} RawRecord
 * @prop {string} external_id
 * @prop {string} [substance_id]
 * @prop {string} [virtual_id]
 * @prop {string} source - chemical structure
 * @prop {string} owner_username
 * @prop {number} [project_id] Internal DH project identifier
 * @prop {string} [project_key]
 * @prop {number} [status_id] Internal DH status identifier
 * @prop {string} [status_label]
 * @prop {string[]} [add_tags]
 * @prop {string} [hypothesis_title]
 * @prop {string} [designset_title]
 * @prop {number} [visibility]
 * @prop {string} [source_system]
 * @prop {{[key: string]: string|number|number[]|NumberWithModifier}} raw_data - compound properties (assay data)
 * @deprecated @prop {boolean} [generate_virtual_id]
 *
 * @typedef {Object} NumberWithModifier
 * @prop {number} value
 * @prop {string} modifier
 *
 * @typedef PluginSettings
 * @prop {string} label
 * @prop {'boolean'|'number'|'enum'|'multienum'|'project'|'text'|'objectenum'|'objectmultienum'} type
 * @prop {string[]|number[]|{id: string, label: string, category?: string}[]} [values]
 * @prop {string|number|boolean} [default]
 * @prop {number} [min]
 * @prop {number} [max]
 *
 * @typedef ImportInitContext
 * @prop {string} domain
 * @prop {Logger} logger
 * @prop {import("node-schedule").schedule} schedule
 * @prop {StoreCallback} storeRawData
 *
 * @typedef Logger
 * @prop {function(...any): void} debug
 * @prop {function(...any): void} info
 * @prop {function(...any): void} warn
 * @prop {function(...any): void} error
 *
 * @typedef {function(RawRecord[]): Promise<number>} StoreCallback
 *
 * @typedef GetSettingsContext
 * @prop {User} user
 *
 * @typedef RunImportContext
 * @prop {User} user
 * @prop {PluginConfiguration} settings
 * @prop {string} domain
 * @prop {StoreCallback} storeRawData
 *
 * @typedef User
 * @prop {string} userName
 * @prop {any} tokens OIDC TokenSet
 *
 * @typedef {any} PluginConfiguration Project is DH internal project ID
 *
 * @typedef ConfigurationValues
 * @prop {{[key: string]: string}} secrets
 */

/**
 * @this {ImportInitContext}
 */
function init() {
  //store the logger instance

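  //set up the cron job: with the six-field pattern below (seconds first),
  //the import runs at second 0 of minutes 0 and 30 of every hour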
  const job = this.schedule.scheduleJob("0 0,30 * * * *", runImport.bind(this));
}

/**
 * @this {RunImportContext}
 * @returns {Promise<{ successCount: number }>}
 */
async function runImport() {
  //obtain data
  //transform data to records
  /** @type {RawRecord[]} */
  const records = [];

  //submit records to DH API
  const successCount = await this.storeRawData(records);

  return { successCount };
}

/**
 * @this {CannotProcessContext}
 * @param {string[]} externalIds
 */
async function cannotProcess(externalIds) {
  console.log("Cannot import IDs", externalIds);
}

/**
 * Store and use values provided by Admin interface's Secret manager
 * @param {ConfigurationValues} config
 */
function onConfigurationChanged(config) {
  console.log("plugin-name configuration", config.secrets);
}

module.exports = {
  name: "automatic-plugin-name",
  label: "Plugin Label",
  init: init,
  runImport: runImport,
  cannotProcess: cannotProcess,
  domains: ["*"],
  onConfigurationChanged: onConfigurationChanged
};
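
Before deploying a plugin, `runImport` can be exercised locally with a stand-in context. The following is a minimal sketch of such a dry run, not a Design Hub facility: `dryRun`, the inline example plugin, its input rows, and the `project_key` entry in the settings object are all hypothetical, and the fake `storeRawData` only collects the records instead of storing them.

```javascript
"use strict";

// Stand-in plugin with the same runImport contract as the skeletons above.
const examplePlugin = {
  name: "example-import",
  async runImport() {
    // Hypothetical rows as they might arrive from an external system.
    const rows = [{ id: "DC000001", smiles: "c1ccccc1" }];
    const records = rows.map((row) => ({
      external_id: row.id,
      virtual_id: row.id,
      source: row.smiles,
      owner_username: this.user.userName,
      project_key: this.settings.project_key, // assumed settings shape
      status_label: "Draft",
      raw_data: {}
    }));
    const successCount = await this.storeRawData(records);
    return { successCount };
  }
};

// Calls runImport with a fake context and returns both its result and the submitted records.
async function dryRun(plugin, settings) {
  const stored = [];
  const context = {
    domain: "test",
    user: { userName: "id@company.com", tokens: null },
    settings,
    storeRawData: async (records) => {
      stored.push(...records);
      return records.length;
    }
  };
  const result = await plugin.runImport.call(context);
  return { result, stored };
}

dryRun(examplePlugin, { project_key: "P1" }).then(({ result, stored }) => {
  console.log(result.successCount, stored[0].external_id); // prints 1 DC000001
});
```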