Design Hub developer guide - import plugins

    Import plugins can be used to push or pull data, that is, compounds with metadata, into Design Hub from any external source (message queue, REST API, etc.).

    NodeJS module API

    An import plugin exports the following properties:

    Name Type Required Description
    name string yes Unique identifier of the plugin, used by Design Hub for identification and internal communication. If multiple plugins use the same identifier, the last one to be loaded overrides the others.
    label string yes Human readable name of the plugin, used by Design Hub to display GUI elements related to this plugin: as menu entry in the menu to enable the plugin, as title of the panel displaying the results.
    domains array of strings yes List of domains where this plugin may be used, when authentication is enabled in Design Hub. Use * to allow any domain.
    init function yes Called with this bound to a domain-specific context, described later in this document. This method can be used to initialize connections and/or start scheduled jobs for the plugin.
    (!) In a multi-domain setup, init is called separately for each domain, each time with that domain's own context.
    getSettings async function no Including this property indicates a manual import plugin that can be triggered from the UI; a settings dialog is displayed to the user based on the return value of getSettings(). Takes precedence over settings.
    settings array of objects no Including this property indicates a manual import plugin that can be triggered from the UI; a settings dialog is displayed to the user based on the exported settings. This property is ignored if getSettings is defined.
    runImport async function yes Triggered in case of a manual import; receives the configuration through its this context (see RunImportContext in the plugin skeleton below).
    cannotProcess async function The import service calls this with the IDs of the records whose processing failed.
    onConfigurationChanged function no A callback function that Design Hub calls during initialization and whenever an administrator updates the Secrets of the system.

    Arguments:
    config (Object) An object with secrets attribute containing the key-value pairs of secrets from the Admin interface
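
    As an illustration of the onConfigurationChanged callback described above, the following minimal sketch caches the latest secrets in a module-level variable so that other plugin functions can read them later. The secret name EXAMPLE_API_TOKEN and the requireSecret helper are hypothetical and only serve the example; they are not part of the Design Hub API.

```javascript
"use strict";

// Module-level cache of the latest secrets; replaced on every notification.
let secrets = {};

/**
 * Called by Design Hub during initialization and whenever secrets change.
 * @param {{ secrets: { [key: string]: string } }} config
 */
function onConfigurationChanged(config) {
  // Copy the values so later changes to the incoming object do not leak in.
  secrets = { ...(config && config.secrets) };
}

/**
 * Hypothetical helper: read one secret or fail with a clear message.
 * @param {string} key
 * @returns {string}
 */
function requireSecret(key) {
  const value = secrets[key];
  if (value === undefined) {
    throw new Error(`Secret "${key}" is not configured`);
  }
  return value;
}

// Simulate the call Design Hub would make (EXAMPLE_API_TOKEN is made up).
onConfigurationChanged({ secrets: { EXAMPLE_API_TOKEN: "abc123" } });
console.log(requireSecret("EXAMPLE_API_TOKEN")); // prints "abc123"
```

    Failing early on a missing secret keeps the error close to its cause instead of surfacing later as a rejected request to the external system.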

    Domain specific import context properties:

    Name Type Description
    domain string Domain of this context
    logger object Context specific logger with the typical debug, info, warn, error logging methods
    schedule object Job scheduler
    storeRawData function(RawRecord[]) The plugin API provides this callback function to store the compounds immediately into the database. Returns the number of stored records.
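
    Because init is called once per domain, a plugin may want to keep one set of context objects per domain rather than a single global reference. The sketch below shows one possible way to do that. The fakeContext function is a stand-in written for this example so the code can run outside Design Hub; the real context is supplied by the application.

```javascript
"use strict";

// One entry per domain, because init is called separately for each domain.
const contexts = new Map();

function init() {
  // `this` is the domain-specific import context.
  contexts.set(this.domain, {
    logger: this.logger,
    storeRawData: this.storeRawData,
  });
  this.logger.info(`import plugin initialized for domain ${this.domain}`);
}

// Stand-in context for local experiments (not the real Design Hub object).
const messages = [];
function fakeContext(domain) {
  return {
    domain,
    logger: { debug() {}, info: (m) => messages.push(m), warn() {}, error() {} },
    schedule: {},
    storeRawData: async (records) => records.length,
  };
}

init.call(fakeContext("domain-a"));
init.call(fakeContext("domain-b"));
console.log([...contexts.keys()]); // prints [ 'domain-a', 'domain-b' ]
```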

    RawRecord

    The storeRawData callback function accepts a list of RawRecord objects. The purpose of this object is to hold all information on a compound that may be inserted, or updated.

    The table below lists all the accepted attributes of such a record:

    Name Type Required Description
    external_id string Yes A unique import record identifier.
    substance_id string No Physical substance identifier of the compound.
    virtual_id string No A non-Design Hub generated, unique virtual identifier of the compound.
    source string Yes Chemical structure of the compound in any file format recognized by Chemaxon IO system.
    owner_username string Yes username or identifier as obtained from the identity provider
    generate_virtual_id boolean No Deprecated
    project_id number One of project_id and project_key required Design Hub internal project identifier
    project_key string One of project_id and project_key required Design Hub external project identifier (acquired when fetching projects from company plugin)
    hypothesis_title string No Title of the hypothesis in which the compound is stored
    designset_title string No Title of the design set in which the compound is stored
    status_id number No Design Hub internal status identifier. One of status_id or status_label required when visibility is set to 1.
    status_label string No Status label. One of status_id or status_label required when visibility is set to 1.
    source_system string No Label of the compound source. This attribute will be published for storage plugins.
    visibility number No Private/shared visibility flag. For private use 0, for public use 1. Default is 1
    raw_data Object No Compound properties (predicted or experimental data). Object keys matching the name of compoundFields will be extracted and used to update the value of "Additional Fields", while the rest are stored as "Imported data". All previously imported data is replaced for this import source.
    add_tags string[] No Tags to be added to the compound records.
    allow_user_provisioning boolean No When set to false, records with an unknown owner_username fail. When set to true, records with an unknown owner_username succeed and new, deactivated user accounts are created for them.
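
    The required-attribute rules in the table above can be checked in the plugin before calling storeRawData. The following is a minimal sketch of such a pre-flight check, not the validation Design Hub itself performs. The function name validateRawRecord is hypothetical, and the sketch assumes that an omitted visibility counts as the default value 1 for the status rule.

```javascript
"use strict";

/**
 * Returns a list of problems found in a RawRecord candidate (empty if none).
 * Only the rules stated in the attribute table are checked.
 * @param {Object} record
 * @returns {string[]}
 */
function validateRawRecord(record) {
  const problems = [];
  for (const key of ["external_id", "source", "owner_username"]) {
    if (typeof record[key] !== "string" || record[key] === "") {
      problems.push(`${key} is required`);
    }
  }
  if (record.project_id === undefined && record.project_key === undefined) {
    problems.push("one of project_id or project_key is required");
  }
  // Assumption: an omitted visibility is treated as the default (1, shared).
  const visibility = record.visibility === undefined ? 1 : record.visibility;
  if (
    visibility === 1 &&
    record.status_id === undefined &&
    record.status_label === undefined
  ) {
    problems.push("one of status_id or status_label is required when visibility is 1");
  }
  return problems;
}

// A shared record without project and status yields two problems.
console.log(
  validateRawRecord({
    external_id: "DC000001",
    source: "c1ccccc1",
    owner_username: "id@company.com",
  })
);
```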

    Compound Properties

    The raw_data attribute of a RawRecord is a simple object that accepts strings, numbers, and numbers with a modifier as values. See the example below:

    {
       "external_id": "CHEMBL25",
       "substance_id": "CHEMBL25",
       ...
       "raw_data": {
          "Toxicity Assessment": "Safe",
          "Purity %": 99,
          "COX-1 IC50 uM": {
             "value": 4.45,
             "modifier": ">"
          }
       }
    }

    For values with modifiers, the following value modifiers are accepted: <<, <, <=, =, *, ~, >=, >, >>.
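
    External systems often deliver qualified results as text such as ">4.45". The sketch below shows one way a plugin could convert such text into the value and modifier form shown above. The helper name toRawValue is hypothetical, and the assumption that the modifier always precedes the number is specific to this example.

```javascript
"use strict";

// Longest modifiers first, so "<<" is not read as "<" followed by "<".
const MODIFIERS = ["<<", ">>", "<=", ">=", "<", ">", "=", "*", "~"];

/**
 * Converts text into a raw_data value: a number, a number with modifier,
 * or the original (trimmed) string when it is not numeric.
 * @param {string} text
 */
function toRawValue(text) {
  const trimmed = String(text).trim();
  for (const modifier of MODIFIERS) {
    if (trimmed.startsWith(modifier)) {
      const rest = trimmed.slice(modifier.length).trim();
      const value = Number(rest);
      // Keep the text unchanged if nothing numeric follows the modifier.
      return rest !== "" && Number.isFinite(value) ? { value, modifier } : trimmed;
    }
  }
  const plain = Number(trimmed);
  return trimmed !== "" && Number.isFinite(plain) ? plain : trimmed;
}

console.log(toRawValue(">4.45")); // prints { value: 4.45, modifier: '>' }
console.log(toRawValue("99")); // prints 99
console.log(toRawValue("Safe")); // prints Safe
```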

    Import sources

    The application organizes calculated and imported data under unique keys with the following attributes:

    • the type of source: realtime plugins (GUI > Spreadsheet views > Data drawer > Add); imported (NodeJS import plugins and the REST API endpoint /api/import/rawdata); and user uploaded (GUI > New > Upload from file...)
    • the name of the source: name attribute of plugins; "rest-api"; and user file upload
    • the serialized form of settings used to obtain the data: settings value for realtime and import plugins; none for others
    • column label provided

    Developers should keep in mind that for updating import requests, all previously provided raw_data for this source (i.e. type + name + settings) will be replaced.
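
    Since an update replaces all raw_data previously provided for the same source, a plugin that receives partial updates may need to resend the complete set of properties each time. The sketch below illustrates one possible approach using an in-memory cache keyed by external_id. The cache and the helper name withMergedRawData are assumptions made for this example; a real plugin would more likely re-read the full data from the external system or use persistent storage.

```javascript
"use strict";

// Hypothetical cache: external_id -> raw_data last submitted by this plugin.
const lastSent = new Map();

/**
 * Returns a copy of the record whose raw_data also contains the properties
 * submitted earlier for the same external_id; newer values win.
 * @param {{ external_id: string, raw_data?: Object }} record
 */
function withMergedRawData(record) {
  const merged = { ...(lastSent.get(record.external_id) || {}), ...record.raw_data };
  lastSent.set(record.external_id, merged);
  return { ...record, raw_data: merged };
}

withMergedRawData({ external_id: "CXN008", raw_data: { "Purity %": 99 } });
const update = withMergedRawData({
  external_id: "CXN008",
  raw_data: { "Toxicity Assessment": "Safe" },
});
console.log(update.raw_data); // contains both "Purity %" and "Toxicity Assessment"
```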

    Input processing

    The following general processing steps are taken on each input record:

    1. based on source, a static molecule image is generated
    2. based on source, an MRV formatted molecule representation is created
    3. if status_label was provided, resolve and confirm its status_id exists
    4. if project_key was provided, resolve and confirm its project_id exists
    5. identify compound author based on owner_username
    6. if hypothesis_title and designset_title were provided, check or create the target hypothesis and design set.
      • for existing hypotheses and design sets, the owner must have write permission
    7. identify matching records in the content database based on virtual_id or substance_id
      • the complete matching strategy fast tracks updates using previously used external_id values
      • based on the provided project_id, new records are inserted, but all matching records with the virtual_id or substance_id have their attributes updated to ensure data consistency
    8. insert or update content
    9. enable chemical search on the compounds

    If any of these steps fails, the record is marked as FAILED and the cause of the error is stored.
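
    A plugin can react to failed records in its cannotProcess callback. The sketch below counts failures per external_id so that the plugin can stop resubmitting records that keep failing. The shouldResubmit helper and the limit of 3 are hypothetical choices for this example, not Design Hub behavior.

```javascript
"use strict";

// external_id -> number of times the record was reported as unprocessable.
const failureCounts = new Map();

/**
 * Called by the import service with the IDs of records that failed.
 * @param {string[]} externalIds
 */
async function cannotProcess(externalIds) {
  for (const id of externalIds) {
    failureCounts.set(id, (failureCounts.get(id) || 0) + 1);
  }
}

/**
 * Hypothetical helper: false once a record has failed maxFailures times.
 * @param {string} externalId
 * @param {number} [maxFailures]
 */
function shouldResubmit(externalId, maxFailures = 3) {
  return (failureCounts.get(externalId) || 0) < maxFailures;
}

cannotProcess(["CXN008"]).then(() => {
  console.log(shouldResubmit("CXN008")); // prints true (1 failure so far)
});
```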

    Examples

    Load generated, private compound:

    [{
      "external_id": "gen-ai-modelversion-executiondate-output1",
      "source": "c1ccccc1",
      "owner_username": "id@company.com",
      "project_key": "P1",
      "hypothesis_title": "Binding hypothesis X",
      "designset_title": "Route 1",
      "visibility": 0,
      "raw_data": {
        "Confidence": 0.93
      }
    }]

    Creates a new private compound with complete project, hypothesis, and design set grouping. Repeated submissions of the same RawRecord are deduplicated based on external_id, so no duplicate records are inserted; data updates remain possible.
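
    Deduplication only works if the same compound always receives the same external_id. The sketch below builds a deterministic identifier following the pattern of the example above (model version, execution date, output index). The function name buildExternalId and the exact format are assumptions for illustration.

```javascript
"use strict";

/**
 * Builds a reproducible external_id for a generated compound.
 * @param {string} modelVersion
 * @param {Date} executionDate
 * @param {number} outputIndex
 * @returns {string}
 */
function buildExternalId(modelVersion, executionDate, outputIndex) {
  // toISOString() is always UTC, so the result does not depend on the server time zone.
  const day = executionDate.toISOString().slice(0, 10);
  return `gen-ai-${modelVersion}-${day}-output${outputIndex}`;
}

console.log(buildExternalId("v2", new Date("2024-01-15T10:00:00Z"), 1));
// prints gen-ai-v2-2024-01-15-output1
```

    Avoid including values that change between runs (for example the current time of submission), otherwise every resubmission would produce a new record.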

    Load externally designed, shared compounds:

    [{
      "external_id": "DC000001",
      "project_key": "P1",
      "virtual_id": "DC000001",
      "source": "c1ccccc1",
      "status_label": "Draft",
      "owner_username": "id@company.com",
      "raw_data": {
        "Predicted pIC50": 5.2
      }
    }]

    Creates a new shared compound in project "P1" with the given virtual ID. Assuming that DC000001 is globally unique, the same value also works as a valid external_id.

    Synchronize experimental data on real compounds:

    [{
      "external_id": "CXN008",
      "project_key": "P2",
      "substance_id": "CXN008",
      "source": "c1ccccc1",
      "status_label": "Synthesis Completed",
      "owner_username": "id@company.com",
      "raw_data": {
        "Assay Vendor Target IC50 (nM)": {
          "value": 5432,
          "modifier": "="
        }
      }
    }]

    Adds CXN008 to project "P2". Since the source and assay data relate to asset CXN008, its best external_id is also CXN008. This way, if this compound is relevant to multiple projects (e.g. P1), all copies would be updated and this record makes sure a copy exists in P2 as well.
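
    In a plugin, records like the one above are usually assembled from rows returned by an external system. The sketch below maps a hypothetical assay result row to a RawRecord. The row shape (compoundId, smiles, projectKey, owner, ic50nM, modifier) is invented for this example and needs to be adapted to the actual data source.

```javascript
"use strict";

/**
 * Maps one assay result row (hypothetical shape) to a RawRecord.
 * @param {{ compoundId: string, smiles: string, projectKey: string,
 *           owner: string, ic50nM: number, modifier?: string }} row
 */
function toRawRecord(row) {
  return {
    // The substance ID doubles as external_id, as in the example above.
    external_id: row.compoundId,
    project_key: row.projectKey,
    substance_id: row.compoundId,
    source: row.smiles,
    status_label: "Synthesis Completed",
    owner_username: row.owner,
    raw_data: {
      "Assay Vendor Target IC50 (nM)": {
        value: row.ic50nM,
        modifier: row.modifier || "=",
      },
    },
  };
}

const exampleRows = [
  { compoundId: "CXN008", smiles: "c1ccccc1", projectKey: "P2", owner: "id@company.com", ic50nM: 5432 },
];
console.log(JSON.stringify(exampleRows.map(toRawRecord), null, 2));
```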

    For further combinations, feel free to reach out to technical support with your use-case and available input data.

    Performance tuning

    All ingested RawRecord objects are stored in the database and then processed. For manually initiated imports (GUI, REST API), processing is immediate. For automatic imports, processing is performed by scheduled tasks. After successful processing, the content appears for users.

    The processing step can be controlled using the following options. For further details, see the Configuration Guide.

    • importProcessBatchSize
    • importSchedulerPlan
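
    Independently of the server-side options above, a plugin that ingests a large data set may choose to submit its records in several smaller storeRawData calls. The sketch below shows one way to do this. It is a plugin-side pattern only: the helper name storeInChunks and the default chunk size of 500 are arbitrary assumptions and are unrelated to importProcessBatchSize.

```javascript
"use strict";

/**
 * Submits records in consecutive chunks and returns the total stored count.
 * @param {function(Object[]): Promise<number>} storeRawData
 * @param {Object[]} records
 * @param {number} [chunkSize] hypothetical default, tune for your data
 */
async function storeInChunks(storeRawData, records, chunkSize = 500) {
  let stored = 0;
  for (let i = 0; i < records.length; i += chunkSize) {
    // Chunks are sent one after another to keep memory and load predictable.
    stored += await storeRawData(records.slice(i, i + chunkSize));
  }
  return stored;
}

// Stand-in for the storeRawData callback, used only for this demonstration.
const callSizes = [];
const fakeStore = async (records) => {
  callSizes.push(records.length);
  return records.length;
};

const demoRecords = Array.from({ length: 1200 }, (_, i) => ({ external_id: `R${i}` }));
const demoDone = storeInChunks(fakeStore, demoRecords).then((total) => {
  console.log(total, callSizes); // prints 1200 [ 500, 500, 200 ]
  return total;
});
```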

    Plugin skeleton

    Below, you can find two skeleton files, one for a manual and one for an automatic import plugin, implementing the API methods. The code includes TypeScript-compatible JSDoc type definitions for all parameters and expected results, so that editors like Visual Studio Code can assist with static code analysis and adherence to the specifications.

    skeleton-manual.import.js

    //@ts-check
    "use strict";
    
    const dhutils = require("@chemaxon/dh-utils");
    
    /**
     *
     * @typedef {Object} RawRecord
     * @prop {string} external_id
     * @prop {string} [substance_id]
     * @prop {string} [virtual_id]
     * @prop {string} source - chemical structure
     * @prop {string} owner_username
     * @prop {number} [project_id] Internal DH project identifier
     * @prop {string} [project_key]
     * @prop {number} [status_id] Internal DH status identifier
     * @prop {string} [status_label]
     * @prop {string[]} [add_tags]
     * @prop {string} [hypothesis_title]
     * @prop {string} [designset_title]
     * @prop {number} [visibility]
     * @prop {string} [source_system]
     * @prop {boolean} [allow_user_provisioning]
     * @prop {{[key: string]: string|number|number[]|NumberWithModifier}} raw_data - compound properties (assay data)
     * @deprecated @prop {boolean} [generate_virtual_id]
     *
     * @typedef {Object} NumberWithModifier
     * @prop {number} value
     * @prop {string} modifier
     *
     * @typedef PluginSettings
     * @prop {string} label
     * @prop {'boolean'|'number'|'enum'|'multienum'|'project'|'text'|'objectenum'|'objectmultienum'} type
     * @prop {string[]|number[]|{id: string, label: string, category?: string}[]} [values]
     * @prop {string|number|boolean} [default]
     * @prop {number} [min]
     * @prop {number} [max]
     *
     * @typedef ImportInitContext
     * @prop {string} domain
     * @prop {Logger} logger
     * @prop {import("node-schedule").schedule} schedule
     *
     * @typedef Logger
     * @prop {function(...any): void} debug
     * @prop {function(...any): void} info
     * @prop {function(...any): void} warn
     * @prop {function(...any): void} error
     *
     * @typedef {function(RawRecord[]): Promise<number>} StoreCallback
     *
     * @typedef GetSettingsContext
     * @prop {User} user
     *
     * @typedef RunImportContext
     * @prop {User} user
     * @prop {PluginConfiguration} settings
     * @prop {string} domain
     * @prop {StoreCallback} storeRawData
     *
     * @typedef User
     * @prop {string} userName
     * @prop {any} tokens OIDC TokenSet
     *
     * @typedef {any} PluginConfiguration Project is DH internal project ID
     *
     * @typedef ConfigurationValues
     * @prop {{[key: string]: string}} secrets
     */
    
    /**
     * @this {ImportInitContext}
     */
    function init() {
      //store the logger instance
    }
    
    /**
     * @this {GetSettingsContext}
     * @returns {Promise<PluginSettings[]>}
     */
    async function getSettings() {
      console.log("plugin-name getSettings", this.user);
      return [];
    }
    
    /**
     * @this {RunImportContext}
     * @returns {Promise<{ successCount: number }>}
     */
    async function runImport() {
      console.log("user is requesting data with settings", this.user, this.settings);
    
      //obtain data
      //transform data to records
      /** @type {RawRecord[]} */
      const records = []; //placeholder, fill with the transformed records
    
      //submit records to DH API
      const successCount = await this.storeRawData(records);
    
      return { successCount };
    }
    
    /**
     * @this {CannotProcessContext}
     * @param {string[]} externalIds
     */
    async function cannotProcess(externalIds) {
      console.log("Cannot import IDs", externalIds);
    }
    
    /**
     * Store and use values provided by Admin interface's Secret manager
     * @param {ConfigurationValues} config
     */
    function onConfigurationChanged(config) {
      console.log("plugin-name configuration", config.secrets);
    }
    
    module.exports = {
      name: "manual-plugin-name",
      label: "Plugin Label",
      init: init,
      runImport: runImport,
      getSettings: getSettings,
      cannotProcess: cannotProcess,
      domains: ["*"],
      onConfigurationChanged: onConfigurationChanged
    };

    skeleton-automatic.import.js

    //@ts-check
    "use strict";
    
    const dhutils = require("@chemaxon/dh-utils");
    
    /**
     *
     * @typedef {Object} RawRecord
     * @prop {string} external_id
     * @prop {string} [substance_id]
     * @prop {string} [virtual_id]
     * @prop {string} source - chemical structure
     * @prop {string} owner_username
     * @prop {number} [project_id] Internal DH project identifier
     * @prop {string} [project_key]
     * @prop {number} [status_id] Internal DH status identifier
     * @prop {string} [status_label]
     * @prop {string[]} [add_tags]
     * @prop {string} [hypothesis_title]
     * @prop {string} [designset_title]
     * @prop {number} [visibility]
     * @prop {string} [source_system]
     * @prop {boolean} [allow_user_provisioning]
     * @prop {{[key: string]: string|number|number[]|NumberWithModifier}} raw_data - compound properties (assay data)
     * @deprecated @prop {boolean} [generate_virtual_id]
     *
     * @typedef {Object} NumberWithModifier
     * @prop {number} value
     * @prop {string} modifier
     *
     * @typedef PluginSettings
     * @prop {string} label
     * @prop {'boolean'|'number'|'enum'|'multienum'|'project'|'text'|'objectenum'|'objectmultienum'} type
     * @prop {string[]|number[]|{id: string, label: string, category?: string}[]} [values]
     * @prop {string|number|boolean} [default]
     * @prop {number} [min]
     * @prop {number} [max]
     *
     * @typedef ImportInitContext
     * @prop {string} domain
     * @prop {Logger} logger
     * @prop {import("node-schedule").schedule} schedule
     * @prop {StoreCallback} storeRawData
     *
     * @typedef Logger
     * @prop {function(...any): void} debug
     * @prop {function(...any): void} info
     * @prop {function(...any): void} warn
     * @prop {function(...any): void} error
     *
     * @typedef {function(RawRecord[]): Promise<number>} StoreCallback
     *
     * @typedef GetSettingsContext
     * @prop {User} user
     *
     * @typedef RunImportContext
     * @prop {User} user
     * @prop {PluginConfiguration} settings
     * @prop {string} domain
     * @prop {StoreCallback} storeRawData
     *
     * @typedef User
     * @prop {string} userName
     * @prop {any} tokens OIDC TokenSet
     *
     * @typedef {any} PluginConfiguration Project is DH internal project ID
     *
     * @typedef ConfigurationValues
     * @prop {{[key: string]: string}} secrets
     */
    
    /**
     * @this {ImportInitContext}
     */
    function init() {
      //store the logger instance
    
      //set up the cron job
      const job = this.schedule.scheduleJob("0 0,30 * * * *", runImport.bind(this));
    }
    
    /**
     * @this {RunImportContext}
     * @returns {Promise<{ successCount: number }>}
     */
    async function runImport() {
      //obtain data
      //transform data to records
      /** @type {RawRecord[]} */
      const records = []; //placeholder, fill with the transformed records
    
      //submit records to DH API
      const successCount = await this.storeRawData(records);
    
      return { successCount };
    }
    
    /**
     * @this {CannotProcessContext}
     * @param {string[]} externalIds
     */
    async function cannotProcess(externalIds) {
      console.log("Cannot import IDs", externalIds);
    }
    
    /**
     * Store and use values provided by Admin interface's Secret manager
     * @param {ConfigurationValues} config
     */
    function onConfigurationChanged(config) {
      console.log("plugin-name configuration", config.secrets);
    }
    
    module.exports = {
      name: "automatic-plugin-name",
      label: "Plugin Label",
      init: init,
      runImport: runImport,
      cannotProcess: cannotProcess,
      domains: ["*"],
      onConfigurationChanged: onConfigurationChanged
    };
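
    To try out an automatic plugin without a running Design Hub instance, its init function can be called with a stand-in context. The sketch below is a minimal local harness under several assumptions: the inline plugin object only mimics the shape of the skeleton, the fake schedule object implements just the scheduleJob(spec, callback) call used in the skeleton, and storeRawData simply collects records in memory. None of these stand-ins are part of the Design Hub API.

```javascript
"use strict";

// Inline stand-in plugin with the same shape as the automatic skeleton.
const plugin = {
  name: "automatic-plugin-name",
  init() {
    this.schedule.scheduleJob("0 0,30 * * * *", runImport.bind(this));
  },
};

async function runImport() {
  // Hypothetical record; a real plugin would obtain and transform data here.
  const records = [
    {
      external_id: "X1",
      source: "c1ccccc1",
      owner_username: "id@company.com",
      project_key: "P1",
      status_label: "Draft",
    },
  ];
  const successCount = await this.storeRawData(records);
  return { successCount };
}

// Fake context: records scheduled jobs and stored records for inspection.
function createFakeContext() {
  const jobs = [];
  const stored = [];
  const context = {
    domain: "test",
    logger: console,
    schedule: {
      scheduleJob(spec, callback) {
        jobs.push({ spec, callback });
        return {};
      },
    },
    storeRawData: async (records) => {
      stored.push(...records);
      return records.length;
    },
  };
  return { jobs, stored, context };
}

async function main() {
  const fake = createFakeContext();
  plugin.init.call(fake.context);
  // Run the scheduled job once by hand instead of waiting for the timer.
  const result = await fake.jobs[0].callback();
  console.log(result.successCount, fake.stored.length); // prints 1 1
  return { result, fake };
}

const harnessDone = main();
```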