Bulk Upload

    Upload is the process of calling the registration service to register a set of compounds automatically based on a predefined configurable set of business rules (validation, standardization, structure checking).

    {info} As in the advanced registration process, source-based checkers and fixers are not applied by default during bulk registration; they need to be enabled manually. The Quality Checks defined at the system level always run.

    Supported formats

    The Upload page is designed to simultaneously handle multiple submissions imported from an SD file according to configurable system settings.

    The following molecule formats are supported: SDF, SMILES, SMARTS, MRV, CSV.

    {info} For SDF files, the structure field is mandatory. Even if the structure part contains an empty structure, it must still be mapped; otherwise the upload will fail.

    The structure within the CSV file can be in ChemAxon Extended SMARTS (1), SMARTS (2), ChemAxon Extended SMILES (3), or SMILES (4) format.

    images/download/attachments/1803272/csv.png

    {info} In the CSV file, the header is mandatory, and columns can be comma- or semicolon-separated. The first column must be the structure column, followed by any other fields.

    {info} The structure column can contain a structure in any of the formats listed above to register a Single structure, or can be left empty to register No-structures. At least one field (e.g. Id or additional data) must be mapped before initiating the upload, or the upload will fail.
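
    The CSV layout rules above can be illustrated with a minimal sketch that writes a header row, puts the structure in the first column, and leaves the structure empty for a No-structure record. The function name `build_upload_csv`, the header label "Structure", and the sample values are assumptions made for illustration; only the layout rules come from the text.

```python
import csv
import io


def build_upload_csv(rows, extra_fields, delimiter=";"):
    """Return CSV text with a header row and the structure as the first column.

    `rows` is a list of dicts; "Structure" is a hypothetical header label.
    """
    if delimiter not in (",", ";"):
        raise ValueError("columns must be comma or semicolon separated")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    # The header row is mandatory; the structure column comes first.
    writer.writerow(["Structure", *extra_fields])
    for row in rows:
        values = [row.get(name, "") for name in extra_fields]
        writer.writerow([row.get("Structure", ""), *values])
    return buffer.getvalue()


rows = [
    {"Structure": "CCO", "LnbRef": "LNB-001"},  # Single structure (SMILES)
    {"Structure": "", "LnbRef": "LNB-002"},     # No-structure: empty first column
]
csv_text = build_upload_csv(rows, ["LnbRef"])
print(csv_text)
```

    The `csv` module quotes any value that happens to contain the chosen delimiter, which matters for structure strings such as SMARTS that may contain semicolons.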

    {primary} Note that for CSV files the system might not recognize the fields automatically, so the fields should be mapped manually in order to upload the file successfully.

    {info} From version 20.8.0, a multi-value input delimiter can be defined for uploaded files. Field value splitting is supported for both SDF and CSV. For CSV, the CSV delimiter and the multi-value delimiter must be different, and the CSV delimiter must not be contained within the multi-value delimiter. The delimiter is chosen for the whole uploaded file, not per field. ‘Enable multi-value input’ must be true for a given field for its values to be split. You can define the delimiter in the Upload options, which you reach by clicking the gear wheel icon on the Upload page.

    images/download/attachments/1803272/gear_wheel.png images/download/attachments/1803272/define_delimeter.png images/download/attachments/1803272/delimeter_mapping.png
    Upload options icon Defining a delimiter Delimiter mapping
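
    The delimiter constraints described above can be sketched as a small check plus a split helper. This is an illustrative sketch, not the product's implementation: the function names are hypothetical, and whether the system trims whitespace around split values is not stated in the text, so no trimming is done here.

```python
def check_delimiters(csv_delimiter, multi_value_delimiter):
    """Raise ValueError if the delimiter pair breaks the stated CSV rules."""
    if not multi_value_delimiter:
        raise ValueError("multi-value delimiter must not be empty")
    if csv_delimiter == multi_value_delimiter:
        raise ValueError("CSV and multi-value delimiters must be different")
    if csv_delimiter in multi_value_delimiter:
        raise ValueError("CSV delimiter must not be contained in the multi-value delimiter")


def split_field_value(value, multi_value_delimiter, multi_value_enabled):
    """Split a field value only when multi-value input is enabled for that field."""
    if not multi_value_enabled:
        return [value]
    return value.split(multi_value_delimiter)


check_delimiters(",", "|")                     # valid pair, no exception
print(split_field_value("a|b|c", "|", True))   # field with multi-value input enabled
print(split_field_value("a|b|c", "|", False))  # field left as a single value
```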

    ID-based fields

    Since version 21.3.0, the ‘File contains dictionary item IDs, instead of values’ option can be used. You can reach it in the Upload options by clicking the gear wheel icon on the Upload page.

    images/download/attachments/1803272/idBased_toggle.png
    'File contains dictionary item IDs, instead of values' checkbox

    The effect of the 'File contains dictionary item IDs, instead of values' checkbox:

    • ID-based fields:

      When this checkbox is ON: File contains dictionary item IDs, instead of values.

      When this checkbox is OFF: File contains values.

    • Non-ID-based fields:

      Non-ID-based fields are not affected.

    When you use the Append field feature and choose a Value from the drop-down, that ID-value pair will be used. The Value is shown on the UI whether the checkbox is ON or OFF.
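
    The effect of the checkbox on an ID-based field can be pictured with the following sketch. It assumes a dictionary held as a plain mapping from integer item ID to value; the function name, the integer ID type, and the sample items are all hypothetical and only illustrate the ON/OFF behavior described above.

```python
def resolve_dictionary_item(raw, items, file_contains_ids):
    """Return the (id, value) pair for a raw file entry of an ID-based field.

    `items` maps hypothetical integer item IDs to their values.
    """
    if file_contains_ids:
        # Checkbox ON: the file entry is a dictionary item ID.
        item_id = int(raw)
        return item_id, items[item_id]
    # Checkbox OFF: the file entry is the value itself.
    for item_id, value in items.items():
        if value == raw:
            return item_id, value
    raise KeyError(raw)


items = {1: "DMSO", 2: "Water"}  # hypothetical dictionary items
print(resolve_dictionary_item("2", items, file_contains_ids=True))
print(resolve_dictionary_item("Water", items, file_contains_ids=False))
```

    Both calls resolve to the same ID-value pair, which mirrors the note that the Value is displayed regardless of the checkbox state.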

    Upload page

    The Upload page consists of two main sections, the File uploader and the All uploads.

    images/download/attachments/1803272/image2018-10-26_10-41-8.png
    The Upload page with the recent uploads

    An SD file can be dragged and dropped into the File uploader; you can also browse for files or paste the content as text.

    After the file to be registered has been selected, the user can navigate between the first records using the arrows, while the accompanying data fields are displayed. The available data fields are determined based on the first ≈250 records. If more than 250 records are present, the [Scan more] button will be available. Each click on the [Scan more] button will process 250 more structures. If no more structures are available for examination, the button is not displayed.

    {info} While the [Scan more] button is active, the user is always notified about the number of compounds processed from the SDF.
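
    The batch-wise field discovery described above can be approximated as follows. This is a sketch under stated assumptions, not the application's code: it relies only on the general SD file conventions that records end with a `$$$$` line and that data item headers start with `>` and carry the field name in angle brackets. The class name `FieldScanner` and its methods are hypothetical.

```python
import re
from itertools import chain, islice

# SD file data headers look like "> <FieldName>".
FIELD_HEADER = re.compile(r"^>.*<([^<>]+)>")


def iter_sdf_records(text):
    """Yield each SD file record as a list of lines (records end with '$$$$')."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):
        yield record


class FieldScanner:
    """Collect data field names batch by batch, like repeated [Scan more] clicks."""

    def __init__(self, text, batch_size=250):
        self._records = iter_sdf_records(text)
        self.batch_size = batch_size
        self.fields = []   # field names in order of first appearance
        self.scanned = 0   # number of records processed so far

    def scan_more(self):
        batch = list(islice(self._records, self.batch_size))
        for record in batch:
            for line in record:
                match = FIELD_HEADER.match(line)
                if match and match.group(1) not in self.fields:
                    self.fields.append(match.group(1))
        self.scanned += len(batch)
        return self.fields

    def has_more(self):
        """Peek one record ahead to decide whether a further scan is possible."""
        try:
            first = next(self._records)
        except StopIteration:
            return False
        self._records = chain([first], self._records)
        return True
```

    A caller would run `scan_more()` once on load and again on each button click, reporting `scanned` to the user and hiding the button once `has_more()` returns `False`.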

    {info} Structure matching: Compound Registration's structure matching logic is based on ChemAxon's JChem Stereochemistry. During structure registrations, the JChem Global Stereo model is used.

    Field mapping

    {info} Since version 20.8.0, during bulk upload you can only map fields that are configured for the source you want to upload with. Fields with the ‘required’ validator are also considered only if they are on the form of the given source. Configuration can be done on the Administration/Forms and Fields/Form Editor page.

    {info} Since version 20.19.0, auto-mapping of fields during a bulk upload first tries to match the field identifier of the currently available fields and, if no match is found, performs a secondary matching against field display names. Field identifiers are displayed either next to field labels or as a tooltip when hovering over the respective field label in all relevant places on the bulk upload page.

    images/download/attachments/1803272/Upload_dropdown_ids.png images/download/attachments/1803272/Upload_id_tooltip.png
    Select mapping drop-down with ids Id tooltip
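
    The two-stage auto-mapping order (field identifier first, display name second) can be sketched as below. The function name, the sample identifiers, and the exact, case-sensitive comparison are assumptions for illustration; the text only specifies the order of the two matching attempts.

```python
def auto_map(file_columns, fields):
    """Map file column names to field identifiers.

    `fields` is a list of (identifier, display_name) pairs for the fields
    available on the selected source. Unmatched columns are left unmapped.
    """
    identifiers = {identifier for identifier, _ in fields}
    by_display_name = {display: identifier for identifier, display in fields}
    mapping = {}
    for column in file_columns:
        if column in identifiers:
            # Primary matching: the column name equals a field identifier.
            mapping[column] = column
        elif column in by_display_name:
            # Secondary matching: fall back to the field display name.
            mapping[column] = by_display_name[column]
    return mapping


# Hypothetical identifiers paired with display names mentioned in this page.
fields = [("lnbRef", "LnbRef"), ("lotSubmitter", "Lot submitter")]
mapping = auto_map(["lnbRef", "Lot submitter", "user"], fields)
print(mapping)
```

    The "user" column stays unmapped here and would have to be mapped manually.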

    Since version 21.3.0, for select fields the default value list is a drop-down showing the eligible entries for the given field.

    images/download/attachments/1803272/id_based_dropdown.png
    Select field drop-down

    When uploading compounds, the data accompanying the structures can be e.g. LnbRef, Lot ID, PCN, CN, CST, Submitter, Salt ID, Salt multiplicity, etc. The external ID fields (according to the actual business rules, e.g. LnbRef, Lot ID) can be configured to be mandatory or optional for the registration, whereas the structure (which can be an empty structure) is usually required to initiate a registration; the exception is "No structures" in the case of CSV files.

    Each column/field of the SD file can be mapped to any of the mandatory and optional fields of the structure object. The mapping can be set manually by selecting the appropriate parameter from the drop-down list above each column. E.g. "user" from the SD file can be mapped to "Lot submitter" from the DB table. The mapping can be saved in a text file using the [Save mapping] button and reused for other bulk upload registrations using the [Load mapping] button.

    When a field exists but contains a value only for some structures, a default value can be set for the empty cases. Existing values are not lost; the default is applied only to the empty ones.
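
    The default-value behavior for partially filled fields can be shown with a short sketch. It assumes records held as dictionaries and treats a missing key or an empty string as "empty"; both the function name and that definition of empty are assumptions, not documented behavior.

```python
def apply_default(records, field, default):
    """Return copies of the records with `default` set where `field` is empty.

    Existing values are kept; only missing or empty-string values are filled.
    """
    filled = []
    for record in records:
        updated = dict(record)
        if updated.get(field) in (None, ""):
            updated[field] = default
        filled.append(updated)
    return filled


records = [
    {"LnbRef": "LNB-001", "Lot submitter": "asmith"},
    {"LnbRef": "LNB-002", "Lot submitter": ""},
    {"LnbRef": "LNB-003"},
]
filled = apply_default(records, "Lot submitter", "registrar")
print([record["Lot submitter"] for record in filled])
```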

    It is possible to set specific IDs (PCN and/or CN) for the registered compounds, even during a bulk upload. To do so, map the PCN and/or CN fields to the corresponding columns of the SDF file.

    images/download/attachments/1803272/upload1.png
    Field mapping before uploading a file

    Upload configuration

    For a bulk upload process, the source can be set. Depending on the selected source, Checkers (and fixers) and Registration options become available; these are applied to all the structures of the selected SD file.

    images/download/attachments/1803272/upload2.png
    Applying a structure checker and fixer. The original and normalized structures are displayed.

    When you click the [Upload] button, the registration runs according to the autoregistration process, with the only difference that the selected structure fixer options are applied to the structure before the actual registration, and the "normalized" structure can be compared with the "original" one.

    Two-step registration

    The two-step registration option allows the user to pre-register the compounds to the staging area in order to check them before registration.

    images/download/attachments/1803272/upload3.png
    Two-step registration option on the Upload page

    During the two-step bulk upload, the user is directed to the pre-registration summary page, where the report of the process can be seen. Submissions that would be registered without problems end up in the Staging area and also appear on the summary page with "Ready for registration" status. Besides the submissions with "OK" status, the failed submissions are also listed, as in a regular bulk upload process (Figure).

    images/download/attachments/1803272/upload5.png
    The bulk upload summary page

    From the Upload page compounds and salts/solvates can be bulk registered: