Bulk Upload

Supported formats

The Upload page is designed to handle multiple submissions imported from an SD file simultaneously, according to configurable system settings.

The following molecule formats are supported: SDF, SMILES, SMARTS, MRV, CSV.

For SDF files, the structure field is mandatory. It must be mapped even if a record contains an empty structure; otherwise, the upload will fail.

The structure within the CSV file can be in ChemAxon Extended SMARTS (1), SMARTS (2), ChemAxon Extended SMILES (3), or SMILES (4) format.

images/download/attachments/1803272/csv.png

In CSV files, a header row is mandatory, and columns can be separated by commas or semicolons. The first column must be the structure column, followed by any other fields.

The structure column can contain a structure in one of the formats listed above to register a Single structure, or it can be left empty to register a No-structure. At least one field (e.g. an ID or additional data) must be mapped before starting the upload; otherwise, the upload will fail.
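The CSV layout described above can be sketched in Python as follows. This is a minimal sketch, assuming the hypothetical column names `Structure`, `LnbRef` and `Lot ID` and plain SMILES structures; it encodes only the constraints stated on this page: a mandatory header row, comma or semicolon separation, the structure column first, and an empty structure cell for a No-structure record.

```python
import csv
import io

# Hypothetical column names; only the position of the structure column
# (first) is prescribed by the upload rules.
HEADER = ["Structure", "LnbRef", "Lot ID"]

rows = [
    ["CCO", "LNB-001", "1"],        # Single structure given as SMILES
    ["c1ccccc1O", "LNB-002", "1"],  # another SMILES structure
    ["", "LNB-003", "1"],           # empty structure cell: No-structure record
]

def write_upload_csv(header, rows, delimiter=","):
    """Return CSV text with a header row; columns separated by ',' or ';'."""
    if delimiter not in (",", ";"):
        raise ValueError("CSV columns must be comma or semicolon separated")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)  # the header row is mandatory
    writer.writerows(rows)   # the structure is always the first column
    return buf.getvalue()

text = write_upload_csv(HEADER, rows, delimiter=";")
print(text)
```

SMARTS strings can contain both `,` and `;`, so Python's `csv` module would quote a structure cell that contains the chosen column delimiter. Whether the importer accepts quoted cells is not covered on this page, so it is worth checking before relying on it.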

Note that for CSV files the system might not recognize the fields automatically; in that case, map them manually so that the file uploads successfully.

From version 20.8.0-2005111440, a multi-value input delimiter can be defined for uploaded files. Field value splitting is supported for both SDF and CSV files.
If CSV is chosen, the CSV delimiter and the multi-value delimiter must be different, and the CSV delimiter must not be contained within the multi-value delimiter.
The delimiter is set for the whole uploaded file, not per field.
‘Enable multi-value input’ must be set to true for a given field for its values to be split.
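The delimiter rules above can be expressed as a small validation and splitting routine. This Python sketch is illustrative only: the function names and the `Project` field are hypothetical, the empty-delimiter check is an added assumption, and the sketch does not reproduce how Compound Registration itself post-processes (for example, trims) the split values.

```python
def check_delimiters(csv_delimiter, multi_value_delimiter):
    """Raise ValueError if the delimiter pair violates the stated CSV rules."""
    if not multi_value_delimiter:
        # Assumption: an empty multi-value delimiter is treated as invalid.
        raise ValueError("multi-value delimiter must not be empty")
    if csv_delimiter == multi_value_delimiter:
        raise ValueError("CSV and multi-value delimiters must be different")
    if csv_delimiter in multi_value_delimiter:
        raise ValueError("CSV delimiter must not be contained in the multi-value delimiter")

def split_record(record, multi_value_delimiter, multi_value_fields):
    """Apply the single, file-wide delimiter to every field that has
    'Enable multi-value input' set; other fields are left unsplit."""
    return {
        name: value.split(multi_value_delimiter) if name in multi_value_fields else value
        for name, value in record.items()
    }

check_delimiters(";", "|")  # a valid pair: passes silently
record = split_record({"Project": "P1|P2", "LnbRef": "A|B"}, "|", {"Project"})
print(record)
```

Because the delimiter is chosen per file, the sketch passes one delimiter for the whole record and uses the per-field setting only to decide whether a value is split.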

You can define the delimiter in the Upload options, which you can open by clicking the gear wheel icon on the Upload page.

images/download/attachments/1803272/gear_wheel.png

Upload options icon

images/download/attachments/1803272/define_delimeter.png

Define delimiter

images/download/attachments/1803272/delimeter_mapping.png

Delimiter mapping

Upload page

The Upload page consists of two main sections: the File uploader and All uploads.

images/download/attachments/1803272/image2018-10-26_10-41-8.png

The Upload page with the recent uploads

An SD file can be dragged and dropped into the File uploader; you can also browse for a file or paste its content as text.

After selecting the file to be registered, the user can navigate between the first records using the arrows, while the accompanying data fields are displayed. The available data fields are determined from the first ≈250 records. If the file contains more than 250 records, the [Scan more] button becomes available; each click on it processes 250 more structures. When no more structures remain to be examined, the button is no longer displayed.
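The number of [Scan more] clicks needed to examine every record follows from the batch size. This sketch assumes exactly 250 records per batch, although the initial figure is given as approximately 250; `scan_more_clicks` is a hypothetical helper for estimating the effort, not part of the product.

```python
import math

# Assumption: exactly 250 records in the initial scan and per [Scan more] click.
BATCH = 250

def scan_more_clicks(record_count, batch=BATCH):
    """Number of [Scan more] clicks until every record has been examined."""
    if record_count <= batch:
        return 0  # the button is not displayed at all
    return math.ceil((record_count - batch) / batch)

print(scan_more_clicks(1000))
```

For a 1000-record file, the first 250 records are scanned automatically and the remaining 750 take three clicks.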

While the [Scan more] button is active, the user is always notified of the number of compounds processed from the SDF.

Structure matching

Compound Registration's structure matching logic is based on ChemAxon's JChem Stereochemistry. During structure registration, the JChem Global Stereo model is used.

Field mapping

From version 20.8.0-2005111440, during bulk upload you can map only the fields that are configured for the source you upload with. Fields with the ‘required’ validator are likewise taken into account only if they are on the form of the given source. Fields can be configured on the Administration/Forms and Fields/Form Editor page.
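The two source-dependent rules can be mirrored in a small check. This is a hypothetical Python sketch, assuming the fields on the selected source's form and the fields carrying the ‘required’ validator are available as sets of names; the product performs this check itself, and `validate_mapping` is not one of its APIs.

```python
def validate_mapping(mapping, source_form_fields, required_fields):
    """
    mapping: file column name -> target field name
    source_form_fields: fields on the form configured for the selected source
    required_fields: fields that carry the 'required' validator
    Returns (accepted mapping, list of problems).
    """
    accepted = {}
    problems = []
    for column, field in mapping.items():
        if field in source_form_fields:
            accepted[column] = field
        else:
            problems.append(f"{field!r} is not configured for this source")
    # 'required' only counts for fields that are on the source's form.
    for field in sorted(required_fields & source_form_fields):
        if field not in accepted.values():
            problems.append(f"required field {field!r} is not mapped")
    return accepted, problems

accepted, problems = validate_mapping(
    {"user": "Lot submitter", "ref": "LnbRef", "x": "CST"},
    source_form_fields={"Lot submitter", "LnbRef", "Lot ID"},
    required_fields={"Lot ID", "Salt ID"},
)
print(accepted)
print(problems)
```

Here `Salt ID` is required but not on the source's form, so it is ignored, while the unmapped `Lot ID` is reported.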

When uploading compounds, the data accompanying the structures can include, for example, LnbRef, Lot ID, PCN, CN, CST, Submitter, Salt ID, and Salt multiplicity. The external ID fields (e.g. LnbRef, Lot ID) can be configured as mandatory or optional for registration, according to the actual business rules. A structure (which can be an empty structure) is usually required to initiate a registration; the exception is "No structures" in CSV files. Each column/field of the SD file can be mapped to any of the mandatory and optional fields of the structure object.

The mapping can be set manually by selecting the appropriate parameter from the drop-down list above each column; for example, "user" from the SD file can be mapped to "Lot submitter" from the DB table. The mapping can be saved to a text file with the [Save mapping] button and reused for other bulk uploads with the [Load mapping] button. When a field exists but contains a value only for some structures, you can set a default value for the empty cases. The default is applied only to the empty cases; existing values are not overwritten.
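The effect of a default value on a mapped field can be sketched as follows. This is a minimal Python sketch with a hypothetical `apply_mapping` helper and records modeled as dictionaries; the format of the file written by [Save mapping] is not described on this page, so the sketch does not attempt to reproduce it.

```python
def apply_mapping(records, mapping, defaults):
    """
    records: list of dicts keyed by SD file field name
    mapping: SD file field name -> registration field name
    defaults: registration field name -> value used only where a record is empty
    """
    mapped = []
    for record in records:
        out = {}
        for source_name, target_name in mapping.items():
            value = record.get(source_name, "")
            if value == "" and target_name in defaults:
                value = defaults[target_name]  # fill only the empty cases
            out[target_name] = value  # existing values are kept as-is
        mapped.append(out)
    return mapped

records = [{"user": "alice", "lnb": "L1"}, {"user": "", "lnb": "L2"}, {"lnb": "L3"}]
result = apply_mapping(
    records,
    mapping={"user": "Lot submitter", "lnb": "LnbRef"},
    defaults={"Lot submitter": "bulk-upload"},
)
print(result)
```

The first record keeps its own submitter, while the record with an empty value and the record missing the field both receive the default.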

You can set specific IDs (PCN and/or CN) for the registered compounds, even during a bulk upload. To do so, map the PCN and/or CN fields to the corresponding columns of the SD file.

images/download/attachments/1803272/upload1.png

Field mapping before uploading a file

Upload configuration

For a bulk upload, the source can be set. Depending on the selected source, Checkers (and fixers) and Registration options become available; these are applied to all the structures in the selected SD file.

images/download/attachments/1803272/upload2.png

Applying a structure checker and fixer. The original and normalized structures are displayed.

Clicking the [Upload] button runs the registration in the same way as the autoregistration process, with one difference: the selected structure fixer options are applied to each structure before the actual registration, and the "normalized" structure can be compared with the "original" one.

Two-step registration

The two-step registration option allows the user to pre-register the compounds to the staging area in order to check them before registration.

images/download/attachments/1803272/upload3.png

Two-step registration option on the Upload page

During a two-step bulk upload, the user is directed to the pre-registration summary page, which shows the report of the process. Submissions that would register successfully end up in the Staging area and also appear on the summary page with "Ready for registration" status. Besides the submissions with "OK" status, the failed submissions are also listed, as in a regular bulk upload process (Figure).

images/download/attachments/1803272/upload5.png

The bulk upload summary page

From the Upload page, compounds and salts/solvates can be bulk registered: