Upload compounds

Uploading compounds with salt/solvates

  • Uploading an SDF that contains compound and salt/solvate structures

Salts/solvates are stripped from compounds when the "Analyze Salt Solvate Fragments" source-dependent registration option is ON. Only salts/solvates that are present in the Salts & Solvates Dictionary can be stripped.

  • Uploading an SDF that contains compounds and salt/solvate IDs

During the upload, the user can provide the salt/solvate IDs and multiplicities for each compound structure if needed. Salts/solvates from the Salts & Solvates Dictionary can be referred to by their ID and multiplicity in SDF data fields, which are then mapped to the "Version salt/solvate ID" and "Version salt/solvate multiplicity" fields (Figure).
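As an illustration of the SDF side of this mapping, the following minimal sketch appends two data items to a molfile and closes the record. The field names SALT_ID and SALT_MULTIPLICITY and the value "S-001" are hypothetical; any data field names can be used, provided they are mapped to "Version salt/solvate ID" and "Version salt/solvate multiplicity" on the Upload page, and the IDs must exist in your own Salts & Solvates Dictionary.

```python
from typing import Mapping

# A minimal V2000 molfile (single carbon atom), used only as a placeholder structure.
MOLFILE = "\n".join([
    "",
    "  sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])


def to_sdf_record(molfile: str, fields: Mapping[str, str]) -> str:
    """Build one SDF record: molfile, then data items, then the '$$$$' terminator."""
    lines = molfile.rstrip("\n").split("\n")
    for name, value in fields.items():
        # Each data item is a '>  <NAME>' header, the value, and a blank line.
        lines += [f">  <{name}>", str(value), ""]
    lines.append("$$$$")
    return "\n".join(lines) + "\n"


# SALT_ID and SALT_MULTIPLICITY are hypothetical field names; "S-001" is a made-up ID.
record = to_sdf_record(MOLFILE, {"SALT_ID": "S-001", "SALT_MULTIPLICITY": "2"})
print(record)
```

The sketch only shows the file layout; the mapping of the two data fields is still done on the Upload page.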

The result of the upload process is always summarized in the:

Uploading multi-component compounds

Please note that Bulk Upload does NOT support the registration of multi-component compounds. If the loaded SDF file contains multiple components within one structure, the structure will be registered as a "single" compound consisting of several components (not as a Mixture, Formulation or Alternate). Alternatively, a multi-component checker can be introduced, in which case these records will not be autoregistered but will fall to the Staging area, from where they can be manually registered as multi-component compounds.
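To find such records before uploading, an SDF can be pre-screened for structures that consist of several disconnected fragments. The sketch below is not the multi-component checker of Compound Registration; it is a hypothetical stand-alone helper that assumes V2000 molfiles and counts connected components from the bond block with a union-find structure. Note that a compound drawn together with its salt or solvate also counts as more than one fragment.

```python
def count_fragments(molfile: str) -> int:
    """Count disconnected fragments in a V2000 molfile (assumption: V2000 only)."""
    lines = molfile.split("\n")
    counts = lines[3]              # counts line follows the 3 header lines
    n_atoms = int(counts[0:3])     # columns 1-3: number of atoms
    n_bonds = int(counts[3:6])     # columns 4-6: number of bonds
    parent = list(range(n_atoms))  # union-find forest, one node per atom

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    bond_start = 4 + n_atoms       # bond block follows the atom block
    for line in lines[bond_start:bond_start + n_bonds]:
        a = int(line[0:3]) - 1     # atom numbers in the file are 1-based
        b = int(line[3:6]) - 1
        parent[find(a)] = find(b)  # merge the two fragments
    return len({find(i) for i in range(n_atoms)})


ATOM = "    0.0000    0.0000    0.0000 {:<3} 0  0  0  0  0  0  0  0  0  0  0  0"

# Two atoms without a bond: two fragments.
TWO_PARTS = "\n".join([
    "", "  sketch", "",
    "  2  0  0  0  0  0  0  0  0  0999 V2000",
    ATOM.format("Na"), ATOM.format("Cl"),
    "M  END",
])

# Two carbon atoms joined by a single bond: one fragment.
ETHANE = "\n".join([
    "", "  sketch", "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    ATOM.format("C"), ATOM.format("C"),
    "  1  2  1  0  0  0  0",
    "M  END",
])

print(count_fragments(TWO_PARTS), count_fragments(ETHANE))
```

Records for which the function returns more than 1 are candidates for manual review before Bulk Upload.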

Uploading invalid structures

Unreadable file format: The molfile cannot be parsed and no information can be extracted from it (for example, because it is corrupted or does not follow the expected file format specification).

Inconsistent file format: The molfile is valid (it can be parsed according to the syntax specification and passes format validation), but the molecule it describes does not make sense and is likely wrong.

Unsupported file format: ChemAxon may not know that a section of the file contains structure-relevant information and may simply skip parsing it. This is often caused by vendor- or organization-specific modifications to the file format specification that are not widely used and not sufficiently described publicly. In other words, data that is not supposed to exist according to the publicly available file format specification may not be extracted, because neither its type nor its context can be assumed.

With Compound Registration, invalid structures can still be uploaded successfully. The compounds are stored as "No structures", while the original structure is stored as text in an additional data field. Later, the "No structures" can be amended on the parent compound level by providing the correct structure (the IDs are kept).

To prepare the system for storing the original (invalid) molfile as text, create an additional data field (Administration/Forms and Fields) with "errorStructure" as its field identifier.
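The fallback described above can be pictured with a small model. This is only a conceptual sketch, not the implementation used by Compound Registration: StagedRecord, looks_parsable and stage are hypothetical names, and the parsability test is a deliberately crude V2000 check. Only the "errorStructure" field identifier comes from the configuration described here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StagedRecord:
    """Hypothetical record model: a structure plus additional data fields."""
    structure: Optional[str]  # None stands for a "No structure" compound
    additional_data: Dict[str, str] = field(default_factory=dict)


def looks_parsable(molfile: str) -> bool:
    """Crude V2000 sanity check (an assumption, not the real structure parser)."""
    lines = molfile.split("\n")
    if len(lines) < 5 or "V2000" not in lines[3]:
        return False
    try:
        n_atoms = int(lines[3][0:3])
        n_bonds = int(lines[3][3:6])
    except ValueError:
        return False
    # Header (3) + counts line (1) + atom and bond blocks + 'M  END' must fit.
    has_end = any(line.startswith("M  END") for line in lines)
    return has_end and len(lines) >= 4 + n_atoms + n_bonds + 1


def stage(molfile: str) -> StagedRecord:
    """Keep valid structures; otherwise store the original text as additional data."""
    if looks_parsable(molfile):
        return StagedRecord(structure=molfile)
    return StagedRecord(structure=None,
                        additional_data={"errorStructure": molfile})


VALID = "\n".join([
    "", "  sketch", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])

print(stage(VALID).structure is not None)
print(stage("corrupted text").additional_data.keys())
```

Because the original text is kept next to the record, the structure can later be corrected without losing what was originally submitted.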

images/download/attachments/9241676/upload11.png

images/download/attachments/9241676/upload12.png

New field configuration

On the Upload page, the previously created additional data field can be appended:

images/download/thumbnails/9241676/upload13.png

images/download/thumbnails/9241676/upload14.png

Appending an additional field

After the file is uploaded, records with invalid structures are converted to "No structures", and the original structure is stored as text in an additional data field of the compound.

images/download/attachments/9241676/upload15.png

During registration on the Staging area/Submission page

images/download/attachments/9241676/upload16.png

After registration: Browse page

The result of the upload process is always summarized: