Page tree

Exporting Data

Data from database tables can be exported from IJC to various common file formats. The export process allows you to specify which Fields in the table are to be exported, as well as which file format the data is be written to. A number of file formats are currently supported, including:

  • Delineated text files (*.csv, *.tab)
  • MDL SD files (*.sdf)
  • MDL RD files (*.rdf)
  • ChemAxon's Marvin format (*.mrv)
  • Smiles and smarts strings (*.smiles, *.cxsmiles, *.smarts, *.cssmarts)
  • IUPAC InChI (*.inchi)
  • ChemAxon JChem for Excel (*.xlsx)

Since the 15.10.5.0 release the user has to have the ROLE_EXPORT_DATA assigned to be able to control all types of export actions. This allows the administrator to improve the database security. The user roles are described here.

The data that is exported corresponds to the data in the chosen Data Tree, and as such reflects the current query that is applied to that Data Tree, as indicated in the Query window. This allows you to export a subset of the data. If you want to export the entire data then remember to reset the query to show all rows (i.e. click the Show All button in the Query builder window).
Using the List management features can be a useful way of collecting a set of results ready for export.
As all Views belonging to a Data Tree share the same current query and data, exporting from the Data Tree or one if its Views is effectively the same.

To export data:

Right-click a Data Tree node (      ) or a view node (e.g. Grid View node (  ), or Form View node (  ) in the Projects window and choose Export to file... from the menu. Alternatively, when working in the Grid View or Form View choose Export to file... from the File menu, or click the export icon ) in the main toolbar. The Export to File wizard opens to guide you through the process.

Step 1. Specify Details:

  • Open the file selector to specify the file name and format of the file that you want to create. The format you select will affect the options you see once you close the file selector.
  • The available export options will be updated, dependent on the chosen file format (see below).
  • Click 'Next'.

    About the export options

The export options differ considerably depending on the file format that is selected.

Structure only formats e.g. smiles (and the various variants), InChi

For these formats you are only exporting the structure field so you only see a limited set of options relating to how the structure is to be exported. An example screenshot is shown below for smiles export.

In this you see:

  • A selector for the structure field. There will only be one field available: IJC currently only supports a single structure field per table but this is likely to change in future.
  • An option to specify a format string for the structure export. For specific details consult the File Formats in Marvin documentation.

Structure plus field formats e.g. SDF, MRV, RDF

For these formats you can select fields that you want to be included in the export. By default, all fields within the root database table will be included. If you want to remove any fields, choose the fields under the heading 'Selected fields', and click 'Remove'. Alternatively, if you want to add any fields under the 'Available fields' heading, select the field and click Add. The order of the fields can also be adjusted using the 'Move Down' and 'Move Up' buttons.

The screen shot below shows the options when exporting to SDF format. There are also options for specifying the characterset and the new line character.

When exporting a complex data tree containing relational data you will have the option to export data from the detail table in addition to those from the root table. By default only fields from the root table are selected, but you can choose to also include fields from the detail tables as well. See the section below for details on how the detail fields will be exported.

Text file formats e.g. CSV, Text, OS specific line end chars

Text file formats such as comma separated and tab separated text files allow you to export data in a form that can be imported into commonly programs such as Excel or Spotfire. Structures can be included in the export, but the possible formats are more restricted as they need to be compatible with the single line nature of these file formats. The default format is smiles, but this can be specified using a format string. We can see example of smiles:-H as an option for Format string. Final export will have molecules without hydrognes. In the same way you can modify your SDF and MRV formats. Consult the Molecule file conversion with Molconverter for specific details.
Details of how to export are very similar to the multi-line structure formats described earlier. One additional option is the ability to specify whether to include the field names as the first line of the output file.


When Os specific line end chars  isn't checked, molexporter will finish end lines with signe "\n" (Mol Exporter). Checked box returns the system-dependent line separator string. It always returns the same value - the initial value of the system property line.separator. On UNIX systems, it returns "\n"; on Microsoft Windows systems it returns "\r\n" (Line Separator).

Sdf exporter option works in the similar way as Format string

Excel (.xlsx) format

This format exports your structural data immediately ready to be open using JChem4Excel and now you can export, more than the old limit of 65K records and Excel will open the file successfully! It's also possible to export structures as images or in other text formats, where faster export is the advantage. Text formats also have direct support for import into JChem4Excel. If you experience problems, with filtering or resizing images of structures, you need to set the property in Excel.

All supported structure formats are shown below. You can specify the image file format and dimensions. You can specify format options as defined in Marvin documentation. In case of Relational Export, if One-To-Many relationship is present, a checkbox will appear. This checkbox controls the behaviour of the export, whether to repeat parent's entity row for each one of child entity's rows. As a real life application, checking the checkbox will produce data more usable for further computing, while leaving it unchecked will produce data more suitable for viewing and printing.

Export templates

The export template can be also stored as an .xml file for further use and / or sharing with other users. To save the export configuration, you can press the 'Save configuration' button and save the .xml export definition file. Similarly, this file can be loaded into Instant JChem using the 'Load configuration' button.

Step 2. Monitor Progress:

  • A progress monitor is displayed and a report of the process is output in the main window of the dialog.
  • Click Finish.

There are 2 checkbox options that will tell, what to do, after the export finishes. Option "Open the file" will try to open the exported file by a program that is set as system's default for the given format. If there is no associated application to the file format, IJC will inform the user about that.

The other checkbox's function is to control, whether to show the exported file in the default file browser.

Exporting Relational Data

Complex relational data structures do not readily export to flat file formats. IJC supports One-to-One, One-to-Many and Many-to-One relationships.

Many-to-Many is not supported for relational export to flat file formats.

One-to-Many relationship

In case of xlsx export, it is additionally possible to control the way, how the output will look like with a checkbox mentioned above. Please notice, that if the checkbox is checked, the root entity's record will be repeated for each record in the child entity.

Without checkbox applied 
Root A recordChild A Record 1
 Child A Record 2
 Child A Record 3
Root B recordChild B Record 1
With checkbox applied 
Root A recordChild A Record 1
Root A recordChild A Record 2
Root A recordChild A Record 3
Root B recordChild B Record 1

 

In case of export to text files, such as sdf, tab delimited, or CSV, the export behaves as if the checkbox would be checked.

Many-to-One relationship

In case of Many-to-One relationship, the export will always export in the same way as mentioned above, for case that the checkbox is applied:

Child A Record 1Root A record
Child A Record 2Root A record
Child B Record 1Root B record
Child B Record 2Root B record

One-to-One relationship

In case of this relationship, the export will behave as expected from the name of the relationship. 

Exporting data with multiple relationships

Lets have an example relationship

A -> B <-> C -> D

and we export fields only from entities A, C and D, while omitting entity B. Normally, expected behaviour would be that only entity A will be exported, as entities C and D are deeper in the hierarchy and can not be exported without entity B.

This however is not the case. If any data from entity needed for the relationship further in the chain is not selected by the user, IJC does this for the user and caches the relevant relationship information internally, in the background. Afterwards, the export to the file is performed only for the defined fields in the export wizard and the export is created according to the rules described above.

Exporting to Local Database

There was a new functionality added in IJC 5.12 that allows exporting data from Oracle or MySQL databases to a local Derby database. The data to be exported is defined by a datatree and can either be all data in that datatree or a search results. Alongside with the data there is also migrated all IJC metadata that is required for viewing the migrated data in the target project. This means that the local Derby database will contain the same datatree, entity, field and view definitions as the exported datatree and the target entities will be populated with the exported data.

A simple example is a 'Benzene' substructure search in a Pubchem database in your Oracle or MySQL database. After hits are returned r-click on Pubchem demo datatree node in 'Projects' windows and choose 'Save Results As Local Database...' action.
It will open an IJC schema chooser dialog similar to the one shown on the picture below.
In the IJC schema chooser dialog you can select existing local database IJC schema or create a new project with an empty local database IJC schema as target. Select the IJC schema that you want to export your data to and press 'Save' button.
All molecules that were found by your substructure search will be exported to the target database. When the export finishes open the target IJC schema and navigate to the exported datatree.

Please note that there are several major limitations in this functionality:

  • JChem Cartridge entities cannot be exported. Only Standard or JChemBase entities are currently supported.
  • The target entities are readonly and new data cannot be added to them.
  • The migration works only in one direction from Oracle or MySQL database to Derby database, but not the other way around.
  • When repeatedly exporting data from the same datatree the data is merged in the target database. If you need the target project to contain exactly the data that are shown in the source project please either use a new empty target project or delete the datatree with all its database artefacts from the target project before export.