Merging Data

Merging data is a powerful feature, but it risks overwriting your data inappropriately if not used with some prior consideration. It is therefore strongly recommended that you read and understand this section before using the merge feature. Also, it is recommended that you back up your data (e.g. export it to a file so that it can be re-imported) particularly if the data cannot be easily recreated.

When you select a field for merging, you are specifying that the field in the file and the field in the database table contain common values, and the data for each row in the file will be used to update the corresponding row in the database (rather than adding it as a new row as with map). Any fields that are specified as new fields will be added as usual, but any mapped fields will have their values in the database overwritten with any value found in the file. Ideally the choice of 'ID' field will be a set of unique values or primary keys from the data source which are also found in the target entity but if not then further consideration is required.

Example: You have a SD file containing structures and an ID field, and a data file (e.g. a CSV file exported from Excel) containing additional data for those structures, and one of the fields in this file contains the same ID values for the structure. To merge this sort of data you would use the following procedure:

  1. Import the SD file into a new JChem table as usual.

  2. Start to import the CSV file into the same table (see Importing data for details).

  3. When you get to Step 3, Field Details, remove the default mapping (indicated by single arrow) of the two fields that contain the common field 'ID' by removing the field from the list of new fields using the Remove button (by default all file fields are added as new fields). The field will become enabled in the list of file fields and can be added back to the list of database fields.

  4. Specify that this field is to be used as a merge field by selecting the field in the list of file fields and the field in the database that it will be merged with from the right hand list. With both fields selected click the 'Merge' button. Merge fields are signified by the double-headed arrow symbol.

  5. Add all other fields that you require from the CSV file as new fields. If you specify any fields as mapped fields then the data in the file will overwrite the values already in the database.

  6. Continue the import as usual.

Important points to remember about merging:

  • Data in your database is irreversibly overwritten.

  • Only certain field types can be used for merging data (text and integer).

  • If the 'ID' field of the database is unique it most suitable candidate suitable for merge.

  • If an 'ID' value from the file is not found in the database then that entry will not be imported.

  • The order of the duplicate values in the 'ID' field is important if your merge field is not a primary key.

  • If the database contains multiple rows with a given 'ID', then all records with duplicate value in 'ID' field in the entity will be updated with the values of the last value for 'ID' encountered in the file processed.

  • Multiple merge fields can be specified, in which case all IDs must match (composite primary key).

  • The previous two features are not expected to be used commonly but do provide a very powerful capability to selectively update data - Use with caution.

  • Consider: Every standard entity primary key is always named 'ID'. You may be tempted to merge using this key. You should not and rather consider use of a field (ideally primary key) obtained from the data sources.

Current limitations on merging data:

  • When merging data the structure field cannot be mapped.

  • The structure field cannot be used as a merge field.

  • Merging is slow if you have lots of data and no RDBMS indexes applied. Adding a (unique) index to the merge field's column before importing is a highly recommended step. This index should be added prior to merge and can be done directly in IJC please see screen shots below or alternatively directly in your chosen RDBMS.

images/download/attachments/1805427/add-unq-index.png

Options

There are 3 options for merging data:

  • Merge - It identify existing IDs via the merge filed and if same, it overwrite existing data accordingly

  • Merge and Add new rows - It identify existing IDs via the merge filed and if same, it overwrite existing data accordingly and append any new IDs.

  • Add new rows only - It identify existing IDs via the merge filed, skips those which are identical and append any new ID's.

images/download/attachments/1805427/ijc.png