IJC tutorial: Using Standardizer to your advantage

    Overview

    This tutorial explains why it is valuable to add a standardizer rule set to your structure-based entities and how to apply a standardizer within the IJC desktop. Applying a standardizer, or set of "business rules", brings order and uniformity to the representation of the molecules in your structure-based entities. The rules also serve as your internal query reference standard when searching for molecules or extracting them for further processing. It is therefore important to consider the available options and the effects of applying these standardizer rules in a deterministic order. In particular, when a group can be drawn in several forms, you need to consider which query forms are "allowed" in relation to the database standardization. This tutorial is a simple guide that shows how to set up an entity with the default standardizer applied, followed by a further example in which a fixed transformation is applied to the Nitro group. We use the standard PubChem molecules in the example. You can use either the 2D or 3D SDF files available; in either case we use the first file in the list. We also use the local Derby database for the purposes of demonstration; the Oracle and MySQL approaches are essentially similar using the Instant JChem interface.

    Files for this example can be found here: 2D or 3D

    Create Project & Schema connection

    First create a new project container. Use the File -> New Project... menu entry or the appropriate icon in the toolbar (shortcut: Ctrl+Shift+N). Create a new project and choose IJC Project (with local database). Next, name your project and select Finish.

    images/download/attachments/1802487/1_new_proj_local.png

    Create a new Data tree in the schema & Entities then import the data

    Next we need to create a data tree with a structures table, and there are several ways to do this. You can right click on the schema node and select New Data tree and structure entity (table)... from the menu that appears. Alternatively, you can achieve the same result in the schema editor: create a structures entity in the entities tab by right clicking and selecting New Structure entity (table)..., then promote it at the data tree level using New Data tree from entity... You can also create the entity directly at the data tree level by selecting New Data Tree and structure entity (table)... Using your preferred method, create a data tree with a root node named "PubChem2Da".

    images/download/attachments/1802487/2_1_new_datatree.png images/download/attachments/1802487/2_2_new_datatree2.png

    Finally, you can import data into the entity. This is done at the entity level. In the entities tab, right click on the structures entity "PubChem2Da", choose Import File Into X... and select the SDF file. Select Next (we will accept all suggested fields), then Next again, and the import commences. Select Finish once it has completed.

    images/download/thumbnails/1802487/2_3_import_menu.png images/download/attachments/1802487/2_4_import1.png

    Understanding the default Standardization

    The standardizer rules are applied to the entity on import but can also be re-applied later. Since we have not yet explicitly defined or applied any standardizer rules, we can see the effects of the default standardization, which is applied automatically. This is often referred to as "Aromatize and remove explicit Hydrogens". A standardizer can be applied in the create entity dialog as well as from the schema editor for an existing entity. Later we will follow both routes, adding a Nitro functional group rule, and confirm that they give the same end result.

    Once the import is complete, we can execute some queries within the entity to help understand the default rules and how they affect search and display. Open a grid view or create a new default form view with a MolPanel and view all records. The first thing the astute user will notice is that the molecules in the display do not appear to have been standardized according to the default rules. In fact, the internal table stores both the original and the standardized version of each structure, and a display setting in the widget's visual properties controls which version is shown.

    Right click on the MolPanel widget and select Customise Widget Settings. In the visual properties tab, tick the "Display as Standardized" box. The display now reflects the default standardization rules. The same change needs to be applied separately to the grid view using the structure column. Below we can see an example record before and after standardization.

    images/download/attachments/1802487/3_1_customize_widget_menu.png images/download/attachments/1802487/3_2_customize_widget_wiz.png images/download/attachments/1802487/3_3_customize_widget_std.png

    It is useful now to understand how a query works with the default standardizer applied, irrespective of the display. First, let's run a substructure search (SS) for pyridine using both the aromatized (c1ccncc1) and dearomatized (C1=CC=NC=C1) forms of the structure. The Kekule form of the SS query yields 1134 hits (the first PubChem file contains 23408 records in total). Next, convert the query to the aromatized form (Convert to Aromatic form) and run the same search. The same hit set is found, so the two query forms are synonymous and interchangeable.

    images/download/attachments/1802487/4_1_query_aro.png

    Next, we try a primary chloroalkane ([H]C([H])Cl), which contains two explicit hydrogens in the query definition; 524 hits are found. This shows that, even though they are not displayed, explicit hydrogens are used in the search if they are defined in the query.

    images/download/attachments/1802487/4_2_query_chloro.png

    Next we search for the infamous Nitro group using the popular charged form ([O-]N=O) and find 1043 hits. Finally we search for the Nitro group using the debated neutral form (O=N=O) and find no hits returned.

    images/download/attachments/1802487/4_3_query_nitro1.png images/download/thumbnails/1802487/4_4_query_nitro2_nohits.png

    Since some organizations wish to display and search using this form of the Nitro group, it is possible to configure the standardizer rules so that these hits are found synonymously with the charged form. The "Pentavalent Nitrogen" conundrum is discussed in some detail in the references below, but we leave it to each organisation's scientific staff to decide on their own "business rules":

    • Journal of Molecular Structure, 300 (1993) 245-256. "On the 'pentavalent' nitrogen atom and nitrogen pentacoordination": Richard D. Harcourt

    • J. Phys. Chem. A 2006, 110, 10507-10512. "Characteristics of Multiple N,O Bonds": Ian Love

    Establish a new Entity standardizer

    The default standardization covers the basic expectations of the user. However, careful consideration of the available transformations is needed before building any real production system. Fortunately, that is the hard part; applying any standardizer rules is straightforward in IJC.

    Apply standardizer on entity creation

    Next, we will create a new entity in order to show how standardizer rules are applied at creation time. Create a data tree with a root node named "PubChem2Db". In the Standardizer tab, create an associated standardizer by pressing Create Standardizer. Add the "Nitro" action. Select Finish and the entity is created. Import the same SDF file once more and create a form view with a MolPanel.

    images/download/attachments/1802487/5_1_create_std.png images/download/attachments/1802487/5_2_std_add_nitro.png

    Again we search for the infamous Nitro group using the popular charged form ([O-]N=O) and find 1080 hits. We then search for the Nitro group using the debated neutral form (O=N=O) and find 1078 hits. The two missing hits arise because a standalone charged nitro group ([O-]N=O) is not standardized, so those two records are not found by the neutral (O=N=O) substructure query. Apart from these two records, the two forms of the query are now synonymous with the "Nitro" rule applied.

    images/download/attachments/1802487/5_4_query_nitro_neutral.png
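
    The effect of a shared nitro rule on the two query forms can be pictured with a small sketch. This is a toy model under stated assumptions, not how IJC or its standardizer works internally: structures are plain SMILES strings, "substructure search" is simple substring matching, the function names are hypothetical, and the charged-to-neutral direction of the rewrite is chosen only for illustration. The nitro group is spelled here as [N+](=O)[O-] and N(=O)=O rather than the query drawings used above.

```python
# Toy model only: structures are SMILES strings and "substructure search"
# is plain substring matching on one fixed spelling of the nitro group.
CHARGED = "[N+](=O)[O-]"  # charge-separated nitro group
NEUTRAL = "N(=O)=O"       # neutral ("pentavalent") nitro group


def nitro_rule(smiles: str) -> str:
    """Hypothetical stand-in for a nitro rule: rewrite charged -> neutral."""
    return smiles.replace(CHARGED, NEUTRAL)


def search(records, query, rule=None):
    """Return indices of records containing the query.

    When a rule is given, it is applied to both the stored structures and
    the query before matching, mimicking a shared standardization step.
    """
    normalize = rule if rule is not None else (lambda s: s)
    normalized_query = normalize(query)
    return [i for i, rec in enumerate(records)
            if normalized_query in normalize(rec)]


records = [
    "c1ccccc1[N+](=O)[O-]",  # nitrobenzene, charged spelling
    "CC[N+](=O)[O-]",        # nitroethane, charged spelling
    "c1ccncc1",              # pyridine, no nitro group
]

# Without a nitro rule only the stored (charged) spelling is found.
default_charged = search(records, CHARGED)
default_neutral = search(records, NEUTRAL)

# With the rule applied to both sides, either spelling finds the same records.
ruled_charged = search(records, CHARGED, nitro_rule)
ruled_neutral = search(records, NEUTRAL, nitro_rule)

print(default_charged, default_neutral, ruled_charged, ruled_neutral)
```

    In this toy data set the neutral query finds nothing until the rule is applied, after which both spellings return the same two records. A real standardizer works on the molecular graph rather than on text, so cases such as the two unmatched records above are not captured by this sketch.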

    Apply standardizer to existing entity

    Finally, we apply the same standardization to the existing entity "PubChem2Da". Right click on the schema node and select Edit Schema, then select the entity level tab. Select the entity "PubChem2Da" and then the Standardizer tab. Currently you will see that only the defaults are applied. Select the Create Standardizer button. From the list of possible options on the left, find "Nitro" and use the Add button to add it to the standardizer for the entity. Use the arrow buttons to place it at the top. Press the Apply button to regenerate the table with the new standardization applied.

    images/download/attachments/1802487/6_1_add_std_existing1.png images/download/attachments/1802487/6_2_add_std_existing2.png

    You should now find that the "PubChem2Da" and "PubChem2Db" entities exhibit exactly the same behaviour with respect to either form of the Nitro group. We leave you to enjoy experimenting with the other transformations available. If you are unhappy with a particular transformation, remove it from the standardizer configuration and regenerate via Apply. The original representation is always retained, so you can revert a rule or apply a new set of rules at any time. In the screenshot below you can see a particular record displayed as standardized. Note that the Nitro group is depicted according to the rule applied and, importantly, both forms of the query find all possible results.

    images/download/attachments/1802487/7_comparison.png
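
    Reverting is possible because the standardized form is always regenerated from the retained original. The following minimal sketch illustrates that idea only; the Entity class, its fields and the string-based rule are hypothetical and do not reflect the actual IJC table layout or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Rule = Callable[[str], str]


def nitro_to_neutral(smiles: str) -> str:
    """Hypothetical text-level rule: charged nitro spelling -> neutral spelling."""
    return smiles.replace("[N+](=O)[O-]", "N(=O)=O")


@dataclass
class Entity:
    """Toy entity keeping the original structures next to the standardized ones."""
    originals: List[str]
    standardized: List[str] = field(default_factory=list)

    def apply(self, rules: List[Rule]) -> None:
        """Regenerate the standardized column from the untouched originals."""
        self.standardized = []
        for smiles in self.originals:
            for rule in rules:  # rules run in list order, top to bottom
                smiles = rule(smiles)
            self.standardized.append(smiles)


entity = Entity(originals=["CC[N+](=O)[O-]"])

entity.apply([nitro_to_neutral])
after_rule = list(entity.standardized)

entity.apply([])  # remove the rule and regenerate
after_revert = list(entity.standardized)

print(after_rule, after_revert, entity.originals)
```

    Because apply always starts from originals, removing a rule and regenerating restores the earlier representation, and the order of the rule list determines the order in which transformations run.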

    Establish a new Query standardizer

    The Query Standardizer provides an alternative method for standardizing your structures. Although the results from the Entity Standardizer and the Query Standardizer should be the same if they are set up equally, the two handle structures differently. The entity structure standardizer operates statically: it ensures that the standardized structures visible in the IJC structure entity remain consistent with the data in the database. The query structure standardizer, on the other hand, is employed only during query execution. Because the query standardizer is applied at the IJC level, it can be used even when the entity standardizer cannot, for example when a cartridge is in use. To configure your query standardizer, right-click on your schema and select Edit Schema. A new document list with your schema name will appear.

    images/download/attachments/1802487/querystand1.png

    From here, click on Schema and then navigate to Query structure standardizers.

    A query standardizer builder window will appear, where you can configure your standardizer.

    To select the search type (e.g. Substructure, Superstructure, Similarity, etc.) for which the query standardizer will be established, open the list in the upper right corner and choose one.

    images/download/attachments/1802487/querystand4.png

    As can be seen in the picture, substructure search is set by default.

    Now add the standardizers of your interest. Once added, they will be displayed on the right-hand side.

    images/download/attachments/1802487/querystand2.png

    By selecting one of the standardizers on the right-hand side, you access a new option window that allows for further modifications.

    images/download/attachments/1802487/querystand3.png

    Once you are satisfied with your configuration, apply the changes.

    Warning: While the entity standardizer can be configured individually for each structure entity, the query standardizer is configured for the entire schema. As a result, if you use a query standardizer on an entity that has its own entity standardizer configured, be aware that discrepancies between the two configurations may lead to conflicts and inaccurate results.
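
    One way to picture this warning is the toy sketch below. It assumes, purely for illustration, that the entity standardizer rewrites stored structures at import while the query standardizer rewrites only the incoming query; the function names are hypothetical, matching is plain substring comparison on SMILES text, and the real interaction inside IJC may differ.

```python
def nitro_rule(smiles: str) -> str:
    """Hypothetical text-level rule: charged nitro spelling -> neutral spelling."""
    return smiles.replace("[N+](=O)[O-]", "N(=O)=O")


def no_rule(smiles: str) -> str:
    """Query standardizer configured without the nitro rule."""
    return smiles


def search(stored, query, query_rule):
    """Standardize only the query, then match it against the stored forms."""
    standardized_query = query_rule(query)
    return [s for s in stored if standardized_query in s]


imported = ["CC[N+](=O)[O-]", "c1ccccc1[N+](=O)[O-]"]

# Entity standardizer: applied once, when the structures are stored.
stored = [nitro_rule(s) for s in imported]

# Query standardizer configured like the entity standardizer: hits are found.
consistent = search(stored, "[N+](=O)[O-]", nitro_rule)

# Query standardizer configured differently: the same query finds nothing.
conflicting = search(stored, "[N+](=O)[O-]", no_rule)

print(len(consistent), len(conflicting))
```

    In this sketch the mismatch silently drops both records, which is the kind of inaccurate result the warning describes.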

    Congratulations

    Congratulations! You have just worked through a simple Standardizer example and learned:

    • How to create a project & schema.

    • How to create a data tree (Structures) and import data.

    • How the default Standardizer rules behave.

    • How to apply a Nitro standardization to a new entity and understand its effect.

    • How to apply a Nitro standardization to an existing entity and understand its effect.