If your experimental data can improve the accuracy of the pKa calculation, you can use the supervised pKa learning method built into the pKa plugin. Particular structural features can affect the pKa values calculated by the built-in method, so a correction library based on your experimental data can help the pKa plugin predict more accurately.
Inaccurately predicted ionization centers need to be identified, and experimental data for them have to be collected, before they can be corrected. Since the learning algorithm is based on linear regression analysis, you need to collect as much experimental pKa data as possible to obtain a reliable correlation. There are no hard-and-fast rules about the amount of data required. If you want to create a local model for only a certain type of ionization center, a few representative structures may be enough. A robust model, however, requires as many diverse structures and pKa values as possible.
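To illustrate why the amount and spread of data matter for a regression-based correction, the following minimal sketch fits a straight line between predicted and experimental pKa values by ordinary least squares. It is not the training algorithm used by the pKa plugin, whose internals are not described here. The function name fit_linear_correction and all numbers are hypothetical and serve only as an illustration.

```python
from statistics import mean


def fit_linear_correction(predicted, experimental):
    """Ordinary least-squares fit: experimental ~ slope * predicted + intercept."""
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need at least two paired values")
    mx, my = mean(predicted), mean(experimental)
    # Sum of squared deviations of the predictor.
    sxx = sum((x - mx) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predicted values must not all be identical")
    # Sum of cross deviations between predictor and response.
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, experimental))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical values, for illustration only (not measured data).
predicted = [4.0, 6.0, 8.0, 10.0]
experimental = [3.5, 5.0, 6.5, 8.0]

slope, intercept = fit_linear_correction(predicted, experimental)
# Apply the fitted line to a new predicted value.
corrected = slope * 11.0 + intercept
print(slope, intercept, corrected)
```

With only a handful of similar structures, such a fit describes just that narrow region, which matches the distinction above between a local model and a robust one.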
The experimental data should be collected in an SD file. The training command then has to be run to create a correction library. The library is stored on your local computer, in your user folder.
To create a training library, first prepare a proper input file in SDF or MRV format. This file can be compiled using either Instant JChem or JChem for Excel.
The SD file should contain the following pieces of information:
Fig. 1 Input for training library generation
The training library can be created using the cxtrain command line tool from an input structural file:
cxtrain pka -i [library name] [training file]
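If the training step needs to be scripted, the command above can be assembled and launched from Python. The sketch below assumes that cxtrain is available on the PATH. The helper names build_cxtrain_command and run_cxtrain and the file name training.sdf are hypothetical; only the command form itself comes from the text above.

```python
import shlex
import subprocess


def build_cxtrain_command(library_name, training_file):
    """Return the argument list for: cxtrain pka -i [library name] [training file]."""
    if not library_name or not training_file:
        raise ValueError("library name and training file are both required")
    return ["cxtrain", "pka", "-i", library_name, training_file]


def run_cxtrain(library_name, training_file):
    """Run the training command; assumes the cxtrain tool is on the PATH."""
    cmd = build_cxtrain_command(library_name, training_file)
    # check=True raises CalledProcessError if cxtrain exits with an error.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


# Show the command that would be executed (file name is a placeholder).
cmd = build_cxtrain_command("mypKa", "training.sdf")
print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the library name or file path contains spaces.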
Once the training library is generated, it can be applied to pKa calculations in different ChemAxon tools.
Fig. 2 Using the generated training library in MarvinSketch
The following figure shows the results with (I) and without (II) applying the correction library.
I. pKa calculation with training data | II. pKa calculation without training data
To include your correction library in the cxcalc pKa calculation, use the parameter --correctionlibrary or its short form -L:
--correctionlibrary [library name] [input file/string]
If you use cxcalc pKa calculation without the correction library, the results will be calculated with the built-in dataset.
id apKa1 apKa2 bpKa1 bpKa2 atoms
1 11.19 16.01 2.34 -2.59 7,11,9,4
id apKa1 apKa2 bpKa1 bpKa2 atoms
1 8.34 16.01 2.34 -2.59 7,11,9,4
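To see which values differ between two such result listings, the output can be parsed and compared column by column. The sketch below is a minimal example that assumes whitespace-separated output with no empty cells; real output with blank fields would need a stricter, tab-aware parser. The function names parse_cxcalc_table and changed_columns are hypothetical.

```python
def parse_cxcalc_table(text):
    """Parse whitespace-separated output into a list of dicts keyed by the header."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def changed_columns(row_a, row_b, columns):
    """Return {column: (value_a, value_b)} for numeric columns that differ."""
    diffs = {}
    for col in columns:
        a, b = float(row_a[col]), float(row_b[col])
        if a != b:
            diffs[col] = (a, b)
    return diffs


# The two result listings shown above, copied verbatim.
first = """id apKa1 apKa2 bpKa1 bpKa2 atoms
1 11.19 16.01 2.34 -2.59 7,11,9,4"""
second = """id apKa1 apKa2 bpKa1 bpKa2 atoms
1 8.34 16.01 2.34 -2.59 7,11,9,4"""

pka_columns = ["apKa1", "apKa2", "bpKa1", "bpKa2"]
diffs = changed_columns(
    parse_cxcalc_table(first)[0],
    parse_cxcalc_table(second)[0],
    pka_columns,
)
print(diffs)  # {'apKa1': (11.19, 8.34)}
```

For these two listings only the strongest acidic value (apKa1) differs; the remaining pKa values and the atom indices are identical.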
Chemical Terms are available from Chemical Terms Evaluator or from Instant JChem. Evaluator is designed to evaluate Chemical Terms expressions on molecules. Your correction library can be applied as follows:
evaluate -e "pKa('correctionlibrary:[library name]')" "[input file/string]"
Choose the 'New Chemical Terms Field' icon and type the Chemical Terms expression into the window, using the correctionlibrary:[library name] parameter. Do not forget to set the Name, the Type and the DB Column Name.
The following picture demonstrates the usage of pKa training in the 'New Chemical terms' window. The expression
specifies that the plugin uses the correction library named mypKa and calculates the strongest acidic pKa of the molecule(s).
Fig. 3 New Chemical terms window showing the options to be set for pKa training
The results of this calculation are shown in the figure below, with the untrained (Strongest acidic pKa column) and trained (Trained strongest acidic pKa column) pKa values.
Fig. 4 JChem table showing the trained and untrained pKa values