
This manual walks you through using the cxtrain command-line tool.



Some property calculations can be improved when experimental data are available for molecules that are similar to the target. Such user-specific information can be incorporated into so-called training libraries, which can be generated with ChemAxon's command-line tool cxtrain. It is part of the JChem and Marvin Beans packages.

The generated training library, stored on the user's computer, is used by the calculator plugins to improve the property prediction.



Invoking cxtrain


Invoking cxtrain -h gives the following output:


cxtrain <prediction> [options] [input file (training set)]

pka                                   train pKa prediction
logp                                  train logP prediction
prediction                            train custom prediction
General options:
cxtrain -h, --help                    this help message
 -i, --training-id<training>          sets the training ID
 -l, --list                           list available training ID's
 -g, --ignore-error                   continue with next molecule on error
pKa options:
 -V, --validation <filepath>          validation results file path
logP options:
 -t, --tag <tag name>                 name of the SDFile tag that stores the experimental logP values
 -a, --add-built-in-training-set      add built-in logP training set
Custom prediction options:
 -t, --tag <tag name>                 name of the SDFile tag that stores the experimental property values 

To train a plugin, call cxtrain as follows:

cxtrain <prediction> [options] [input file (training set)] 

where <prediction> must be one of pka, logp or prediction (the last one is used for a custom property).

cxtrain can train only these three predictions. If any other name is given as the prediction parameter, the following message appears:

Prediction has to be one of the following: pka, logp, prediction.
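
A wrapper script can check the prediction name before calling cxtrain at all. The following is a minimal sketch in POSIX shell; the function name is_valid_prediction is hypothetical and not part of cxtrain, and it accepts only the lowercase spellings listed in the message above.

```shell
# Hypothetical helper, not part of cxtrain: succeeds only for the three
# prediction names listed in the cxtrain help output.
is_valid_prediction() {
    case "$1" in
        pka|logp|prediction) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: report the result for a few candidate names.
for name in pka logp prediction logd; do
    if is_valid_prediction "$name"; then
        echo "$name: accepted"
    else
        echo "$name: rejected"
    fi
done
```

Checking the name up front lets a batch script stop with its own error handling instead of relying on the message printed by cxtrain.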

Input of cxtrain


cxtrain can handle any molecular file format supported by ChemAxon (e.g. MDL Molfile, SDF).


Placing the training library

The generated training library is stored on your computer, and it can be used via Marvin, Chemical Terms, Instant JChem or cxcalc.


On Windows, the training file is placed under $HOME\chemaxon\calculations\training, where $HOME is commonly C:\Users\username.

On UNIX-based operating systems (Unix, Linux, OS X), the training file is placed under $HOME/.chemaxon/calculations/training, where $HOME is typically /home/useraccount on Linux and /Users/useraccount on OS X.
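
If a script needs to locate the training directory, the two rules above can be sketched as a small POSIX shell function. The function name training_dir and its "windows" keyword are hypothetical; the sketch only assembles the path from the description above and does not check that the directory exists.

```shell
# Hypothetical helper: print the training directory described above.
# $1 is the operating system kind ("windows", or anything else for
# UNIX-based systems); $2 is the user's home directory.
training_dir() {
    case "$1" in
        windows) printf '%s\\chemaxon\\calculations\\training\n' "$2" ;;
        *)       printf '%s/.chemaxon/calculations/training\n' "$2" ;;
    esac
}

# Example: the training directory of the current user on a UNIX-based system.
training_dir unix "$HOME"
```

Passing the home directory as an argument keeps the function independent of the environment, so the same code can report the location for another account.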




General options

The following general options are available:
  1. The option --training-id (-i) sets the ID of your training. This ID then refers to the given training in calculations.
  2. The option --list (-l) lists the available training IDs.
  3. The option --ignore-error (-g) skips a molecule that causes an error and continues with the next one.

Plugin-specific options

The following plugin-specific options are available:

pKa plugin:

  • --validation <filepath> (-V) creates validation data; the file path for the pKa training validation results can optionally be specified.

logP plugin:

  • --add-built-in-training-set (-a) merges your data with the data from built-in logP training set.

  • --tag <tag name> (-t) defines the name of the SDFile tag that stores the experimental logP values.

Custom prediction:

  • --tag <tag name> (-t) defines the name of the SDFile tag that stores the experimental values of the custom property.
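
To illustrate what the --tag option refers to, the following sketch writes a one-record SDfile in which the data item named LOGP holds the value to train on. The function name write_example_sdf, the output file name, the molecule (methanol) and the value are illustrative placeholders only, not data from this manual.

```shell
# Hypothetical helper: write a one-record SDfile to the path given in $1.
# The LOGP data item is what "-t LOGP" would point cxtrain to.
# Molecule (methanol) and value are illustrative placeholders.
write_example_sdf() {
    cat > "$1" <<'EOF'
methanol

  illustrative record
  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4300    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
M  END
>  <LOGP>
-0.77

$$$$
EOF
}

# Example: write the file into the temporary directory.
write_example_sdf "${TMPDIR:-/tmp}/example_trainingset.sdf"
```

A real training set would contain many such records, each terminated by the $$$$ line, and the tag name passed with -t has to match the data item name used in the file.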


Training pKa calculations

Step #1  Creating the training library from the data file pKa_trainingset.sdf with the training ID mypka:

cxtrain pka -i mypka pKa_trainingset.sdf


Step #2  Using the generated training library in pKa calculations with cxcalc:

cxcalc pKa --correctionlibrary mypka "CSC1=NC2=C(N1)C=NC(O)=N2"


The result of the calculation with the trained library is:


              id apKa1 apKa2 bpKa1 bpKa2 atoms
              1 11.19 16.01 2.34 -2.59 7,11,9,4
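
When the result is processed by a script, a single column can be picked out by its header name. The sketch below assumes that the output is a table with one header row and whitespace- or tab-separated columns, as displayed above; the function name column_value is hypothetical.

```shell
# Hypothetical helper: read a table with a header row from standard input
# and print the values found in the column named by $1.
column_value() {
    awk -v name="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        col     { print $col }
    '
}

# Sample input copied from the result shown above.
printf '%s\n' \
    'id apKa1 apKa2 bpKa1 bpKa2 atoms' \
    '1 11.19 16.01 2.34 -2.59 7,11,9,4' | column_value apKa1
```

In practice the function would be placed at the end of a pipe that starts with the cxcalc command of Step #2. Because the columns are split on any whitespace, a row with an empty cell would shift the remaining values, so the sketch is only suitable for fully populated rows.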


Training logP calculations

Step #1 Creating the training library from the given data file logP_trainingset.sdf (with experimental logP values stored in the SDF tag named LOGP), setting training ID to mylogp and including data from the built-in training set:

cxtrain logp -t LOGP -i mylogp -a logP_trainingset.sdf


Step #2 Applying the generated logP training library in calculations with cxcalc, by combining the parameter --trainingid with --method user:

cxcalc logp --method user --trainingid mylogp "CC(C)CCO"


The result of the calculation with the trained library is:

              id logP
              1 1,13

The following command lists available training IDs for logP calculation:

cxtrain logp --list

Training custom property calculations

The following command trains a custom property calculation using the data file pampa_trainingset.sdf (with the experimental values stored in the SDF tag named PAMPA) and sets the training ID to mypampa:

cxtrain prediction -t PAMPA -i mypampa pampa_trainingset.sdf
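
When several tagged training sets have to be processed, it can help to assemble the command line first and review it before anything is run. The following dry-run sketch only prints the command and uses only the options shown in this manual; the function name print_train_command is hypothetical.

```shell
# Hypothetical helper: print (do not run) the cxtrain command line for a
# tagged training set. Arguments: prediction, tag name, training ID, input file.
print_train_command() {
    case "$1" in
        logp|prediction) ;;
        *) echo "tag-based training expects logp or prediction, got: $1" >&2
           return 1 ;;
    esac
    printf 'cxtrain %s -t %s -i %s %s\n' "$1" "$2" "$3" "$4"
}

# Example: reproduce the custom property command shown above.
print_train_command prediction PAMPA mypampa pampa_trainingset.sdf
```

The function rejects pka because the --tag option is listed only for the logP and custom predictions.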