This manual walks you through using the cxtrain command line tool.
Some property calculations can be improved when experimental data are available for molecules that are similar to the target. Such user-specific information can be incorporated into so-called training libraries, which can be generated with Chemaxon's command line tool cxtrain. It is part of the JChem and Marvin Beans packages.
The generated training library, stored on the user's computer, is used by the calculator plugins for improving the property prediction.
Invoking cxtrain -h gives the following output:
cxtrain <prediction> [options] [input file (training set)]

Prediction:
  pka                                  train pKa prediction
  logp                                 train logP prediction
  prediction                           train custom prediction

General options:
  cxtrain -h, --help                   this help message
  -i, --training-id<training>          sets the training ID
  -l, --list                           list available training ID's
  -g, --ignore-error                   continue with next molecule on error

pKa options:
  -V, --validation <filepath>          validation results file path

logP options:
  -t, --tag <tag name>                 name of the SDFile tag that stores the experimental logP values
  -a, --add-built-in-training-set      add built-in logP training set

Custom prediction options:
  -t, --tag <tag name>                 name of the SDFile tag that stores the experimental property values
To train a plugin, call cxtrain as follows:
cxtrain <prediction> [options] [input file (training set)]
where <prediction> must be one of pka, logp or prediction (the last one is used for a custom property).
{warning} cxtrain is only able to train the three plugins mentioned. If another plugin name is given as a command line parameter, the following message appears:
Prediction has to be one of the following: pka, logp, prediction.
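A wrapper script can apply the same check before cxtrain is started. The following is a minimal sketch only: is_valid_prediction is a hypothetical helper name, not part of cxtrain, and it assumes that the names have to be spelled exactly as in the message above.

```shell
# Hypothetical helper, not part of cxtrain: succeeds only for the three
# prediction names listed in the cxtrain error message.
# Assumption: names are matched exactly as spelled there (lowercase).
is_valid_prediction() {
  case "$1" in
    pka|logp|prediction) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: reject an unsupported name before calling cxtrain.
if ! is_valid_prediction "logd"; then
  echo "Prediction has to be one of the following: pka, logp, prediction."
fi
```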
cxtrain can handle any molecular file format that is supported by Chemaxon (e.g. MDL Molfile, SDF).
The generated training library is stored on your computer, and it can be used via Marvin, Chemical Terms, Instant JChem or cxcalc.
{info} On Windows operating systems the training file is placed under $HOME\chemaxon\calculations\training, where $HOME is commonly c:\Users\username.
{info} On UNIX-based operating systems (Unix, Linux, OSX) the training file is placed under $HOME/.chemaxon/calculations/training, where $HOME is typically /home/useraccount on Linux and /Users/useraccount on OSX.
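To see which training files are present on a UNIX-based system, the directory can be inspected from the shell. This is a small sketch that assumes the default location described above; the variable name training_dir is only illustrative.

```shell
# Default training directory on UNIX-based systems (assumed unchanged).
training_dir="$HOME/.chemaxon/calculations/training"

if [ -d "$training_dir" ]; then
  # List the stored training files.
  ls -l "$training_dir"
else
  echo "No training directory found at $training_dir"
fi
```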
The following general options are available:
With the option --training-id ( -i ) you can set the ID of your training. Afterwards, this ID refers to the given training during calculations.
The available training IDs can be listed with the option --list ( -l ).
--ignore-error ( -g ) skips a molecule that causes an error and continues with the next one.
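The general options can be combined in a single call. The sketch below wraps such calls in two shell functions; the function names are hypothetical, and only flags from the help output above are used. It uses the logp prediction, so the SDFile tag is passed with -t as well.

```shell
# Hypothetical wrapper: train a logP library and continue with the next
# molecule when one of them causes an error (-g).
# $1: training ID, $2: SDFile tag with the experimental values, $3: input file
train_logp_tolerant() {
  cxtrain logp -g -t "$2" -i "$1" "$3"
}

# Hypothetical wrapper: list the training IDs available for logP.
list_logp_trainings() {
  cxtrain logp --list
}

# Usage (requires cxtrain on the PATH):
#   train_logp_tolerant mylogp LOGP logP_trainingset.sdf
#   list_logp_trainings
```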
The following plugin-specific options are available:
pKa plugin:
Option --validation ( -V ) sets the path of the validation results file.
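As an illustration of the pKa validation option from the help output ( -V, --validation <filepath> ), the following sketch trains a pKa library and passes a path for the validation results. The function name and the file name pka_validation.txt are hypothetical; the content and format of the validation file are not described here.

```shell
# Hypothetical wrapper: train a pKa library and pass a validation results
# file path with -V.
# $1: training ID, $2: validation results file path, $3: input file
train_pka_validated() {
  cxtrain pka -i "$1" -V "$2" "$3"
}

# Usage (file names are placeholders; requires cxtrain on the PATH):
#   train_pka_validated mypka pka_validation.txt pKa_trainingset.sdf
```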
logP plugin:
--add-built-in-training-set ( -a ) merges your data with the data from built-in logP training set.
Option --tag ( -t ) defines the name of the SDFile tag that stores the experimental logP values.
Custom prediction:
Option --tag ( -t ) defines the name of the SDFile tag that stores the experimental property values.
Step #1 Creating the training library from a given data file pKa_trainingset.sdf with a training ID mypka:
cxtrain pka -i mypka pKa_trainingset.sdf
Step #2 Using the generated training set in pKa calculations with cxcalc:
cxcalc pKa --correctionlibrary mypka "CSC1=NC2=C(N1)C=NC(O)=N2"
The result of the calculation is:
id  apKa1  apKa2  bpKa1  bpKa2  atoms
1   11.19  16.01  2.34   -2.59  7,11,9,4
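The two pKa steps can also be chained so that the calculation only runs when the training succeeded. This is a sketch under one assumption that this manual does not confirm: cxtrain returns a non-zero exit status on failure. The function name is hypothetical.

```shell
# Hypothetical wrapper: train a pKa library, then use it in a calculation.
# Assumption: cxtrain exits with a non-zero status when training fails,
# so that "&&" prevents the calculation from running in that case.
# $1: training ID, $2: training set file, $3: molecule (e.g. a SMILES string)
train_and_predict_pka() {
  cxtrain pka -i "$1" "$2" &&
    cxcalc pKa --correctionlibrary "$1" "$3"
}

# Usage (requires cxtrain and cxcalc on the PATH):
#   train_and_predict_pka mypka pKa_trainingset.sdf "CSC1=NC2=C(N1)C=NC(O)=N2"
```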
Step #1 Creating the training library from the given data file logP_trainingset.sdf (with experimental log P values stored in the SDF tag named LOGP ), setting training ID to mylogp and including data from the built-in training set:
cxtrain logp -t LOGP -i mylogp -a logP_trainingset.sdf
Step #2 To apply the generated log P training library in cxcalc calculations, pass the training ID with the parameter --trainingid and combine it with the parameter --method user:
cxcalc logp --method user --trainingid mylogp "CC(C)CCO"
The result of the calculation is:
id  logP
1   1,13
{info} The following command lists available training IDs for logP calculation:
cxtrain logp --list
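A script can use the list to check that a training ID exists before it is passed to cxcalc. The sketch below is hedged: the exact output format of --list is not described in this manual, so the helper only assumes that each ID appears as a separate word. The function name is hypothetical, and grep -w is an extension available in GNU and BSD grep.

```shell
# Hypothetical helper: succeeds if the training ID ($1) appears as a whole
# word in the output of "cxtrain logp --list".
# Assumption: IDs are printed as separate words; the real output format
# of --list may differ.
has_logp_training() {
  cxtrain logp --list | grep -qwF -- "$1"
}

# Usage (requires cxtrain and cxcalc on the PATH):
#   if has_logp_training mylogp; then
#     cxcalc logp --method user --trainingid mylogp "CC(C)CCO"
#   fi
```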
{info} The following command trains a custom property calculation using the data file pampa_trainingset.sdf (with the experimental values stored in the SDF tag named PAMPA ) and sets the training ID to mypampa:
cxtrain prediction -t PAMPA -i mypampa pampa_trainingset.sdf