cxtrain command line tool

    This manual walks you through using the cxtrain command line tool.

    Introduction

    Some property calculations can be enhanced when experimental data are available for molecules that are similar to the target. Such user-specific information can be incorporated into so-called training libraries, which can be generated with Chemaxon's command line tool cxtrain. It is part of the JChem and Marvin Beans packages.

    The generated training library, stored on the user's computer, is used by the calculator plugins to improve property prediction.

    Usage

    Invoking cxtrain

    Invoking cxtrain -h gives the following output:

    cxtrain <prediction> [options] [input file (training set)]

    Prediction:
     pka                              train pKa prediction
     logp                             train logP prediction
     prediction                       train custom prediction

    General options:
    cxtrain -h, --help                this help message
     -i, --training-id<training>      sets the training ID
     -l, --list                       list available training ID's
     -g, --ignore-error               continue with next molecule on error

    pKa options:
     -V, --validation <filepath>      validation results file path

    logP options:
     -t, --tag <tag name>             name of the SDFile tag that stores the experimental logP values
     -a, --add-built-in-training-set  add built-in logP training set

    Custom prediction options:
     -t, --tag <tag name>             name of the SDFile tag that stores the experimental property values
    

    So you can train a plugin by calling cxtrain:

    cxtrain <prediction> [options] [input file (training set)]

    where <prediction> must be one of pka, logp or prediction (the last one is used for a custom property).

    {warning} cxtrain is only able to train the three plugins mentioned. If another plugin name is given as a command line parameter, the following message appears:

    Prediction has to be one of the following: pka, logp, prediction.
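
    The check described above can also be done before cxtrain is started, for example in a script that receives the prediction name from elsewhere. The following is a minimal sketch; is_valid_prediction and safe_cxtrain are hypothetical helper names, not part of cxtrain, and the sketch assumes the lowercase spellings listed in the message above.

```shell
# Hypothetical helper (not part of cxtrain): succeed only for the three
# prediction names that cxtrain accepts according to its error message.
is_valid_prediction() {
    case "$1" in
        pka|logp|prediction) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: refuse to start cxtrain with an unsupported
# prediction name, otherwise pass all arguments through unchanged.
safe_cxtrain() {
    if ! is_valid_prediction "$1"; then
        echo "Prediction has to be one of the following: pka, logp, prediction." >&2
        return 2
    fi
    cxtrain "$@"
}
```

    For example, safe_cxtrain pka -i mypka pKa_trainingset.sdf runs the same command as calling cxtrain directly, while safe_cxtrain solubility returns the exit status 2 without starting cxtrain.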

    Input of cxtrain

    cxtrain can handle any molecular file format that is supported by Chemaxon (e.g. MDL Molfile, SDF).

    Placing the training library

    The generated training library is stored on your computer, and it can be used via Marvin, Chemical Terms, Instant JChem or cxcalc.

    {info} On Windows, the training file is placed under $HOME\chemaxon\calculations\training, where $HOME is commonly c:\Users\username.

    {info} On UNIX-based operating systems (Unix, Linux, OSX), the training file is placed under $HOME/.chemaxon/calculations/training, where $HOME is typically /home/useraccount on Linux and /Users/useraccount on OSX.
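
    To check which training files are present on a UNIX-based system, the directory named above can be inspected from the shell. This is a minimal sketch; training_dir and list_training_files are hypothetical helper names, and nothing is assumed about how cxtrain names the files it writes there.

```shell
# Print the training directory used on UNIX-based systems.
# training_dir is a hypothetical helper name, not a Chemaxon command.
training_dir() {
    printf '%s\n' "$HOME/.chemaxon/calculations/training"
}

# List the files stored in the training directory, one per line.
# Returns 1 if the directory does not exist (e.g. nothing was trained yet).
list_training_files() {
    dir=$(training_dir)
    if [ -d "$dir" ]; then
        ls -1 "$dir"
    else
        echo "No training directory found at $dir" >&2
        return 1
    fi
}
```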

    Options

    General options

    The following general options are available:

    1. With the option --training-id ( -i ), you can set the ID of your training. Afterwards, this ID refers to the given training in calculations.

    2. The available training IDs can be listed using the option --list ( -l ).

    3. --ignore-error ( -g ) skips a molecule that causes an error and continues with the next one.

    Plugin-specific options

    The following plugin-specific options are available:

    pKa plugin:

    • --validation <filepath> ( -V ) creates validation data; optionally, the file path of the pKa training validation chart can be specified.

    logP plugin:

    • --add-built-in-training-set ( -a ) merges your data with the data from the built-in logP training set.

    • Option --tag ( -t ) defines the name of the SDFile tag that stores the experimental logP values.

    Custom prediction option:

    • Option --tag ( -t ) defines the name of the SDFile tag that stores the experimental values of the custom property.

    Examples

    Training pKa calculations

    Step #1 Creating the training library from a given data file pKa_trainingset.sdf with a training ID mypka:

    cxtrain pka -i mypka pKa_trainingset.sdf

    Step #2 Using the generated training set in pKa calculations with cxcalc:

    cxcalc pKa --correctionlibrary mypka "CSC1=NC2=C(N1)C=NC(O)=N2"

    The result of the calculation is:

     id apKa1 apKa2 bpKa1 bpKa2 atoms
     1 11.19 16.01 2.34 -2.59 7,11,9,4
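
    When such a result table is processed in a script, a single column can be picked by its header name. The sketch below assumes that the output is a whitespace-separated table with one header line and that every cell is filled, as in the result above; column_by_name is a hypothetical helper, not part of cxcalc.

```shell
# Print the values of one column, selected by its header name, from a
# whitespace-separated table read on standard input.
# Assumption: the first line is the header and no cell is empty.
column_by_name() {
    awk -v name="$1" '
        NR == 1 {                       # header line: find the column index
            for (i = 1; i <= NF; i++)
                if ($i == name) col = i
            next
        }
        col { print $col }              # data lines: print the selected cell
    '
}
```

    For example, piping the cxcalc command of Step #2 into column_by_name apKa1 would print only the value of the apKa1 column.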

    Training logP calculations

    Step #1 Creating the training library from the given data file logP_trainingset.sdf (with experimental logP values stored in the SDF tag named LOGP), setting the training ID to mylogp and including data from the built-in training set:

    cxtrain logp -t LOGP -i mylogp -a logP_trainingset.sdf

    Step #2 To apply the generated logP training library in cxcalc calculations, combine the parameter --trainingid with the parameter --method user:

     cxcalc logp --method user --trainingid mylogp "CC(C)CCO"

    The result of the calculation is:

     id logP
     1 1,13

    {info} The following command lists available training IDs for logP calculation:

    cxtrain logp --list

    {info} The following command trains a custom property calculation using the data file pampa_trainingset.sdf (with the experimental values stored in the SDF tag named PAMPA) and setting the training ID to mypampa:

    cxtrain prediction -t PAMPA -i mypampa pampa_trainingset.sdf
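
    The three training commands shown in this section can be collected in one script. The following is a minimal sketch that reuses the file names and training IDs from the examples above; run_step, train_all and the DRY_RUN variable are hypothetical names introduced for illustration only.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run the pKa, logP and custom trainings from the examples in sequence.
# The && chain stops at the first command that fails.
train_all() {
    run_step cxtrain pka -i mypka pKa_trainingset.sdf &&
    run_step cxtrain logp -t LOGP -i mylogp -a logP_trainingset.sdf &&
    run_step cxtrain prediction -t PAMPA -i mypampa pampa_trainingset.sdf
}
```

    Calling DRY_RUN=1 train_all prints the three commands without executing them, which is useful for checking the arguments before a long training run.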