The Standardizer command-line tool provides access to all Standardizer functionality.

General usage schema

You can standardize molecules with Standardizer using the following command:

standardize [input file(s)/string(s)] -c <config file/string> [options]

Before first use, set up the standardize script or batch file as described here.

Options

Available options will be displayed by this command:

    standardize -h

The general options are displayed first:
      -h, --help                          this help message
      -ha, --help-actions                 list of available actions
      -g, --ignore-error                  continue with next molecule on error
          --empty-mol-on-error            write an empty molecule, and continue
                                          with next molecule on error
          --unstandardized-mol-on-error   write the original molecule, and continue
                                          with next molecule on error

    Input options:
      -c, --config <filepath|string>      configuration XML file
                                          or action string,
                                          actions separated by "..",
    Output options:
      -f, --format <format>               output file format (default: smiles)
      -o, --output <filepath>             output file (default: standard output)
      -e, --export-fields-to-smiles       export property fields to SMILES
      -v, --verbose                       verbose output with time results
          --log <filepath>                write log messages to file
                                          default: write log to system error
          --loglevel <level>              sets the log level
                                          levels: [severe|warning|info|off]
      -rp, --report-property <prop-name>  name of the property to write report
        

The command-line parameter --config is mandatory. It specifies either the path and filename of a configuration file or an action string. We highly recommend creating the configuration file with the Standardizer GUI or Standardizer Editor.
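
As a minimal sketch of the action-string form, the helper below joins action names with "..", the separator given in the --config help text. The function name join_actions and the input file mols.sdf are hypothetical, and the call to standardize is skipped when the tool or the file is not available.

```shell
# Join action names with "..", the separator named in the --config help text.
# join_actions is a hypothetical helper, not part of Standardizer.
join_actions() {
    result=""
    for action in "$@"; do
        if [ -z "$result" ]; then
            result="$action"
        else
            result="$result..$action"
        fi
    done
    printf '%s\n' "$result"
}

ACTIONS=$(join_actions aromatize neutralize)
echo "$ACTIONS"    # prints aromatize..neutralize

# Run only if the tool is on the PATH; mols.sdf is a hypothetical input file.
if command -v standardize >/dev/null 2>&1 && [ -f mols.sdf ]; then
    standardize -c "$ACTIONS" mols.sdf
fi
```

Quoting the action string matters once it contains reaction SMARTS, because characters such as [ ] and > are special to the shell.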

By default, the program exits on molecule import/export errors. If the command-line parameter -g or --ignore-error is specified, errors do not stop the process: the error is written to the console and the molecule is omitted from the output, so the resulting file contains fewer molecules than the input file. With the option --empty-mol-on-error, the failed structure is replaced by an empty molecule. With the option --unstandardized-mol-on-error, the molecule is written in its original form. Both of these settings produce a file containing the same number of structures as the input file.
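
One way to check that the record count is preserved is to compare the number of records in the input and output SD files. The sketch below assumes SD-format input and output, where each record ends with a line of "$$$$". The helper count_sdf_records and the file names input.sdf and output.sdf are hypothetical, and it is an assumption that --unstandardized-mol-on-error can be used without -g.

```shell
# Count records in an SD file: each record ends with a line starting "$$$$".
# "|| true" keeps the exit status zero when the file holds no records.
count_sdf_records() {
    grep -c '^\$\$\$\$' "$1" || true
}

IN=input.sdf      # hypothetical input file
OUT=output.sdf    # hypothetical output file

# Run only if the tool is on the PATH and the input exists.
if command -v standardize >/dev/null 2>&1 && [ -f "$IN" ]; then
    standardize -c Standardizer.xml --unstandardized-mol-on-error "$IN" -f sdf -o "$OUT"
    if [ "$(count_sdf_records "$IN")" -eq "$(count_sdf_records "$OUT")" ]; then
        echo "record counts match"
    else
        echo "record counts differ" >&2
    fi
fi
```

With -g alone, the same comparison would be expected to report a difference whenever a molecule failed.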

Standardizer Actions

The following command lists the available Standardizer actions:

    standardize -ha

Valid actionstrings are: 

  addexplicitH                   convert implicit hydrogens to explicit
  aliastoatom                    convert pseudo and alias atoms to normal atoms
  aliastogroup                   convert pseudo and alias atoms to groups
  aromatize                      perform aromatization (default: general)
    :general                     perform general aromatization
    :basic                       perform basic aromatization
    :loose                       perform loose aromatization
  clean2d                        perform partial clean in 2D (alternative:
                                 clean)
    :full                        perform full clean in 2D
    :removezcoordinate           remove z coordinate of 3D structures
    :<template file>             perform template based clean in 2D
  clean3d                        perform clean in 3D
  clearisotopes                  convert isotope to the given chemical element
  clearstereo                    clear stereo information
                                 (default: chirality, doublebond)
    :chirality                   clear tetrahedral stereo information
    :doublebond                  clear double bond stereo information
    :singleupordownbond          clear "Single Up or Down" bonds (wiggly bonds)
                                 connecting to chiral centers
  contractsgroups                contract S-groups
    :exclude='<group>'           specified group(s) remain expanded
  convertdoublebonds             convert unspecified double bond stereo
                                 (default: crossed)
    :crossed                     convert unspecified double bond stereo
                                 to crossed representation
    :wiggly                      convert unspecified double bond stereo
                                 to wiggly representation
  convertpimetalbonds            convert metal-multicenter single bonds
                                 to coordinate type
  converttoenhancedstereo        convert to enhanced stereo (default: and)
    :abs                         convert to enhanced stereo; stereogenic
                                 centers without enhanced stereo flag go into a
                                 new "abs" groups
    :and                         convert to enhanced stereo; stereogenic
                                 centers without enhanced stereo flag go into a
                                 new "and" groups
    :or                          convert to enhanced stereo; stereogenic
                                 centers without enhanced stereo flag go into a
                                 new "or" groups
  creategroup                    create group from abbreviated group definition
    :group=<group>               specify group, e.g. creategroup:Val,Ser,Boc
  dearomatize                    convert aromatic bonds to Kekule form
  disconnectmetalatoms           remove covalent bonds between
                                 metals and non-carbons
  expand                         multiple fragments according to
                                 stoichiometry data
  expandsgroups                  expand S-groups
    :exclude='<group>'           specified group(s) remain contracted
  map                            add atom maps
                                 (default: complete, keepmapping)
    :complete                    map all atoms
    :matching                    map matching atoms of a reaction
    :changing                    map changing atoms of a reaction
    :keepmapping                 keep existing mapping
    :markbonds                   mark changing bonds
  mapreaction                    add atom maps to reaction
                                 (default: complete, keepmapping)
    :complete                    map all atoms
    :matching                    map matching atoms of a reaction
    :changing                    map changing atoms of a reaction
    :keepmapping                 keep existing mapping
    :markbonds                   mark changing bonds
  mesomerize                     take canonical mesomer form
  neutralize                     neutralize charged molecules
  removeabsolutestereo           remove absolute stereo flag (chiral flag)
  removeatomvalues               remove atom values (extra atom labels)
  removeattacheddata             remove attached data from molecules
  removeexplicitH                convert explicit hydrogens to implicit ones
                                 by default, plain hydrogen atoms are removed
    :bridgehead                  connecting to bridgehead atom
    :charged                     charged explicit hydrogens
    :hconnected                  hydrogen connected to hydrogen atom
    :isotopic                    isotopic explicit hydrogens
    :lonely                      hydrogens without any connections
    :mapped                      mapped explicit hydrogens
    :polymerendgroup             hydrogen connected to a SRU S-group
    :radical                     radical explicit hydrogens
    :sgroup                      hydrogen which is the only atom in an S-group
    :sgroupend                   hydrogen connected to a Superatom S-group
    :valenceerror                hydrogen connected to an atom which has
                                 valence error
    :wedged                      wedged explicit hydrogens
  removefragment                 remove fragments from molecules
                                 (default: method=keeplargest,
                                 measure=atomcount)
    :method=keeplargest          keep the largest fragment
    :method=keepsmallest         keep the smallest fragment
    :method=removelargest        remove the largest fragment
    :method=removesmallest       remove the smallest fragment
    :measure=atomcount           remove fragments according to atom count
    :measure=molmass             remove fragments according to molecular mass
    :measure=heavyatomcount      remove fragments according to
                                 heavy atom count
  removergroupdefinitions        remove R-group definitions; keep root
                                 structure
  removestereocarebox            remove stereo search markers from double bonds
  replaceatoms                   replace a chemical element with another
                                 element
    :queryatom='<atom symbol>'   atom to be replaced
    :replaceatom='<atom symbol>' new atom to be included instead
    :allowValenceError           tolerance of valence errors
  setabsolutestereo              set absolute stereo flag (chiral flag)
  stripsalts                     remove salts listed in the salt dictionary
  tautomerize                    take canonical tautomer form
  ungroupsgroups                 ungroup S-groups
    :exclude='<group>'           specified group(s) remain grouped
  unmap                          remove atom maps
  wedgeclean                     rearranges stereo wedges according to
                                 IUPAC recommendations
  <reaction SMARTS>              transform molecule according to reaction
                                 SMARTS, e.g. nitro transformation
                                 [O-][N+]=O>>O=N=O

Visit Standardizer Actions to find out more about the functionality of each action! 

Input

Most molecular file formats are accepted.

Input is given either as input file(s) or as input string(s), usually in SMILES format.

If neither input file names nor input strings are specified on the command line, the standard input is read.

Output

Standardizer writes output molecules in the format specified by the --format option (the default format is "SMILES"). If the --output option is omitted, results are written to the standard output.

If the command-line parameter --export-fields-to-smiles is specified, the property fields (SDF fields) of the molecules are exported even if the output format is SMILES, SMARTS, ChemAxon Extended SMILES or ChemAxon Extended SMARTS. For other formats the property fields are always exported, so this option has no effect.
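
The rule above can be sketched as a small check that a wrapper script might use to decide whether to pass -e. The function name export_flag_matters is hypothetical, and the identifiers cxsmiles and cxsmarts are assumed format names for ChemAxon Extended SMILES and SMARTS.

```shell
# Succeed if --export-fields-to-smiles changes the output for this format.
# "cxsmiles" and "cxsmarts" are assumed identifiers for the extended formats.
export_flag_matters() {
    case "$1" in
        smiles|smarts|cxsmiles|cxsmarts) return 0 ;;
        *) return 1 ;;
    esac
}

for fmt in smiles sdf; do
    if export_flag_matters "$fmt"; then
        echo "$fmt: add -e to keep property fields"
    else
        echo "$fmt: property fields are always exported"
    fi
done
```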

Usage examples

  1. A UNIX command that reads molecular structures from the mols.sdf file and writes the standardized molecules to the standard output in SMILES format:
                standardize -c Standardizer.xml mols.sdf
                
  2. A UNIX command that reads molecules given as SMILES strings from the file nci10000.smiles located in the ./test/pharmacophore directory and writes the results to the file nci10000.sdf, created in the same directory:
                standardize -c Standardizer.xml nci10000.smiles -f sdf -o nci10000.sdf
                
  3. A similar command for nci100.smiles with the -e (export property fields) and -v (verbose output) options, then displaying the result in MarvinView:
                standardize -c Standardizer.xml -e -v nci100.smiles -f sdf -o nci100.sdf
                mview nci100.sdf
                
  4. Processing an SD file and displaying the standardized molecules using MarvinView:
                standardize -c Standardizer.xml med100.sdf | mview -
                

    Note that such piping does not work in Windows.

  5. Standardization with actionstring:
                standardize -c "aromatize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O" med100.sdf -o med100.smiles
                
  6. Standardization with actionstring, taking input molecules as SMILES strings:
                standardize -c "aromatize..[O-:2][N+:1]=O>>[O:2]=[N:1]=O" \
                "[O-][N+](=O)C1=CC=CC=C1" "[H]C1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=O"
                
  7. Processing tasks belonging to no groups or to task group "target":
                standardize -c Standardizer.xml -u target targets.sdf -f sdf -o output.sdf
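
Examples 2 and 3 can be extended to several input files with a loop. This is only a sketch: the helper sdf_name is hypothetical and handles simple file names, not paths containing dots in directory names. The standardize call is skipped when the tool or the input file is missing.

```shell
# Derive the output name by replacing the last extension with .sdf.
# sdf_name is a hypothetical helper for plain file names.
sdf_name() {
    printf '%s\n' "${1%.*}.sdf"
}

for input in nci10000.smiles nci100.smiles; do
    output=$(sdf_name "$input")
    echo "$input -> $output"
    # Run only if the tool is on the PATH and the input file exists.
    if command -v standardize >/dev/null 2>&1 && [ -f "$input" ]; then
        standardize -c Standardizer.xml "$input" -f sdf -o "$output"
    fi
done
```

Writing each result to a file and opening it afterwards also avoids the pipe used in example 4.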