
structurechecker command-line

Structure Checker is a chemical validation tool that detects and fixes common structural errors, as well as special features that can be potential sources of problems. structurechecker is the command-line interface of Structure Checker.

Options

General options of structurechecker

  -h,  --help                        this help page
  -hc, --help-checker-action         help page of valid checker actions
  -hf, --help-fixer-action           help page of valid fixer actions
  -m, --mode <operationmode>         [check|fix]
                                     mode of the operation (default: check)
          check                      only check is executed, 
                                     does not modify molecules
          fix                        fix molecules containing structure errors 
                                     whenever possible
  -x                                 fix mode (deprecated, use --mode fix)
Input options:
  -c, --config <filepath|string>     action string configuration
                                     actions separated by "..",
Output options:
  -t, --output-type <output type>    [single|separated|accepted|discarded]
                                     set output type(default: single)           
          single                     both accepted and discarded structures are
                                     written to the <output path>
          separated                  accepted structures are written to the
                                     <output path>, discarded structures are
                                     written to the <discarded path>
          accepted                   only accepted structures are  
                                     written to the <output path>
          discarded                  only discarded structures are  
                                     written to the <discarded path>
  -o, --output <output path>         output file (default: standard output)
  -d, --discarded <discarded path>   write molecules with structure error to 
                                     a separate file (default:standard output)
  -f, --format <format>              output file format (default: smiles)
  -rf, --report-file <filepath>      write report to a file
  -rp, --report-property             write report to the property of the output
  -rt, --report-pattern <pattern>    generate pattern based report file
  -re, --report-format <format>      file format of the molecules in report
  -l, --log <filepath>               write software-error log messages to file
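The -c/--config option accepts either a configuration file path or an action string in which actions are separated by "..". The following is a minimal sketch of assembling such a string in a shell script; build_actions is a hypothetical helper, not part of structurechecker.

```shell
# build_actions: hypothetical helper, not part of structurechecker.
# Joins its arguments into one action string, using the ".." separator
# documented for the -c/--config option.
build_actions() {
  local result="" action
  for action in "$@"; do
    if [ -z "$result" ]; then
      result="$action"
    else
      result="$result..$action"
    fi
  done
  printf '%s\n' "$result"
}
```

For example, `structurechecker -c "$(build_actions aromaticityerror valenceerror)" in.sdf` passes the same action string as `structurechecker -c "aromaticityerror..valenceerror" in.sdf`.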

Available checker actions: structurechecker -hc

Valid checker actions (strings) are:
  3d                             detect atoms with 3D coordinates
  abbrevgroup                    detect all abbreviated groups
    :expanded=[true|false]       detect expanded abbreviated groups
    :contracted=[true|false]     detect contracted abbreviated groups
    :excluded=[...]              exclude the following groups during check;
                                 set comma-separated list of group abbreviations,
                                 e.g., "abbrevgroup:excluded=[Ph,COOH,Val]"
  absentchiralflag               detect absent chiral flag
  absolutestereoconfiguration    detect molecules in which all asymmetric
                                 centers have absolute stereo configuration
  alias                          detect atoms with alias
  aromaticity                    (deprecated) use aromaticityerror
  aromaticityerror               detect aromaticity errors with the given
                                 aromatization type (default: general)
    :basic                       basic aromaticity errors
    :loose                       loose aromaticity errors
    :general                     general aromaticity errors
  atommap                        detect atoms with map number
  atomqueryproperty              detect all or specified atom query properties
    :H=[true|false]              hydrogen count
    :X=[true|false]              connection count
    :D=[true|false]              explicit connection count
    :R=[true|false]              ring count
    :h=[true|false]              implicit hydrogen count
    :r=[true|false]              smallest ring count
    :a=[true|false]              aromaticity
    :s=[true|false]              substitution count
    :u=[true|false]              unsaturation
    :rb=[true|false]             ring bond count
  atomvalue                      detect atoms with atom value
  atropisomer                    detect atropisomers
  attacheddata                   detect atoms with attached data
  bondangle                      detect unpreferred bond angles in 2d
  bondlength                     detect bonds that are too long or too short
  chiralflagerror                detect incorrectly set chiral flag
  circularrgroup                 (deprecated) use circularrgroupreference
  circularrgroupreference        detect circular R-group references
  coordsystem                    detect invalid coordination systems
  covalentcounterion             detect covalent counterions
  crosseddoublebond              detect crossed double bonds
  empty                          detect items without atoms
  explicith                      detect all or specified explicit hydrogens
    :lonely=[true|false]         lonely explicit hydrogens
    :mapped=[true|false]         mapped explicit hydrogens
    :charged=[true|false]        charged explicit hydrogens
    :isotopic=[true|false]       isotopic explicit hydrogens
    :radical=[true|false]        radical explicit hydrogens
    :wedged=[true|false]         wedged explicit hydrogens
    :hconnected=[true|false]     hydrogen connected to hydrogen atom
    :polymerendgroup=[true|false]  hydrogen connected to a SRU S-group
    :sgroup=[true|false]         hydrogen which is the only atom in an S-group
    :sgroupend=[true|false]      hydrogen connected to a Superatom S-group
    :valenceerror=[true|false]   hydrogen connected to an atom which has
                                 valence error
    :bridgehead=[true|false]     hydrogen connected to a bridgehead atom
  explicitlp                     detect explicit lone pairs
  ezdoublebond                   detect if a double bond can be cis or trans
  isotope                        detect isotopes
  metallocene                    detect incorrect metallocene representations
  missingatommap                 detect atoms without map numbers
  missingrgroup                  (deprecated) use missingrgroupreference
  missingrgroupreference         detect missing R-group definitions
  moleculecharge                 detect non-neutral molecules
  multicenter                    detect multicenters
  multicomponent                 detect molecules containing disconnected parts
  multiplestereocenter           detect molecules with multiple stereocenters
  ocr                            detect drawings that originate from
                                 incorrect optical structure recognition
  overlappingAtoms               detect atoms that are too close to each other
  overlappingBonds               detect bonds that are too close to each other
  pseudoatom                     detect pseudo atoms
  queryatom                      detect query atoms
  querybond                      detect query bonds
  racemate                       detect asymmetric tetrahedral atoms without
                                 specific stereo configuration
  radical                        detect radical atoms
  rare                           (deprecated) use rareelement
  rareelement                    detect rare elements
  ratom                          detect specified type of R-atoms
    :all=[true|false]            all type of R-atoms
    :disconnected=[true|false]   disconnected type of R-atoms
    :generic=[true|false]        generic type of R-atoms
    :linker=[true|false]         linker type of R-atoms
    :nested=[true|false]         nested type of R-atoms
  reactionmap                    (deprecated) use reactionmaperror
  reactionmaperror               detect reactions with invalid atom mapping
  relativestereo                 detect multiple stereogenic center groups
  rgroupattachmenterror          detect all R-group attachment errors
  rgroupreferenceerror           detect errors in R-group definitions
                                 DEPRECATED checker, please use
                                 "missingrgroupreference",
                                 "unusedrgroupreference",
                                 "circularrgroupreference" instead.
    :missingratom=[true|false]   missing R-atom definition
    :missingrgroup=[true|false]  missing R-group definition
    :selfreference=[true|false]  self reference errors in R-group definitions
  ringstrainerror                detect small rings with trans or cumulative
                                 double bonds, or triple bond
  solvent                        detect common solvents appearing
                                 by a main component
  staratom                       detect star atoms
  stereocarebox                  detect stereo search markers on double bonds
  straightdoublebond             detect undefined double bond stereo layout
  substructure:[smarts]          detect the given SMARTS structure
                                 as a substructure in the original molecule
  unbalancedreaction             detect reactions with orphan atoms
  unusedrgroup                   (deprecated) use unusedrgroupreference
  unusedrgroupreference          detect unused R-group definitions
  valence                        (deprecated) use valenceerror
  valenceerror                   detect valence errors
  valenceproperty                detect atoms with all or specified
                                 valence properties
    :defaultvalence=[true|false]     default valence properties
    :nondefaultvalence=[true|false]  non-default valence properties
  wedge                          (deprecated) use wedgeerror
  wedgeerror                     detect incorrect wedge bonds
  wigglybond                     detect wiggly bonds on chiral centers
  wigglydoublebond               detect non-stereo double bonds with wiggly
                                 representation connected to a double bond
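Parameterized checkers take their options after a colon, as in the "abbrevgroup:excluded=[Ph,COOH,Val]" example above. The sketch below builds exactly that form from a list of group abbreviations; abbrevgroup_excluding is a hypothetical helper, and it covers only the single-parameter form shown in the help text.

```shell
# abbrevgroup_excluding: hypothetical helper, not part of structurechecker.
# Builds the "abbrevgroup:excluded=[...]" action string shown in the help
# text from a list of group abbreviations.
abbrevgroup_excluding() {
  local IFS=','   # "$*" joins the arguments with the first character of IFS
  printf 'abbrevgroup:excluded=[%s]\n' "$*"
}
```

Keep such strings quoted on the command line, e.g. `structurechecker -c "$(abbrevgroup_excluding Ph COOH Val)" in.sdf`. Square brackets are glob characters, so an unquoted, literally typed `abbrevgroup:excluded=[Ph,COOH,Val]` is subject to filename expansion; zsh, for instance, stops with a "no matches found" error when no file matches the pattern.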

Available fixer actions: structurechecker -hf

Valid fixer actions (strings) are:
  addchiralflag             add chiral flag to the molecule
  aliastoatom               remove aliases from atoms
  aliastocarbon             (deprecated) use converttocarbon
  aliastogroup              convert atoms with aliases to abbreviated groups
                            if the alias is recognized
  clean                     calculate 2D coordinates
  clearabsstereo            (deprecated) use removeinvalidchiralflag
  contractgroup             contract all abbreviated groups
  converttoelementalform    convert isotopes into elemental atoms
  converttocarbon           remove alias values from atoms and
                            convert the atom to a carbon
  converttoionicform        convert covalent counterions to ionic form
  converttometalloceneform  convert non-standard metallocene representations
                            into coordinated multicenter representation
  converttosingle           (deprecated) use converttosinglebond
  converttosinglebond       convert faulty bonds to single bonds
  converttowigglydoublebond convert non-stereo double bond represented by
                            crossed double bond to wiggly bond representation
  crosseddoublebond         convert non-stereo double bond represented by
                            wiggly bond to crossed double bond representation
  crossedtowiggly           (deprecated) use converttowigglydoublebond
  dearomatize               convert aromatic rings into Kekule form
  expandgroup               expand all abbreviated groups if it is possible
  fixmetallocene            convert metallocenes to coordinative multicenter layout
  fixrgroupattachment       add missing attachment points to members
                            with single location
  fixunusedrgroups          delete unreferenced R-group definitions
  fixvalence                correct valence problem by removing hydrogens
                            or setting charges
  mapmolecule               add atom maps to each atom of the molecule
  mapreaction               add atom maps to the reaction
  neutralize                remove charges from the molecule
  partialclean              recalculate parts of the atom coordinates for 2D layout
  pseudotogroup             convert pseudo atoms to abbreviated groups
                            if pseudo label is a known abbreviated group
  rearomatize               dearomatize the molecule and aromatize it again
  removealias               remove alias values from atoms
  removeatom                remove the problematic atoms from the molecule
  removeatommap             remove atom map numbers
  removeatomqueryproperty   remove atom query properties
  removeatomvalue           remove atom values
  removeattacheddata        remove data attached to atoms
  removebond                remove problematic bonds from the molecule
  removeexplicith           remove explicit hydrogens
  removeinvalidchiralflag   remove the chiral flag
  removeradical             convert radicals to non-radical atoms
  removestereocarebox       remove stereo search markers from double bonds
  removevalenceproperty     remove valence properties from atoms
  removezcoordinate         set the z-coordinates of atoms to zero
  ungroup                   ungroup all abbreviated groups
  wedgeclean                recalculate the orientation of wedge bonds
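Usage example 3 below pairs a checker with a fixer using "->" (isotope->converttoelementalform). Because > is the output-redirection operator in common shells, such an action string must be quoted: unquoted, `-c isotope->converttoelementalform in.sdf` would pass `-c isotope- in.sdf` and redirect the console output into a file named converttoelementalform. A minimal sketch of building the pair; checker_fixer is a hypothetical helper, not part of structurechecker.

```shell
# checker_fixer: hypothetical helper, not part of structurechecker.
# Prints a "checker->fixer" action string in the form used in the usage
# examples (e.g., isotope->converttoelementalform).
checker_fixer() {
  if [ "$#" -ne 2 ]; then
    echo "usage: checker_fixer <checker> <fixer>" >&2
    return 2
  fi
  printf '%s->%s\n' "$1" "$2"
}
```

For example: `structurechecker -c "$(checker_fixer isotope converttoelementalform)" -m fix -f sdf -o out.sdf in.sdf`.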

Usage

structurechecker -c <config file|action string> [-m <mode>] [<options>] [<input list>]

The command line parameter -c or --config is mandatory. This parameter specifies the configuration file path or a simple action string.

structurechecker -c config.xml

or

structurechecker -c "atomqueryproperty"

Parameter -m or --mode specifies the operation mode. The following operation modes are available:

  • check (default): searches for errors;
    structurechecker -c config.xml -m check
  • fix: fixes automatically fixable errors.
    structurechecker -c config.xml -m fix
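Since check and fix are the only operation modes, a script that drives structurechecker can reject any other value before calling it. A minimal sketch, assuming a POSIX-like shell such as bash; run_mode is a hypothetical wrapper that forwards all remaining options and inputs unchanged.

```shell
# run_mode: hypothetical wrapper, not part of structurechecker.
# Validates the operation mode, then calls structurechecker with the given
# configuration and passes the remaining arguments through unchanged.
run_mode() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_mode <config> <check|fix> [options] [inputs]" >&2
    return 2
  fi
  local config="$1" mode="$2"
  shift 2
  case "$mode" in
    check|fix) ;;  # the two documented operation modes
    *) echo "unknown mode: $mode (expected check or fix)" >&2; return 2 ;;
  esac
  structurechecker -c "$config" -m "$mode" "$@"
}
```

For example, `run_mode config.xml check in.sdf` runs `structurechecker -c config.xml -m check in.sdf`.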

Note: When a molecule import or export error occurs, the program continues to run. The error is written to the console, and the molecule is omitted from the results (i.e., the resulting output file contains fewer molecules than the input file).
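Because such molecules are reported only on the console, comparing record counts is a cheap way to notice them afterwards. The sketch below assumes that both the input and the output are SDfiles (-f sdf) and that the default single output type is used, so every molecule that was read and written successfully ends up in the --output file; count_sdf_records and report_dropped are hypothetical helpers.

```shell
# count_sdf_records: hypothetical helper. Counts SDfile records, each of
# which ends with a line starting with "$$$$".
count_sdf_records() {
  # grep -c prints 0 but exits with status 1 when nothing matches.
  grep -c '^\$\$\$\$' "$1" || true
}

# report_dropped: hypothetical helper. Fails if the output SDfile ($2)
# contains fewer records than the input SDfile ($1).
report_dropped() {
  if [ ! -r "$1" ] || [ ! -r "$2" ]; then
    echo "cannot read $1 or $2" >&2
    return 2
  fi
  local in_count out_count
  in_count=$(count_sdf_records "$1")
  out_count=$(count_sdf_records "$2")
  if [ "$out_count" -lt "$in_count" ]; then
    echo "$((in_count - out_count)) molecule(s) missing from $2" >&2
    return 1
  fi
}
```

For example, run `structurechecker -c config.xml -m fix -f sdf -o out.sdf in.sdf` and then `report_dropped in.sdf out.sdf`.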

 

Note: The syntax of commands can be different under various command line shells (bash, tcsh, zsh, etc.).

Input

structurechecker accepts most molecular file formats as input (Marvin Documents (MRV), MDL molfile, SDfile, RXNfile, RDfile, SMILES, etc.). The input can be specified as:

  • input file(s),
  • input string(s), such as SMILES, or
  • the standard input (default).

structurechecker -c config.xml -m check input.mrv

Note: If neither an input file nor an input string is specified, the standard input (console) is read.

structurechecker -c config.xml -m check "OCC(O)C1OC(=O)C(O)=C1O"
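The usage line accepts an input list, so several files can be checked in one call. The sketch below passes every .sdf file of a directory to a single check run; check_directory is a hypothetical wrapper, and it assumes bash's default globbing behavior, in which a pattern that matches nothing is left unexpanded.

```shell
# check_directory: hypothetical wrapper, not part of structurechecker.
# Passes every .sdf file in a directory to one structurechecker call,
# relying on the [input list] argument of the usage line.
check_directory() {
  local dir="$1" config="$2"
  set -- "$dir"/*.sdf
  # Without a match, the pattern itself remains and names no existing file.
  if [ ! -e "$1" ]; then
    echo "no .sdf files in $dir" >&2
    return 1
  fi
  structurechecker -c "$config" -m check "$@"
}
```

For example: `check_directory ./incoming config.xml`.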

 

Output

structurechecker's output contains the file(s) of the checked/fixed molecules and optionally a report of the results. The molecules are written to the output file(s). The format of the output file(s) can be specified by the -f or --format option (default format is: "smiles"). The type of output is defined by the -t or --output-type parameter. The possible values of the output type are the following:

  • single (default): all molecules are written to the file defined by the --output parameter. If the --output parameter is omitted, the result is written to the standard output (console). (The --discarded parameter is ignored in this case.)
  • separated: valid and invalid molecules are written to two different files. The --output parameter defines the output file of molecules with valid structures, and the --discarded parameter defines the output file of molecules with invalid structures (or, in fix mode, of those that cannot be fixed automatically).
    • If the --discarded parameter is omitted, molecules with invalid structures are written to the standard output.
    • If the --output parameter is omitted, molecules with valid structures are written to the standard output.

      Note: At least one of the --output and --discarded parameters must be specified. If neither is defined, the program stops.

  • accepted: only molecules with valid structures are written to file defined by the --output parameter. If --output parameter is omitted, molecules with valid structures are written to the standard output. (--discarded parameter is ignored in this case)
  • discarded: only molecules with invalid structures are written to the file defined by the --discarded parameter. If the --discarded parameter is omitted, molecules with invalid structures are written to the standard output. (The --output parameter is ignored in this case.)
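Because the default output format is SMILES for both the accepted and the discarded file, a separated run that should produce SDfiles needs -f sdf explicitly. A minimal sketch; split_results and the _accepted/_discarded file-name suffixes are hypothetical choices, and the input file name is assumed to have an extension.

```shell
# split_results: hypothetical wrapper, not part of structurechecker.
# Runs a fix with the separated output type and names both outputs after
# the input file. -f sdf is given explicitly because otherwise both the
# --output and the --discarded files would be written as SMILES.
split_results() {
  local config="$1" input="$2" stem
  stem="${input%.*}"   # input path without its last extension
  structurechecker -c "$config" -m fix -f sdf -t separated \
    -o "${stem}_accepted.sdf" -d "${stem}_discarded.sdf" "$input"
}
```

For example, `split_results config.xml data/in.sdf` writes data/in_accepted.sdf and data/in_discarded.sdf.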

The report of structure checking can be written either to a separate file, defined by the --report-file parameter, or to the output file(s) as an additional molecule property. The name of the property can be defined by the --report-property parameter.
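To keep one report per input file, each file can be checked in its own run with --report-file. A minimal sketch; report_each and the .checked.smi/.report file names are hypothetical choices, and the molecules themselves go to a SMILES file per input (the default output format).

```shell
# report_each: hypothetical wrapper, not part of structurechecker.
# Checks each input file in a separate run so that every file gets its own
# report (-rf) next to its own SMILES output file (-o).
report_each() {
  local config="$1" f
  shift
  for f in "$@"; do
    structurechecker -c "$config" -m check \
      -o "${f%.*}.checked.smi" -rf "${f%.*}.report" "$f" || return
  done
}
```

For example, `report_each config.xml a.sdf b.mrv` produces a.report and b.report; the loop stops at the first failing run and returns its exit status.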

 

Note: Not all molecules with structure errors are discarded. In fix mode, only molecules whose errors cannot be fixed automatically are discarded.

Usage examples

Below are short descriptions of some examples. If you want to check, fix, or filter structures in evaluate or JChem Cartridge, find examples here.

  1. structurechecker -c "metallocene"
    Executes a check with configuration metallocene on the molecule(s) defined in the standard input, and writes the result to the standard output (console);
  2. structurechecker -c "bondLength" in.sdf
    Executes a check with configuration bondLength on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);
  3. structurechecker -c "isotope->converttoelementalform" in.sdf
    Executes a check with configuration isotope->converttoelementalform on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);
  4. structurechecker -c "aromaticity..valence" -m fix -f sdf -o out.sdf in.sdf
    Executes a fix with configuration aromaticity and valence on the molecule(s) defined in the in.sdf file, and, as the default single output type is used, writes all molecules (including the automatically fixed ones) in SDF format to the out.sdf output file;
  5. structurechecker -c config.xml -t separated -o out.sdf -d discarded.sdf
    Executes a check with the configuration contained in config.xml on the molecule(s) read from the standard input, writes the molecules with valid structures to out.sdf, and writes the molecules with invalid structures to discarded.sdf.

    Note: Both outputs are written in SMILES format despite the .sdf extension, because --format (-f) is not specified;

  6. structurechecker -c config.xml -m fix -t separated -d discarded.sdf
    Executes a fix with the configuration contained in config.xml on the molecule(s) read from the standard input, writes the molecules with invalid structures (those that cannot be fixed automatically) to discarded.sdf, and writes the molecules with valid structures to the standard output (console);
  7. structurechecker -c config.xml -m fix -t discarded in.sdf
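    Executes a fix with the configuration contained in config.xml on the molecule(s) defined in the in.sdf file, writes the molecules with invalid structures (those that cannot be fixed automatically) to the standard output, because --discarded is not specified, and omits the molecules with valid structures.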