Structure Checker Command Line Application

    structurecheck command-line

    Structure Checker is a chemical validation tool that detects and fixes common structural errors and special features that can be potential sources of problems. structurecheck is the command-line tool of Structure Checker.

    Options

    General options of structurecheck

     -h, --help                        this help page
     -hc, --help-checker-action        help page of valid checker actions
     -hf, --help-fixer-action          help page of valid fixer actions
     -m, --mode <operationmode>        [check|fix]
                                       mode of the operation (default: check)
                                       check  only check is executed,
                                              does not modify molecules
                                       fix    fix molecules containing structure errors
                                              whenever possible
     -x                                fix mode (deprecated, use --mode fix)
    
    Input options:
    
     -c, --config <filepath|string>    action string configuration
                                       actions separated by "..",
    
    Output options:
    
     -t, --output-type <output type>   [single|separated|accepted|discarded]
                                       set output type (default: single)
                                       single     both accepted and discarded structures are
                                                  written to the <output path>
                                       separated  accepted structures are written to the
                                                  <output path>, discarded structures are
                                                  written to the <discarded path>
                                       accepted   only accepted structures are
                                                  written to the <output path>
                                       discarded  only discarded structures are
                                                  written to the <discarded path>
     -o, --output <output path>        output file (default: standard output)
     -d, --discarded <discarded path>  write molecules with structure error to
                                       a separate file (default: standard output)
     -f, --format <format>             output file format (default: smiles)
     -rf, --report-file <filepath>     write report to a file
     -rp, --report-property            write report to the property of the output
     -rt, --report-pattern <pattern>   generate pattern based report file
     -re, --report-format <format>     file format of the molecules in report
     -l, --log <filepath>              write software-error log messages to file

    Available checker actions: structurecheck -hc

    Valid checker actions (strings) are:
    
     3d                                detect atoms with 3D coordinates
     abbrevgroup                       detect all abbreviated groups
       :expanded=[true|false]          detect expanded abbreviated groups
       :contracted=[true|false]        detect contracted abbreviated groups
       :excluded=[...]                 exclude the following groups during check;
                                       set comma-separated list of group abbreviations,
                                       e.g., "abbrevgroup:excluded=[Ph,COOH,Val]"
     absentchiralflag                  detect absent chiral flag
     absolutestereoconfiguration       detect molecules in which all asymmetric
                                       centers have absolute stereo configuration
     alias                             detect atoms with alias
     aromaticity                       (deprecated) use aromaticityerror
     aromaticityerror                  detect aromaticity errors with the given
                                       aromatization type (default: general)
       :basic                          basic aromaticity errors
       :loose                          loose aromaticity errors
       :general                        general aromaticity errors
     atommap                           detect atoms with map number
     atomqueryproperty                 detect all or specified atom query properties
       :H=[true|false]                 hydrogen count
       :X=[true|false]                 connection count
       :D=[true|false]                 explicit connection count
       :R=[true|false]                 ring count
       :h=[true|false]                 implicit hydrogen count
       :r=[true|false]                 smallest ring count
       :a=[true|false]                 aromaticity
       :s=[true|false]                 substitution count
       :u=[true|false]                 unsaturation
       :rb=[true|false]                ring bond count
     atomvalue                         detect atoms with atom value
     atropisomer                       detect atropisomers
     attacheddata                      detect atoms with attached data
       :excluded=[...]                 exclude attached data with the listed names during the check;
                                       valid inputs are comma-separated list and regexp (if isExclusionRegexp=true)
       :isExclusionRegexp=[true|false] excluded names are defined as regexp
     bondangle                         detect unpreferred bond angles in 2D
     bondlength                        detect bonds that are too long or too short
     chiralflagerror                   detect incorrectly set chiral flag
     circularrgroup                    (deprecated) use circularrgroupreference
     circularrgroupreference           detect circular R-group references
     coordsystem                       detect invalid coordination systems
     covalentcounterion                detect covalently bonded alkali and alkaline earth metals on O, N, S
     crosseddoublebond                 detect crossed double bonds
     empty                             detect items without atoms
     explicith                         detect all or specified explicit hydrogens
       :lonely=[true|false]            lonely explicit hydrogens
       :mapped=[true|false]            mapped explicit hydrogens
       :charged=[true|false]           charged explicit hydrogens
       :isotopic=[true|false]          isotopic explicit hydrogens
       :radical=[true|false]           radical explicit hydrogens
       :wedged=[true|false]            wedged explicit hydrogens
       :hconnected=[true|false]        hydrogen connected to hydrogen atom
       :polymerendgroup=[true|false]   hydrogen connected to a SRU S-group
       :sgroup=[true|false]            hydrogen which is the only atom in an S-group
       :sgroupend=[true|false]         hydrogen connected to a Superatom S-group
       :valenceerror=[true|false]      hydrogen connected to an atom which has
                                       valence error
       :bridgehead=[true|false]        hydrogen connected to a bridgehead atom
     explicitlp                        detect explicit lone pairs
     ezdoublebond                      detect if a double bond can be cis or trans
     isotope                           detect isotopes
     metallocene                       detect incorrect metallocene representations
     missingatommap                    detect atoms without map numbers
     missingrgroup                     (deprecated) use missingrgroupreference
     missingrgroupreference            detect missing R-group definitions
     moleculecharge                    detect non-neutral molecules
     multicenter                       detect multicenters
     multicomponent                    detect molecules containing disconnected parts
     multiplestereocenter              detect molecules with multiple stereocenters
     ocr                               detect drawings that originate from
                                       incorrect optical structure recognition
     overlappingAtoms                  detect atoms that are too close to each other
     overlappingBonds                  detect bonds that are too close to each other
     pseudoatom                        detect pseudo atoms
     queryatom                         detect query atoms
     querybond                         detect query bonds
     racemate                          detect asymmetric tetrahedral atoms without
                                       specific stereo configuration
     radical                           detect radical atoms
     rare                              (deprecated) use rareelement
     rareelement                       detect rare elements
     ratom                             detect specified type of R-atoms
       :all=[true|false]               all types of R-atoms
       :disconnected=[true|false]      disconnected type of R-atoms
       :generic=[true|false]           generic type of R-atoms
       :linker=[true|false]            linker type of R-atoms
       :nested=[true|false]            nested type of R-atoms
     reactionmap                       (deprecated) use reactionmaperror
     reactionmaperror                  detect reactions with invalid atom mapping
     relativestereo                    detect multiple stereogenic center groups
     rgroupattachmenterror             detect all R-group attachment errors
     rgroupreferenceerror              detect errors in R-group definitions
                                       DEPRECATED checker, please use
                                       "missingrgroup", "unusedrgroup",
                                       "circularrgroup" instead.
       :missingratom=[true|false]      missing R-atom definition
       :missingrgroup=[true|false]     missing R-group definition
       :selfreference=[true|false]     self reference errors in R-group definitions
     ringstrainerror                   detect small rings with trans or cumulative
                                       double bonds, or triple bond
     solvent                           detect common solvents appearing
                                       by a main component
     staratom                          detect star atoms
     stereocarebox                     detect stereo search markers on double bonds
     straightdoublebond                detect undefined double bond stereo layout
     substructure:[smarts]             detect the given SMARTS structure
                                       as a substructure in the original molecule
     unbalancedreaction                detect reactions with orphan atoms
     unusedrgroup                      (deprecated) use unusedrgroupreference
     unusedrgroupreference             detect unused R-group definitions
     valence                           (deprecated) use valenceerror
     valenceerror                      detect valence errors
     valenceproperty                   detect atoms with all or specified
                                       valence properties
       :defaultvalence=[true|false]    default valence properties
       :nondefaultvalence=[true|false] non-default valence properties
     wedge                             (deprecated) use wedgeerror
     wedgeerror                        detect incorrect wedge bonds
     wigglybond                        detect wiggly bonds on chiral centers
     wigglydoublebond                  detect non-stereo double bonds with wiggly
                                       representation connected to a double bond

    Available fixer actions: structurecheck -hf

    Valid fixer actions (strings) are:
    
     addchiralflag                     add chiral flag to the molecule
     aliastoatom                       remove aliases from atoms
     aliastocarbon                     (deprecated) use converttocarbon
     aliastogroup                      convert atoms with aliases to abbreviated groups
                                       if the alias is recognized
     clean                             calculate 2D coordinates
     clearabsstereo                    (deprecated) use removeinvalidchiralflag
     contractgroup                     contract all abbreviated groups
     converttoelementalform            convert isotopes into elemental atoms
     converttocarbon                   remove alias values from atoms and
                                       convert the atom to a carbon
     converttoionicform                convert covalent counterions to ionic form
     converttometalloceneform          convert non-standard metallocene representations
                                       into coordinated multicenter representation
     converttosingle                   (deprecated) use converttosinglebond
     converttosinglebond               convert faulty bonds to single bonds
     converttowigglydoublebond         convert non-stereo double bond represented by
                                       crossed double bond to wiggly bond representation
     crosseddoublebond                 convert non-stereo double bond represented by
                                       wiggly bond to crossed double bond representation
     crossedtowiggly                   (deprecated) use converttowigglydoublebond
     dearomatize                       convert aromatic rings into Kekule form
     expandgroup                       expand all abbreviated groups if it is possible
     fixmetallocene                    converts metallocenes to coordinative multicenter layout
     fixrgroupattachment               add missing attachment points to members
                                       with single location
     fixunusedrgroups                  delete unreferenced R-group definitions
     fixvalence                        correct valence problem by removing hydrogens
                                       or setting charges
     mapmolecule                       add atom maps to each atom of the molecule
     mapreaction                       add atom maps to the reaction
     neutralize                        remove charges from the molecule
     partialclean                      recalculate parts of the atom coordinates for 2D layout
     pseudotogroup                     convert pseudo atoms to abbreviated groups
                                       if pseudo label is a known abbreviated group
     rearomatize                       dearomatize the molecule and aromatize it again
     removealias                       remove alias values from atoms
     removeatom                        remove the problematic atoms from the molecule
     removeatommap                     remove atom map numbers
     removeatomqueryproperty           remove atom query properties
     removeatomvalue                   remove atom values
     removeattacheddata                remove non-excluded attached data from atoms
     removebond                        remove problematic bonds from the molecule
     removeexplicith                   remove explicit hydrogens
     removeinvalidchiralflag           remove the chiral flag
     removeradical                     convert radicals to non-radical atoms
     removestereocarebox               remove stereo search markers from double bonds
     removevalenceproperty             remove valence properties from atoms
     removezcoordinate                 set the z-coordinates of atoms to zero
     ungroup                           ungroup all abbreviated groups
     wedgeclean                        recalculate the orientation of wedge bonds

    Usage

    structurecheck -c <config file> -m [mode] [<options>] [input list]

    The command line parameter -c or --config is mandatory. This parameter specifies the configuration file path or a simple action string.

    structurecheck -c config.xml

    or

    structurecheck -c "atomqueryproperty"
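
    An action string with several actions can also be assembled programmatically. The following is a minimal sketch, assuming that the ".." separator mentioned in the option help also applies when one of the actions carries parameters. join_actions is a hypothetical helper, not part of structurecheck, and the command is only printed, not executed.

```shell
#!/bin/sh
# join_actions ACTION...: join action names with the ".." separator.
# (Hypothetical helper for illustration; not part of structurecheck.)
join_actions() {
    result=""
    for action in "$@"; do
        if [ -z "$result" ]; then
            result="$action"
        else
            result="$result..$action"
        fi
    done
    printf '%s\n' "$result"
}

# Build an action string from three checkers listed by `structurecheck -hc`.
config=$(join_actions "isotope" "valenceerror" "abbrevgroup:excluded=[Ph,COOH,Val]")

# Print the command instead of running it, so the sketch works without the tool installed.
printf 'structurecheck -c "%s" in.sdf\n' "$config"
# prints: structurecheck -c "isotope..valenceerror..abbrevgroup:excluded=[Ph,COOH,Val]" in.sdf
```

    Keeping the action string in double quotes prevents the shell from interpreting characters such as [, ], | and >.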

    Parameter -m or --mode specifies the operation mode. The following operation modes are available:

    • check (default): searches for errors;

      structurecheck -c config.xml -m check

    • fix: fixes automatically fixable errors.

      structurecheck -c config.xml -m fix

      Note: When a molecule import/export error occurs, the program continues to run. The error is written to the console, and the molecule is discarded from the results (i.e., the resulting output file contains fewer molecules than the input file).

      Note: The syntax of commands can differ between command-line shells (bash, tcsh, zsh, etc.).
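
    Because molecules that fail import or export are dropped from the results, it can be useful to compare the number of records before and after a run. The following is a minimal sketch for SD files, assuming the run used -f sdf with the single output type, so that every successfully processed molecule is written. count_sdf_records is a hypothetical helper, and the two files are placeholder records created only for the demonstration.

```shell
#!/bin/sh
# count_sdf_records FILE: count SD file records by counting "$$$$" delimiter lines.
# (Hypothetical helper for illustration; not part of structurecheck.)
count_sdf_records() {
    # tr strips carriage returns so files with Windows line endings also match;
    # grep -c prints the number of lines that are exactly "$$$$".
    tr -d '\r' < "$1" | grep -c -x -F '$$$$'
}

# Demonstration with placeholder records (not real molfile content).
tmpdir=$(mktemp -d)
printf 'mol1\nplaceholder\n$$$$\nmol2\nplaceholder\n$$$$\n' > "$tmpdir/in.sdf"
printf 'mol1\nplaceholder\n$$$$\n' > "$tmpdir/out.sdf"

in_count=$(count_sdf_records "$tmpdir/in.sdf")
out_count=$(count_sdf_records "$tmpdir/out.sdf")
echo "input: $in_count, output: $out_count, dropped: $((in_count - out_count))"
# prints: input: 2, output: 1, dropped: 1

rm -rf "$tmpdir"
```

    With the separated, accepted, or discarded output types the counts differ by design, so the comparison is only meaningful against the sum of all files that were written.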

    Input

    structurecheck accepts most molecular file formats as input (Marvin Documents (MRV), MDL molfile, SDfile, RXNfile, RDfile, SMILES, etc.). The input can be specified as:

    • input file(s),

    • input string(s), or

    • SMILES (default).

      structurecheck -c config.xml -m check input.mrv

      Note: If neither an input file nor an input string is specified, the standard input (console) is read.

    structurecheck -c config.xml -m check "OCC(O)C1OC(=O)C(O)=C1O"
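
    Reading from the standard input makes it possible to pipe structures from another command. The following is a minimal sketch, assuming structurecheck is on the PATH (otherwise the guard only prints a message). The valenceerror checker is chosen purely as an example.

```shell
#!/bin/sh
# Single quotes keep the shell from interpreting the parentheses in the SMILES string.
smiles='OCC(O)C1OC(=O)C(O)=C1O'

if command -v structurecheck >/dev/null 2>&1; then
    # No input file or input string is given, so the piped standard input is read.
    printf '%s\n' "$smiles" | structurecheck -c "valenceerror"
else
    echo "structurecheck not found on PATH; would pipe: $smiles"
fi
```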

    Output

    structurecheck's output contains the file(s) of the checked/fixed molecules and optionally a report of the results. The molecules are written to the output file(s). The format of the output file(s) can be specified by the -f or --format option (default format is: "smiles"). The type of output is defined by the -t or --output-type parameter. The possible values of the output type are the following:

    • single (default): all molecules are written to the file defined by the --output parameter. If the --output parameter is omitted, the result is written to the standard output (console). (The --discarded parameter is ignored in this case.)

    • separated: valid and invalid molecules are written to two different files. The --output parameter defines the output file of molecules with valid structures, and the --discarded parameter defines the output file of molecules with invalid structures (or in fix mode, those which cannot be fixed automatically).

      • If the --discarded parameter is omitted, molecules with invalid structures are written to the standard output;

      • If the --output parameter is omitted, molecules with valid structures are written to the standard output;

        Note: At least one of the --output and --discarded parameters must be specified. If neither of them is defined, the program stops.

    • accepted: only molecules with valid structures are written to the file defined by the --output parameter. If the --output parameter is omitted, molecules with valid structures are written to the standard output. (The --discarded parameter is ignored in this case.)

    • discarded: only molecules with invalid structures are written to the file defined by the --discarded parameter. If the --discarded parameter is omitted, molecules with invalid structures are written to the standard output. (The --output parameter is ignored in this case.)
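
    The rules for the four output types can be summarized as a small decision function. The following is a minimal sketch, assuming that the discarded type sends the invalid molecules to the standard output when no --discarded path is given. describe_output is a hypothetical helper that only reports where molecules would go; it does not call structurecheck.

```shell
#!/bin/sh
# describe_output TYPE OUTPUT_PATH DISCARDED_PATH
# Prints where molecules would be written; an empty path means "not specified".
# Returns 1 for the combination that the text above says stops the program.
# (Hypothetical helper for illustration; not part of structurecheck.)
describe_output() {
    otype=$1 out=$2 disc=$3
    case "$otype" in
        single)
            echo "all molecules -> ${out:-standard output}" ;;
        accepted)
            echo "valid molecules -> ${out:-standard output}" ;;
        discarded)
            echo "invalid molecules -> ${disc:-standard output}" ;;
        separated)
            if [ -z "$out" ] && [ -z "$disc" ]; then
                echo "error: separated needs --output or --discarded" >&2
                return 1
            fi
            echo "valid molecules -> ${out:-standard output}; invalid molecules -> ${disc:-standard output}" ;;
        *)
            echo "error: unknown output type '$otype'" >&2
            return 1 ;;
    esac
}

describe_output separated out.sdf ""
describe_output discarded "" ""
describe_output separated "" "" || echo "rejected: no output path given"
```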

    The report of structure checking can be written either to a separate file, defined by the --report-file parameter, or to the output file(s) as an additional molecule property. The name of the property can be defined by the --report-property parameter.

    Note: Not all molecules with structure errors are discarded. When fix mode is selected, only molecules with errors that cannot be fixed automatically are discarded.

    Usage examples

    Below you can find short descriptions of some examples. If you want to check, fix, or filter structures in evaluate or JChem Cartridge, find examples here.

    1. structurecheck -c "metallocene"

      Executes a check with configuration metallocene on the molecule(s) defined in the standard input, and writes the result to the standard output (console);

    2. structurecheck -c "bondLength" in.sdf

      Executes a check with configuration bondLength on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);

    3. structurecheck -c "isotope->converttoelementalform" in.sdf

      Executes a check with configuration isotope->converttoelementalform on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);

    4. structurecheck -c "aromaticity..valence" -m fix -f sdf -o out.sdf in.sdf

      Executes a fix with configuration aromaticity and valence on the molecule(s) defined in the in.sdf file, and writes the molecules with valid structures (including automatically fixed molecules) in sdf format to the out.sdf output file;

    5. structurecheck -c config.xml -t separated -o out.sdf -d discarded.sdf

      Executes a check with the configuration contained in config.xml, writes the molecules with valid structures to out.sdf, and writes the molecules with invalid structures to discarded.sdf.

      Note: Despite the .sdf file names, the format of both outputs is SMILES, because --format (-f) is not specified;

    6. structurecheck -c config.xml -m fix -t separated -d discarded.sdf

      Executes a fix with the configuration contained in config.xml, writes the molecules with invalid structures to discarded.sdf, and writes the molecules with valid structures to the standard output (console);

    7. structurecheck -c config.xml -m fix -t discarded in.sdf

      Executes a fix with the configuration contained in config.xml on the molecule(s) defined in the in.sdf file, writes the molecules with invalid structures to the standard output (as --discarded is not specified), and omits the molecules with valid structures.
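
    Example 5 shows an easy mistake: the output files are named *.sdf, but their content is SMILES because -f is missing. The following is a minimal sketch of a wrapper that adds -f sdf whenever the output path ends in .sdf. format_flag is a hypothetical helper, only the .sdf case is handled because sdf is the only non-default format value used in the examples above, and the command is printed rather than executed.

```shell
#!/bin/sh
# format_flag PATH: print "-f sdf" when PATH ends in .sdf, print nothing otherwise.
# (Hypothetical helper for illustration; not part of structurecheck.)
format_flag() {
    case "$1" in
        *.sdf) printf '%s' "-f sdf" ;;
    esac
}

out=out.sdf

# $(format_flag ...) is left unquoted on purpose so that "-f sdf" splits into two words.
set -- structurecheck -c config.xml -t separated $(format_flag "$out") -o "$out" -d discarded.sdf in.sdf

# Print the assembled command; replace the echo with "$@" to actually run it.
cmd="$*"
echo "$cmd"
# prints: structurecheck -c config.xml -t separated -f sdf -o out.sdf -d discarded.sdf in.sdf
```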