
Structure Checker Command Line Application

structurecheck command-line

Structure Checker is a chemical validation tool that detects and fixes common structural errors, as well as special features that can be potential sources of problems. structurecheck is the command-line tool of Structure Checker.

Options

General options of structurecheck

 -h, --help this help page

 -hc, --help-checker-action help page of valid checker actions

 -hf, --help-fixer-action help page of valid fixer actions

 -m, --mode <operationmode> [check|fix]

 mode of the operation (default: check)

 check only check is executed, 

 does not modify molecules

 fix fix molecules containing structure errors 

 whenever possible

 -x fix mode (deprecated, use --mode fix)

Input options:

 -c, --config <filepath|string> configuration file path or action string

 multiple actions are separated by ".."

Output options:

 -t, --output-type <output type> [single|separated|accepted|discarded]

 set output type (default: single)

 single both accepted and discarded structures are

 written to the <output path>

 separated accepted structures are written to the

 <output path>, discarded structures are

 written to the <discarded path>

 accepted only accepted structures are 

 written to the <output path>

 discarded only discarded structures are 

 written to the <discarded path>

 -o, --output <output path> output file (default: standard output)

 -d, --discarded <discarded path> write molecules with structure errors to

 a separate file (default: standard output)

 -f, --format <format> output file format (default: smiles)

 -rf, --report-file <filepath> write report to a file

 -rp, --report-property write report to the property of the output

 -rt, --report-pattern <pattern> generate pattern based report file

 -re, --report-format <format> file format of the molecules in report

 -l, --log <filepath> write software-error log messages to file

Available checker actions: structurecheck -hc

Valid checker actions (strings) are:

 3d detect atoms with 3D coordinates

 abbrevgroup detect all abbreviated groups

 :expanded=[true|false] detect expanded abbreviated groups

 :contracted=[true|false] detect contracted abbreviated groups

 :excluded=[...] exclude the following groups during check;

 set comma-separated list of group abbreviations,

 e.g., "abbrevgroup:excluded=[Ph,COOH,Val]"

 absentchiralflag detect absent chiral flag

 absolutestereoconfiguration detect molecules in which all asymmetric

 centers have absolute stereo configuration

 alias detect atoms with alias

 aromaticity (deprecated) use aromaticityerror

 aromaticityerror detect aromaticity errors with the given

 aromatization type (default: general)

 :basic basic aromaticity errors

 :loose loose aromaticity errors

 :general general aromaticity errors

 atommap detect atoms with map number

 atomqueryproperty detect all or specified atom query properties

 :H=[true|false] hydrogen count

 :X=[true|false] connection count

 :D=[true|false] explicit connection count

 :R=[true|false] ring count

 :h=[true|false] implicit hydrogen count

 :r=[true|false] smallest ring count

 :a=[true|false] aromaticity

 :s=[true|false] substitution count

 :u=[true|false] unsaturation

 :rb=[true|false] ring bond count

 atomvalue detect atoms with atom value

 atropisomer detect atropisomers

 attacheddata detect atoms with attached data

 :excluded=[...] exclude attached data with the listed names during the check;

 valid inputs are a comma-separated list or a regular expression (if isExclusionRegexp=true)

 :isExclusionRegexp=[true|false] interpret the excluded names as a regular expression

 bondangle detect unpreferred bond angles in 2D

 bondlength detect bonds that are too long or too short

 chiralflagerror detect incorrectly set chiral flag

 circularrgroup (deprecated) use circularrgroupreference

 circularrgroupreference detect circular R-group references

 coordsystem detect invalid coordination systems

 covalentcounterion detect covalently bonded alkali and alkaline earth metals on O, N, S

 crosseddoublebond detect crossed double bonds

 empty detect items without atoms

 explicith detect all or specified explicit hydrogens

 :lonely=[true|false] lonely explicit hydrogens

 :mapped=[true|false] mapped explicit hydrogens

 :charged=[true|false] charged explicit hydrogens

 :isotopic=[true|false] isotopic explicit hydrogens

 :radical=[true|false] radical explicit hydrogens

 :wedged=[true|false] wedged explicit hydrogens

 :hconnected=[true|false] hydrogen connected to hydrogen atom

 :polymerendgroup=[true|false] hydrogen connected to an SRU S-group

 :sgroup=[true|false] hydrogen which is the only atom in an S-group

 :sgroupend=[true|false] hydrogen connected to a Superatom S-group

 :valenceerror=[true|false] hydrogen connected to an atom which has

 valence error

 :bridgehead=[true|false] hydrogen connected to a bridgehead atom

 explicitlp detect explicit lone pairs

 ezdoublebond detect if a double bond can be cis or trans

 isotope detect isotopes

 metallocene detect incorrect metallocene representations

 missingatommap detect atoms without map numbers

 missingrgroup (deprecated) use missingrgroupreference

 missingrgroupreference detect missing R-group definitions

 moleculecharge detect non-neutral molecules

 multicenter detect multicenters

 multicomponent detect molecules containing disconnected parts

 multiplestereocenter detect molecules with multiple stereocenters

 ocr detect drawings that originate from

 incorrect optical structure recognition

 overlappingAtoms detect atoms that are too close to each other

 overlappingBonds detect bonds that are too close to each other

 pseudoatom detect pseudo atoms

 queryatom detect query atoms

 querybond detect query bonds

 racemate detect asymmetric tetrahedral atoms without

 specific stereo configuration

 radical detect radical atoms

 rare (deprecated) use rareelement

 rareelement detect rare elements

 ratom detect specified type of R-atoms

 :all=[true|false] all types of R-atoms

 :disconnected=[true|false] disconnected type of R-atoms

 :generic=[true|false] generic type of R-atoms

 :linker=[true|false] linker type of R-atoms

 :nested=[true|false] nested type of R-atoms

 reactionmap (deprecated) use reactionmaperror

 reactionmaperror detect reactions with invalid atom mapping

 relativestereo detect multiple stereogenic center groups

 rgroupattachmenterror detect all R-group attachment errors

 rgroupreferenceerror detect errors in R-group definitions

 DEPRECATED checker, please use

 "missingrgroupreference", "unusedrgroupreference",

 "circularrgroupreference" instead.

 :missingratom=[true|false] missing R-atom definition

 :missingrgroup=[true|false] missing R-group definition

 :selfreference=[true|false] self reference errors in R-group definitions

 ringstrainerror detect small rings with trans or cumulative

 double bonds, or with triple bonds

 solvent detect common solvents appearing

 alongside a main component

 staratom detect star atoms

 stereocarebox detect stereo search markers on double bonds

 straightdoublebond detect undefined double bond stereo layout

 substructure:[smarts] detect the given SMARTS structure

 as a substructure in the original molecule

 unbalancedreaction detect reactions with orphan atoms

 unusedrgroup (deprecated) use unusedrgroupreference

 unusedrgroupreference detect unused R-group definitions

 valence (deprecated) use valenceerror

 valenceerror detect valence errors

 valenceproperty detect atoms with all or specified

 valence properties

 :defaultvalence=[true|false] default valence properties

 :nondefaultvalence=[true|false] non-default valence properties

 wedge (deprecated) use wedgeerror

 wedgeerror detect incorrect wedge bonds

 wigglybond detect wiggly bonds on chiral centers

 wigglydoublebond detect non-stereo double bonds with wiggly

 representation connected to a double bond

Available fixer actions: structurecheck -hf

Valid fixer actions (strings) are:

 addchiralflag add chiral flag to the molecule

 aliastoatom remove aliases from atoms

 aliastocarbon (deprecated) use converttocarbon

 aliastogroup convert atoms with aliases to abbreviated groups

 if the alias is recognized

 clean calculate 2D coordinates

 clearabsstereo (deprecated) use removeinvalidchiralflag

 contractgroup contract all abbreviated groups

 converttoelementalform convert isotopes into elemental atoms

 converttocarbon remove alias values from atoms and

 convert the atom to a carbon

 converttoionicform convert covalent counterions to ionic form

 converttometalloceneform convert non-standard metallocene representations

 into coordinated multicenter representation

 converttosingle (deprecated) use converttosinglebond

 converttosinglebond convert faulty bonds to single bonds

 converttowigglydoublebond convert non-stereo double bond represented by

 crossed double bond to wiggly bond representation

 crosseddoublebond convert non-stereo double bond represented by

 wiggly bond to crossed double bond representation

 crossedtowiggly (deprecated) use converttowigglydoublebond

 dearomatize convert aromatic rings into Kekule form

 expandgroup expand all abbreviated groups where possible

 fixmetallocene converts metallocenes to coordinative multicenter layout

 fixrgroupattachment add missing attachment points to members

 with a single location

 fixunusedrgroups delete unreferenced R-group definitions

 fixvalence correct valence problem by removing hydrogens

 or setting charges

 mapmolecule add atom maps to each atom of the molecule

 mapreaction add atom maps to the reaction

 neutralize remove charges from the molecule

 partialclean recalculate parts of the atom coordinates for 2D layout

 pseudotogroup convert pseudo atoms to abbreviated groups

 if pseudo label is a known abbreviated group

 rearomatize dearomatize the molecule and aromatize it again

 removealias remove alias values from atoms

 removeatom remove the problematic atoms from the molecule

 removeatommap remove atom map numbers

 removeatomqueryproperty remove atom query properties

 removeatomvalue remove atom values

 removeattacheddata remove non-excluded attached data from atoms

 removebond remove problematic bonds from the molecule

 removeexplicith remove explicit hydrogens

 removeinvalidchiralflag remove the chiral flag

 removeradical convert radicals to non-radical atoms

 removestereocarebox remove stereo search markers from double bonds

 removevalenceproperty remove valence properties from atoms

 removezcoordinate set the z-coordinates of atoms to zero

 ungroup ungroup all abbreviated groups

 wedgeclean recalculate the orientation of wedge bonds

Usage

structurecheck -c <config file> -m [mode] [<options>] [input list]

The command line parameter -c or --config is mandatory. This parameter specifies the configuration file path or a simple action string.

structurecheck -c config.xml

or

structurecheck -c "atomqueryproperty"
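When a configuration is built in a script, the action string for -c can be assembled from separate actions. The sketch below is a hypothetical helper (join_actions is not part of structurecheck) that joins complete action strings with the ".." separator listed under Input options; a checker->fixer pair such as the one in usage example 3 is passed as one argument. Quoting the result matters, because action strings can contain characters such as ">", "[" and "]" that the shell would otherwise interpret.

```shell
# Hypothetical helper (not part of structurecheck): join complete action
# strings with the ".." separator accepted by -c/--config.
join_actions() (
  result=""
  sep=""
  for action in "$@"; do
    result="$result$sep$action"
    sep=".."
  done
  printf '%s\n' "$result"
)

# Usage (quote the result so ">", "[" and "]" reach structurecheck intact):
#   structurecheck -c "$(join_actions aromaticityerror valenceerror)" in.sdf
```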

Parameter -m or --mode specifies the operation mode. The following operation modes are available:

  • check (default): searches for errors;
    structurecheck -c config.xml -m check
    
  • fix: fixes automatically fixable errors.
    structurecheck -c config.xml -m fix
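Since check mode does not modify molecules, one possible workflow is a check pass followed by a fix pass on the same input. The function below is a minimal sketch of that idea; its name and the output file naming scheme are hypothetical, and it uses only the -c, -m, -t, -f, -o and -d options listed above. It requests SD output explicitly with -f sdf, because the default output format is SMILES.

```shell
# Hypothetical workflow: a check pass followed by a fix pass on one input file.
# $1 = config file or action string, $2 = input file, $3 = prefix for outputs.
check_then_fix() (
  config=$1 input=$2 prefix=$3
  # Pass 1: check only; molecules are not modified.
  # A non-zero exit status is treated as a failure of the run itself.
  structurecheck -c "$config" -m check -t separated -f sdf \
    -o "${prefix}_check_ok.sdf" -d "${prefix}_check_errors.sdf" "$input" || exit 1
  # Pass 2: fix; molecules that cannot be fixed automatically are discarded.
  structurecheck -c "$config" -m fix -t separated -f sdf \
    -o "${prefix}_fixed.sdf" -d "${prefix}_unfixable.sdf" "$input"
)

# Usage:
#   check_then_fix config.xml in.sdf results/batch1
```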
    

Note: When a molecule import/export error occurs, the program continues to run. The error is written to the console, and the molecule is discarded from the results (i.e., the resulting output file contains fewer molecules than the input file).
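Because molecules with import/export errors are left out of the output, comparing record counts before and after a run can reveal such losses. The sketch below assumes SD files on both sides (that is, the output was written with -f sdf) and counts records by the "$$$$" line that terminates each SDfile record; the function names are hypothetical. A lower count is only a warning sign when the output is expected to contain every molecule, for example with the default single output type.

```shell
# Hypothetical helpers for SD files only: every SDfile record ends with "$$$$".
count_sdf_records() {
  grep -c '^\$\$\$\$' "$1"
}

# Warn (and return 1) when the output SD file has fewer records than the input.
compare_sdf_counts() (
  in_count=$(count_sdf_records "$1")
  out_count=$(count_sdf_records "$2")
  if [ "$out_count" -lt "$in_count" ]; then
    printf 'warning: %s has %s records, %s has %s\n' \
      "$2" "$out_count" "$1" "$in_count" >&2
    exit 1
  fi
)

# Usage:
#   structurecheck -c config.xml -f sdf -o out.sdf in.sdf
#   compare_sdf_counts in.sdf out.sdf
```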

Note: The command syntax can differ between command-line shells (bash, tcsh, zsh, etc.).

Input

structurecheck accepts most molecular file formats as input (Marvin Documents (MRV), MDL molfile, Sdfile, RXNfile, Rdfile, SMILES, etc.). The input can be specified as:

  • input file(s),
    structurecheck -c config.xml -m check input.mrv
  • input string(s), or
  • SMILES (default).

Note: If neither an input file nor an input string is specified, the standard input (console) is read.

structurecheck -c config.xml -m check "OCC(O)C1OC(=O)C(O)=C1O"
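Because standard input is read when no input file or input string is given, SMILES strings can also be piped in. The hypothetical helper below writes one SMILES string per line to structurecheck's standard input; it assumes that line-separated SMILES on standard input are recognized, which matches SMILES being among the accepted input formats but is not spelled out in this page.

```shell
# Hypothetical helper: feed SMILES strings (one per line) to structurecheck's
# standard input; no input file or input string is given on the command line.
# $1 = config file or action string, remaining arguments = SMILES strings.
check_smiles_stdin() (
  config=$1
  shift
  printf '%s\n' "$@" | structurecheck -c "$config" -m check
)

# Usage:
#   check_smiles_stdin config.xml "OCC(O)C1OC(=O)C(O)=C1O" "CCO"
```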

Output

structurecheck's output contains the file(s) of the checked/fixed molecules and, optionally, a report of the results. The molecules are written to the output file(s). The format of the output file(s) can be specified by the -f or --format option (default: smiles). The type of output is defined by the -t or --output-type parameter. The possible values of the output type are the following:

  • single (default): all molecules are written to the file defined by the --output parameter. If the --output parameter is omitted, the result is written to the standard output (console). (The --discarded parameter is ignored in this case.)
  • separated: valid and invalid molecules are written to two different files. The --output parameter defines the output file of molecules with valid structures, and the --discarded parameter defines the output file of molecules with invalid structures (or, in fix mode, those which cannot be fixed automatically).

    • If the --discarded parameter is omitted, molecules with invalid structures are written to the standard output;
    • If the --output parameter is omitted, molecules with valid structures are written to the standard output.
      Note: At least one of the --output and --discarded parameters must be specified. If neither is defined, the program stops.
      
  • accepted: only molecules with valid structures are written to the file defined by the --output parameter. If the --output parameter is omitted, molecules with valid structures are written to the standard output. (The --discarded parameter is ignored in this case.)
  • discarded: only molecules with invalid structures are written to the file defined by the --discarded parameter. If the --discarded parameter is omitted, molecules with invalid structures are written to the standard output. (The --output parameter is ignored in this case.)
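The output-type rules can be summarized as a small lookup. The function below is a hypothetical sketch that only prints where accepted and discarded molecules would go for a given -t value; it does not run structurecheck. It reads the omitted --discarded case of the discarded type as sending the invalid molecules to standard output, and it rejects the separated type when neither path is given, following the note on the separated type.

```shell
# Hypothetical sketch of the -t/--output-type routing rules.
# $1 = output type, $2 = --output path, $3 = --discarded path ("" if omitted).
# Prints one line per molecule class; DEST is a path, "stdout" or "dropped".
describe_output() (
  otype=$1 out=${2:-stdout} disc=${3:-stdout}
  case $otype in
    single)
      printf 'accepted -> %s\ndiscarded -> %s\n' "$out" "$out" ;;
    separated)
      if [ -z "$2" ] && [ -z "$3" ]; then
        echo "error: separated output needs --output or --discarded" >&2
        exit 1
      fi
      printf 'accepted -> %s\ndiscarded -> %s\n' "$out" "$disc" ;;
    accepted)
      printf 'accepted -> %s\ndiscarded -> dropped\n' "$out" ;;
    discarded)
      printf 'accepted -> dropped\ndiscarded -> %s\n' "$disc" ;;
    *)
      echo "error: unknown output type: $otype" >&2
      exit 1 ;;
  esac
)

# Usage:
#   describe_output separated "" discarded.sdf
```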

The report of structure checking can be written either to a separate file, defined by the --report-file parameter, or to the output file(s) as an additional molecule property. The name of the property can be defined by the --report-property parameter.

Note: Not all molecules with structure errors are discarded. In fix mode, only molecules with errors that cannot be fixed automatically are discarded.
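Combining the output and report options, a fix run that keeps its report in a separate file might look like the sketch below. The wrapper name and argument order are hypothetical; the optional fifth argument is passed to -re (--report-format), which sets the file format of the molecules in the report.

```shell
# Hypothetical wrapper: fix one input file, write SD output and a report file.
# $1 = config, $2 = input, $3 = output SD file, $4 = report file,
# $5 = optional report molecule format (passed to -re).
fix_with_report() (
  config=$1 input=$2 output=$3 report=$4 report_format=${5:-}
  # Reuse the positional parameters to hold the optional -re arguments.
  if [ -n "$report_format" ]; then
    set -- -re "$report_format"
  else
    set --
  fi
  structurecheck -c "$config" -m fix -f sdf -o "$output" -rf "$report" "$@" "$input"
)

# Usage:
#   fix_with_report config.xml in.sdf out.sdf report.txt smiles
```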

Usage examples

Below you can find short descriptions of some examples. Examples for checking, fixing, or filtering structures in evaluate or JChem Cartridge are documented separately.

  1. structurecheck -c "metallocene"
    

    Executes a check with configuration metallocene on the molecule(s) read from the standard input, and writes the result to the standard output (console);

  2.
    ```no-highlight
    structurecheck -c "bondLength" in.sdf
    ```
    

    Executes a check with configuration bondLength on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);

  3. structurecheck -c "isotope->converttoelementalform" in.sdf

    Executes a check with configuration isotope->converttoelementalform on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);

  4.
    ```no-highlight
    structurecheck -c "aromaticity..valence" -m fix -f sdf -o out.sdf in.sdf
    ```
    

    Executes a fix with configuration aromaticity and valence on the molecule(s) defined in the in.sdf file, and writes the molecules with valid structures (including automatically fixed molecules) in sdf format to the out.sdf output file;

  5.
    ```no-highlight
    structurecheck -c config.xml -t separated -o out.sdf -d discarded.sdf
    ```
    

    Executes a check with the configuration defined in config.xml on the molecule(s) read from the standard input, writes the molecules with valid structures to out.sdf, and writes the molecules with invalid structures to discarded.sdf.

    Note: Both outputs are written in SMILES format (despite the .sdf extensions) because --format (-f) is not specified;

  6.
    ```no-highlight
    structurecheck -c config.xml -m fix -t separated -d discarded.sdf
    ```
    

    Executes a fix with the configuration defined in config.xml, writes the molecules with invalid structures to discarded.sdf, and writes the molecules with valid structures to the standard output (console);

  7.
    ```no-highlight
    structurecheck -c config.xml -m fix -t discarded -d discarded.sdf in.sdf
    ```

    Executes a fix with the configuration defined in config.xml on the molecule(s) in the in.sdf file, writes the molecules with invalid structures to discarded.sdf, and omits the molecules with valid structures.
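The examples above process one input at a time. For a directory of SD files, a loop such as the hypothetical sketch below can apply the separated output type to each file. It passes -f sdf explicitly so that, unlike example 5, the .sdf outputs really are SDfiles; the function name, directory layout and file naming are assumptions.

```shell
# Hypothetical batch script: fix every .sdf file in a directory and write the
# accepted and discarded molecules of each file to separate SD files.
# $1 = config, $2 = input directory, $3 = output directory.
fix_directory() (
  config=$1 indir=$2 outdir=$3
  mkdir -p "$outdir" || exit 1
  for input in "$indir"/*.sdf; do
    [ -e "$input" ] || continue   # no .sdf files: the glob stays unexpanded
    name=$(basename "$input" .sdf)
    # -f sdf is explicit; without it both outputs would be SMILES.
    structurecheck -c "$config" -m fix -t separated -f sdf \
      -o "$outdir/$name.fixed.sdf" -d "$outdir/$name.discarded.sdf" "$input" \
      || printf 'structurecheck failed on %s\n' "$input" >&2
  done
)

# Usage:
#   fix_directory config.xml incoming results
```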