Structure Checker is a chemical validation tool that detects and fixes common structural errors and special features that are potential sources of problems. structurecheck is the command-line tool of Structure Checker.
structurecheck -c <config file/string> [-m <mode>] [options] [input list]
The command-line parameter -c or --config is mandatory. It specifies either the path of a configuration file or a simple action string (see Creating a Configuration for more details).
structurecheck -c config.xml ...
or
structurecheck -c "atomqueryproperty" ...
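When several checker actions are needed without a configuration file, the action string can be assembled from individual names. The sketch below assumes the ".." separator shown in the options list of this document; join_actions is a hypothetical helper written for illustration, not a part of structurecheck.

```shell
# Join checker action names into one action string for -c/--config.
# Assumption: actions are separated by "..", as in "aromaticity..valence".
# join_actions is a hypothetical helper, not a structurecheck feature.
join_actions() {
    result=""
    for action in "$@"; do
        if [ -z "$result" ]; then
            result="$action"
        else
            result="${result}..${action}"
        fi
    done
    printf '%s\n' "$result"
}
```

It could then be used as: structurecheck -c "$(join_actions aromaticity valence)" ...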
The -m or --mode parameter specifies the operation mode. The following operation modes are available:
check (default): searches for errors;
structurecheck -c config.xml -m check ...
fix: searches for errors and fixes those that can be fixed automatically.
structurecheck -c config.xml -m fix ...
Note: When a molecule import/export error occurs, the program continues to run. The error is written to the console, and the molecule is discarded from the results (i.e., the resulting output file contains fewer molecules than the input file).
Note: The syntax of commands can be different under various command line shells (bash, tcsh, zsh, etc.).
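Because molecules that fail to import or export are silently left out of the output, it can be useful to compare molecule counts after a run. The following is a minimal sketch that assumes both files are SMILES files with one molecule per non-empty line; dropped_count is a hypothetical helper, not something structurecheck provides.

```shell
# Print how many molecules are missing from the output compared to the input.
# Assumption: both files are SMILES, one molecule per non-empty line.
# dropped_count is a hypothetical helper, not a structurecheck feature.
dropped_count() {
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    in_count=$(grep -c . "$1" || true)
    out_count=$(grep -c . "$2" || true)
    echo $((in_count - out_count))
}
```

For example, after running structurecheck -c config.xml -o out.smi in.smi, calling dropped_count in.smi out.smi prints the number of molecules that did not reach the output.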
General options:
  -h,  --help                          this help page
  -hc, --help-checker-actions          help page of valid checker actions
  -hf, --help-fixer-actions            help page of valid fixer actions
  -m,  --mode [check|fix]              mode of the operation (default: check)
                                         check  only check is executed,
                                                does not modify molecules
                                         fix    fix molecules containing structure
                                                errors whenever possible
  -x                                   fix mode (deprecated, use --mode fix)
Input options:
  -c,  --config <filepath|string>      action string configuration
                                       actions separated by "..",
Output options:
  -t,  --output-type <output type>     [single|separated|accepted|discarded]
                                       set output type (default: single)
                                         single     both accepted and discarded
                                                    structures are written to the
                                                    <output path>
                                         separated  accepted structures are written
                                                    to the <output path>, discarded
                                                    structures are written to the
                                                    <discarded path>
                                         accepted   only accepted structures are
                                                    written to the <output path>
                                         discarded  only discarded structures are
                                                    written to the <discarded path>
  -o,  --output <output path>          output file (default: standard output)
  -d,  --discarded <discarded path>    write molecules with structure error to
                                       a separate file (default: standard output)
  -f,  --format <format>               output file format (default: smiles)
  -rf, --report-file <filepath>        write report to a file
  -rp, --report-property               write report to the property of the output
  -rt, --report-pattern <pattern>      generate pattern based report file
  -re, --report-format <format>        file format of the molecules in report
  -l,  --log <filepath>                write software-error log messages to file
  -g,  --ignore-errors                 continue with next molecule on error
  -ic, --ignore-config-errors          ignores errors found in the configuration
structurecheck accepts most molecular file formats as input (Marvin Documents (MRV), MDL molfile, SDfile, RXNfile, RDfile, SMILES, etc.). The input can be specified as:
input file(s),
input string(s), or
SMILES (default).
structurecheck -c config.xml -m check input.mrv
Note: If neither an input file nor an input string is specified, the standard input (console) is read.
structurecheck -c config.xml -m check "OCC(O)C1OC(=O)C(O)=C1O"
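The input variants described above (files, strings, or standard input) can be wrapped in one small function. This is only a sketch: check_inputs and the STRUCTURECHECK variable are hypothetical names introduced here, and the function assumes that structurecheck is available on the PATH unless STRUCTURECHECK points elsewhere.

```shell
# Run a check on any mix of input files and input strings.
# check_inputs and STRUCTURECHECK are hypothetical names for illustration;
# STRUCTURECHECK defaults to the structurecheck executable found on PATH.
check_inputs() {
    # $1 = configuration (file path or action string); the rest = inputs
    config=$1
    shift
    # With an empty input list nothing is appended, so structurecheck
    # reads the molecules from standard input instead.
    "${STRUCTURECHECK:-structurecheck}" -c "$config" -m check "$@"
}
```

Possible calls are check_inputs config.xml input.mrv for a file, check_inputs config.xml "OCC(O)C1OC(=O)C(O)=C1O" for a string, and cat input.smi | check_inputs config.xml for standard input.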
The output of structurecheck contains the file(s) of the checked/fixed molecules and, optionally, a report of the results. The molecules are written to the output file(s). The format of the output file(s) can be specified with the -f or --format option (default format: "smiles"). The type of output is defined by the -t or --output-type parameter. The possible values of the output type are the following:
single (default): all molecules are written to the file defined by the --output parameter. If the --output parameter is omitted, the result is written to the standard output (console). (The --discarded parameter is ignored in this case.)
separated: valid and invalid molecules are written to two different files. The --output parameter defines the output file of molecules with valid structures, and the --discarded parameter defines the output file of molecules with invalid structures (or, in fix mode, those that cannot be fixed automatically).
If the --discarded parameter is omitted, molecules with invalid structures are written to the standard output.
If the --output parameter is omitted, molecules with valid structures are written to the standard output.
Note: At least one of the --output and --discarded parameters must be specified. If neither is defined, the program stops.
accepted: only molecules with valid structures are written to the file defined by the --output parameter. If the --output parameter is omitted, molecules with valid structures are written to the standard output. (The --discarded parameter is ignored in this case.)
discarded: only molecules with invalid structures are written to the file defined by the --discarded parameter. If the --discarded parameter is omitted, molecules with invalid structures are written to the standard output. (The --output parameter is ignored in this case.)
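The rule for the separated output type (at least one of --output and --discarded must be given) can be checked before the tool is started. The sketch below is an illustration only: run_separated and the STRUCTURECHECK variable are hypothetical names, and the function assumes that structurecheck is available on the PATH unless STRUCTURECHECK points elsewhere.

```shell
# Run structurecheck with -t separated, refusing to start when neither an
# accepted nor a discarded path is given.
# run_separated and STRUCTURECHECK are hypothetical names for illustration.
run_separated() {
    # $1 = configuration, $2 = accepted path (may be empty),
    # $3 = discarded path (may be empty), the rest = inputs
    config=$1
    accepted=$2
    discarded=$3
    shift 3
    if [ -z "$accepted" ] && [ -z "$discarded" ]; then
        echo "separated output needs --output or --discarded" >&2
        return 2
    fi
    # Prepend the optional arguments so the final order is:
    # -c <config> -t separated [-o <accepted>] [-d <discarded>] [inputs]
    if [ -n "$discarded" ]; then set -- -d "$discarded" "$@"; fi
    if [ -n "$accepted" ]; then set -- -o "$accepted" "$@"; fi
    "${STRUCTURECHECK:-structurecheck}" -c "$config" -t separated "$@"
}
```

A call such as run_separated config.xml out.sdf discarded.sdf in.sdf writes both files, while run_separated config.xml "" discarded.sdf in.sdf leaves the molecules with valid structures on the standard output.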
The report of structure checking can be written either to a separate file, defined by the --report-file parameter, or to the output file(s) as an additional molecule property. The name of the property can be defined by the --report-property parameter.
Note: Not all molecules with structure errors are discarded. In fix mode, only molecules with errors that cannot be fixed automatically are discarded.
Below are short descriptions of some examples. If you want to check, fix, or filter structures in evaluate or JChem Cartridge, find examples here.
structurecheck -c "metallocene"
Executes a check with configuration metallocene on the molecule(s) defined in the standard input, and writes the result to the standard output (console);
structurecheck -c "bondLength" in.sdf
Executes a check with the configuration bondLength on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);
structurecheck -c "isotope->converttoelementalform" in.sdf
Executes a check with the configuration isotope->converttoelementalform on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);
structurecheck -c "aromaticity..valence" -m fix -f sdf -o out.sdf in.sdf
Executes a fix with the configuration aromaticity and valence on the molecule(s) defined in the in.sdf file, and writes the molecules with valid structures (including automatically fixed molecules) in sdf format to the out.sdf output file;
structurecheck -c config.xml -t separated -o out.sdf -d discarded.sdf
Executes a check with the configuration contained in config.xml, writes the molecules with valid structures to out.sdf, and writes the molecules with invalid structures to discarded.sdf.
Note: Because --format (-f) is not specified, both outputs are written in SMILES format, despite the .sdf file names;
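To avoid the mismatch described in the note above, the value for --format can be derived from the output file name. The sketch below covers only the two format names that appear in this document (sdf and smiles); format_for is a hypothetical helper, and names for other formats would have to be looked up rather than guessed.

```shell
# Map an output file name to a value for -f/--format.
# Only the format names that appear in this document are handled;
# format_for is a hypothetical helper, not a structurecheck feature.
format_for() {
    case "$1" in
        *.sdf) echo sdf ;;
        *.smi|*.smiles) echo smiles ;;
        *)
            echo "no format mapping for: $1" >&2
            return 1
            ;;
    esac
}
```

It could be used as: structurecheck -c config.xml -t separated -f "$(format_for out.sdf)" -o out.sdf -d discarded.sdf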
structurecheck -c config.xml -m fix -t separated -d discarded.sdf
Executes a fix with the configuration contained in config.xml, writes the molecules with invalid structures to discarded.sdf, and writes the molecules with valid structures to the standard output (console);
structurecheck -c config.xml -m fix -t discarded in.sdf
Executes a fix with the configuration contained in config.xml on the molecule(s) defined in the in.sdf file, writes the molecules with invalid structures to the standard output (console) because --discarded is not specified, and omits molecules with valid structures.