Structure Checker Command Line Application

    Structure Checker is a chemical validation tool that detects and fixes common structural errors and special features that can be potential sources of problems. structurecheck is the command-line tool of Structure Checker.

    Usage

    structurecheck -c <config file/string> [-m <mode>] [options] [input list]

    The command line parameter -c or --config is mandatory. This parameter specifies the configuration file path or a simple action string (see Creating a Configuration for more details).

    structurecheck -c config.xml ...

    or

    structurecheck -c "atomqueryproperty" ...
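
    A minimal sketch of assembling an action string from several checker names. It assumes the ".." separator shown in the help text and in the aromaticity..valence usage example; join_actions is a hypothetical helper, not part of structurecheck.

```shell
# Join checker action names into one action string using the ".." separator.
# join_actions is a hypothetical helper written for this illustration.
join_actions() {
    result=""
    for action in "$@"; do
        if [ -z "$result" ]; then
            result="$action"
        else
            result="${result}..${action}"
        fi
    done
    printf '%s\n' "$result"
}

config=$(join_actions aromaticity valence)
printf '%s\n' "$config"

# Intended use (needs structurecheck on PATH, so it is left commented out):
#   structurecheck -c "$config" -m check in.sdf
```

    Keeping the string in a variable and passing it as "$config" ensures the shell hands it over as a single argument.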

    Parameter -m or --mode specifies the operation mode. The following operation modes are available:

    • check (default): searches for errors;

      structurecheck -c config.xml -m check ...
    • fix: searches for errors and fixes automatically fixable errors.

      structurecheck -c config.xml -m fix ...

      Note: When a molecule import/export error occurs, the program continues to run. The error is written to the console, and the molecule is discarded from the results (i.e., the resulting output file contains fewer molecules than the input file).

      Note: The syntax of commands can be different under various command line shells (bash, tcsh, zsh, etc.).
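
      One shell-dependent pitfall is quoting. Action strings such as isotope->converttoelementalform contain the character >, which POSIX-style shells treat as output redirection when it is not quoted. The sketch below uses a hypothetical helper, count_args, to show that the quoted string is passed as one argument.

```shell
# Report how many arguments the shell actually passes on.
# count_args is a hypothetical helper written for this illustration.
count_args() {
    printf '%s\n' "$#"
}

# Quoted: the whole action string arrives as a single argument.
count_args "isotope->converttoelementalform"

# Unquoted, a POSIX-style shell would parse the same text as
#   isotope-  > converttoelementalform
# that is, the word "isotope-" followed by a redirection into a file
# named converttoelementalform. Always quote action strings.
```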

    General options

    General options:
      -h, --help                         this help page
      -hc, --help-checker-actions        help page of valid checker actions
      -hf, --help-fixer-actions          help page of valid fixer actions
      -m, --mode [check|fix]             mode of the operation (default: check)
              check                      only check is executed,
                                         does not modify molecules
              fix                        fix molecules containing structure errors
                                         whenever possible
      -x                                 fix mode (deprecated, use --mode fix)
    Input options:
      -c, --config <filepath|string>     action string configuration
                                         actions separated by "..",
    Output options:
      -t, --output-type <output type>    [single|separated|accepted|discarded]
                                         set output type(default: single)
              single                     both accepted and discarded structures are
                                         written to the <output path>
              separated                  accepted structures are written to the
                                         <output path>, discarded structures are
                                         written to the <discarded path>
              accepted                   only accepted structures are
                                         written to the <output path>
              discarded                  only discarded structures are
                                         written to the <discarded path>
      -o, --output <output path>         output file (default: standard output)
      -d, --discarded <discarded path>   write molecules with structure error to
                                         a separate file (default:standard output)
      -f, --format <format>              output file format (default: smiles)
      -rf, --report-file <filepath>      write report to a file
      -rp, --report-property             write report to the property of the
                                         output
      -rt, --report-pattern <pattern>    generate pattern based report file
      -re, --report-format <format>      file format of the molecules in report
      -l, --log <filepath>               write software-error log messages to file
      -g, --ignore-errors                continue with next molecule on error
      -ic, --ignore-config-errors        ignores errors found in the configuration

    Input

    structurecheck accepts most molecular file formats as input (Marvin Documents (MRV), MDL molfile, SDfile, RXNfile, RDfile, SMILES, etc.). The input can be specified as:

    • input file(s),

    • input string(s), or

    • SMILES (default).

    structurecheck -c config.xml -m check input.mrv

    Note: If neither an input file nor an input string is specified, the standard input (console) is read.

    structurecheck -c config.xml -m check "OCC(O)C1OC(=O)C(O)=C1O"
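
    Since the standard input is read when no input is given, molecules can also be piped in. The sketch below only prepares such a stream; it assumes that one SMILES string per line is accepted on standard input, which this page does not state explicitly. emit_smiles is a hypothetical helper.

```shell
# Print each argument on its own line, giving one SMILES string per line.
# emit_smiles is a hypothetical helper written for this illustration.
emit_smiles() {
    printf '%s\n' "$@"
}

# The first string is the example used on this page;
# CCO (ethanol) is an extra molecule added for illustration.
emit_smiles "OCC(O)C1OC(=O)C(O)=C1O" "CCO"

# Intended use (needs structurecheck on PATH, so it is left commented out):
#   emit_smiles "OCC(O)C1OC(=O)C(O)=C1O" "CCO" | structurecheck -c config.xml -m check
```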

    Output

    structurecheck's output consists of the file(s) containing the checked/fixed molecules and, optionally, a report of the results. The molecules are written to the output file(s). The format of the output file(s) can be specified with the -f or --format option (the default format is "smiles"). The type of output is defined by the -t or --output-type parameter. The possible values of the output type are the following:

    • single (default): all molecules are written to the file defined by the --output parameter. If the --output parameter is omitted, the result is written to the standard output (console). (The --discarded parameter is ignored in this case.)

    • separated: valid and invalid molecules are written to two different files. The --output parameter defines the output file of molecules with valid structures, and the --discarded parameter defines the output file of molecules with invalid structures (or, in fix mode, those that cannot be fixed automatically).

      • If the --discarded parameter is omitted, molecules with invalid structures are written to the standard output;

      • If the --output parameter is omitted, molecules with valid structures are written to the standard output;

        Note: At least one of the --output and --discarded parameters must be specified. If neither is defined, the program stops.

    • accepted: only molecules with valid structures are written to the file defined by the --output parameter. If the --output parameter is omitted, molecules with valid structures are written to the standard output. (The --discarded parameter is ignored in this case.)

    • discarded: only molecules with invalid structures are written to the file defined by the --discarded parameter. If the --discarded parameter is omitted, molecules with invalid structures are written to the standard output. (The --output parameter is ignored in this case.)

    The report of structure checking can be written either to a separate file, defined by the --report-file parameter, or to the output file(s) as an additional molecule property. The name of the property can be defined by the --report-property parameter.

    Note: Not all molecules with structure errors are discarded. In fix mode, only molecules whose errors cannot be fixed automatically are discarded.
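
    The routing rules for the four output types can be summarized in a small shell function. This is only a model of the behavior described in this section, not structurecheck's implementation. route_output and its "stdout" and "none" labels are hypothetical, and the sketch assumes that in discarded mode the molecules falling back to the standard output are the invalid ones.

```shell
# Print where accepted (valid) and discarded (invalid) molecules end up.
#   $1 = output type, $2 = --output path or "", $3 = --discarded path or ""
# route_output is a hypothetical model of the rules described in this section.
route_output() {
    otype=$1
    out=${2:-stdout}     # an omitted path falls back to standard output
    disc=${3:-stdout}
    case "$otype" in
        single)
            # --discarded is ignored; everything goes to one place
            echo "accepted=$out discarded=$out" ;;
        separated)
            # at least one of the two paths must be given
            if [ -z "$2" ] && [ -z "$3" ]; then
                echo "error: separated needs --output or --discarded" >&2
                return 1
            fi
            echo "accepted=$out discarded=$disc" ;;
        accepted)
            echo "accepted=$out discarded=none" ;;
        discarded)
            echo "accepted=none discarded=$disc" ;;
        *)
            echo "error: unknown output type: $otype" >&2
            return 1 ;;
    esac
}

route_output single out.sdf ""
route_output separated "" discarded.sdf
route_output discarded "" ""
```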

    Usage examples

    Below you can find short descriptions of some examples. If you want to check, fix, or filter structures in evaluate or JChem Cartridge, find examples here.

    1. structurecheck -c "metallocene"

      Executes a check with configuration metallocene on the molecule(s) defined in the standard input, and writes the result to the standard output (console);

    2. structurecheck -c "bondLength" in.sdf

      Executes a check with configuration bondLength on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);

    3. structurecheck -c "isotope->converttoelementalform" in.sdf

      Executes a check with configuration isotope->converttoelementalform on the molecule(s) defined in the in.sdf file, and writes the result to the standard output (console);

    4. structurecheck -c "aromaticity..valence" -m fix -f sdf -o out.sdf in.sdf

      Executes a fix with configuration aromaticity and valence on the molecule(s) defined in the in.sdf file, and writes the molecules with valid structures (including automatically fixed molecules) in sdf format to the out.sdf output file;

    5. structurecheck -c config.xml -t separated -o out.sdf -d discarded.sdf

      Executes a check with configuration contained by the config.xml, and writes the molecules with valid structures to out.sdf, and writes the molecules with invalid structures to discarded.sdf.

      Note: The format of both outputs is SMILES (!), despite the .sdf file names, because --format (-f) is not specified;

    6. structurecheck -c config.xml -m fix -t separated -d discarded.sdf

      Executes a fix with the configuration contained in config.xml, writes the molecules with invalid structures to discarded.sdf, and writes molecules with valid structures to the standard output (console);

    7. structurecheck -c config.xml -m fix -t discarded in.sdf

      Executes a fix with the configuration contained in config.xml on the molecule(s) defined in the in.sdf file, writes the molecules with invalid structures to the standard output (as --discarded is not specified), and omits molecules with valid structures.
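
    Because molecules that fail import or export are dropped from the results, it can be useful to compare the number of records in the input and output files after a run such as example 4. The sketch below counts SD file records by their "$$$$" terminator lines. count_sdf_records is a hypothetical helper, and the placeholder lines in the demo stand in for real molfile blocks.

```shell
# Count records in an SD file: each record ends with a line starting with "$$$$".
# count_sdf_records is a hypothetical helper written for this illustration.
count_sdf_records() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status
    grep -c '^\$\$\$\$' "$1" || true
}

# Demo on a temporary file with two placeholder records.
tmp=$(mktemp)
printf '%s\n' 'mol1' '$$$$' 'mol2' '$$$$' > "$tmp"
count_sdf_records "$tmp"
rm -f "$tmp"

# Intended use after a run (file names taken from example 4):
#   [ "$(count_sdf_records in.sdf)" = "$(count_sdf_records out.sdf)" ] \
#       || echo "some molecules were dropped" >&2
```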