jcsearch Command Line Tool

    Introduction

    The jcsearch program is a command-line interface to JChem chemical structure search. It can perform substructure, superstructure, full (formerly called exact), full fragment (formerly exact fragment), similarity, and duplicate (formerly perfect) searches, as well as match counts, on the specified query and target molecules. These molecules can be specified as filenames, SMARTS/SMILES strings, or database tables (target only). A number of different molecular file formats are supported. Refer to the JChem Query Guide for a detailed description of search options and query features.

    Note that the R-group decomposition functionality has been moved to a different script. The R-group Decomposition documentation covers this subject in detail, with examples.

    Usage

    For correct behavior, set up the jcsearch script or batch file as described in Preparing the Usage of JChem Batch Files and Shell Scripts.

    The program should be invoked in one of the following forms:

           jcsearch [options] [files...]
       or  jcsearch [options] DB:[table name]

    With no file, or when file is -, jcsearch reads the standard input. When DB is specified, the search is run in the database, using connection information saved by other JChem programs (e.g. jcman).
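
    The two invocation forms can be wrapped in small helpers. This is a minimal sketch: the function names search_stdin and search_table are hypothetical, the table name molecules is borrowed from the examples at the end of this document, and it is assumed that SMILES lines on standard input are recognized as targets without further options. Only flags listed in this document are used.

```shell
# Hypothetical helper names; the flags are the ones documented here.

# Substructure search over targets arriving on standard input ("-").
search_stdin() {
  jcsearch -q "$1" -f smiles -
}

# The same search against a database table, using the DB:<table> form.
search_table() {
  jcsearch -q "$1" -f smiles "DB:$2"
}

# Run the demonstration only when jcsearch is actually installed.
if command -v jcsearch >/dev/null 2>&1; then
  printf '%s\n' 'Clc1ccccc1Br' 'CCO' | search_stdin 'c1ccccc1Cl'
fi
```

    A database search would then be started with, for example, search_table 'c1ccccc1Cl' molecules, provided that connection information has already been saved by another JChem program.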

    Options:

    -h         this help message
    -H         help on output file formats
    -q query   SMARTS string or name of file that contains the query structure(s)
                 (More than one can be specified in non-database mode. Please see
                  options --and and --or.) For a detailed description about how
                  to formulate queries, see the JChem Query Guide.
                  In case of -t:d or --tautomer expects SMILES instead of SMARTS.
    -t:<type>  search type.
          -t:s                         substructure search (default)
          -t:f                         full structure search
          -t:ff                        full fragment search
          -t:d                         duplicate search
          -t:i[:dissim_threshold]      similarity search
                                         (in case of mrv or sdf output, the
                                         dissimilarity value is stored as molecule
                                         property)
          -t:u                         superstructure search
                                         (default for query tables)
          -t:c                         count all hits
    --hitColoring                  colors the hits depending on search type if the
                                   output format is MRV.
    --hitColor                     in case of hitColoring, specify color of hit
                                   atoms and bonds
                                       examples: "red", "blue", "#00FF00" (green)
    --hitHomologyColor             in case of hitColoring, specify color of homology
                                   hit atoms and bonds
                                       examples: "red", "blue", "#00FF00" (green)
    --nonHitColor                  in case of hitColoring, specify color of non-hit
                                   atoms and bonds
                                       examples: "red", "blue", "#00FF00" (green)
    --nonHitColor3D                in case of hitColoring in 3D molecules, specify
                                   color of non-hit atoms and bonds
                                       examples: "red", "blue", "#00FF00" (green)
    --removeUnusedDef              in case of markush search, remove unused R-group
                                   definitions. Default value is false.
    --markushScreening:y/n         specify if screening should be used in case of
                                   markush search. Default value is y.
    --haltOnError:y/n              specify how search should handle recoverable
                                   errors: stop or log and continue if possible.
                                   Default value is y for normal structure searches
                                   and n for markush targets.
    --markushDisplayMode:o/r/rhg   in case of markush searching and hit coloring,
                                   specify the type of the resulting molecule.
          Values:
            o   Default: The result is shown on the given target structure.
            r   Markush reduction to hit : The markush structure is reduced
                according to the hit.
            rhg Markush reduction to hit, with the homology groups
                expanded according to the matching part of the query.
    --align                      align or template based clean hits if DB option
                                 has been set, and output format is MRV.
          --align:r              rotate: if query molecule has 0 dimension, it
                                 will be cleaned in 2d for alignment.
          --align:p              partial clean (template based clean): if query
                                 molecule has 0 dimension, same as rotate.
    --similarity:s/d/o           specify which score is displayed within the
                                 result of a similarity search.
          Values:
            s   Default: similarity score is displayed.
            d   Dissimilarity score is displayed.
            o   Neither similarity nor dissimilarity score is displayed.
    --queryDisplay:n/y            specify whether query structure is displayed
                                  within the result of a similarity search.
                                  Default is n.
    --displayLabelsAndBoxes:n/y   specify whether labels and bounding boxes for
                                  the parts of the result of a similarity search
                                  (target, query, score) are displayed. Default is n.
    --queryAbsoluteStereo:y/n     Treat all chiral atoms as absolute (y, default) or
                                  consider the chiral flag (n) in case of MDL mol
                                  files without enhanced stereo labels. Has no
                                  effect in database mode.
    --targetAbsoluteStereo:y/n    Treat all chiral atoms as absolute (y, default) or
                                  consider the chiral flag (n) in case of MDL mol
                                  files without enhanced stereo labels. Has no
                                  effect in database mode.
    --DBAbsoluteStereo:T/C/A      In database mode, sets the above two
                                  AbsoluteStereo flags.
                                    T:(default) as set for table in database.
                                    C: always check chiral flag(false)
                                    A: always absolute stereo(true)
    --orderSensitive              Switches on order sensitive search
    --tautomerSearch:d/y/is/n     Tautomer search mode: d-default, y-on, is-on,
                                  ignoring stereo information in tautomer regions,
                                  n-off.
                                  When set to d, tautomer search is off,
                                  except in case of duplicate search in
                                  a tautomer duplicate database table.
    --exactAtomMatching:y/n       Exact atom matching (y) or not (n). Default is n.
                                  (Deprecated.)
    --exactQueryAtomMatching:y/n  Exact query atom matching (y) or not (n).
                                  Default is n (except in case of duplicate search).
    --exactBondMatching:y/n       Exact bond matching (y) or not (n).
                                  Default is n (except in case of duplicate search).
    --exactRadicalMatching:y/n    Exact radical matching (y) or not (n).
                                  Default is n. (--radical is preferred instead.)
    --exactIsotopeMatching:y/n    Exact isotope matching (y) or not (n).
                                  Default is n. (--isotope is preferred instead.)
    --exactChargeMatching:y/n     Exact charge matching (y) or not (n).
                                  Default is n. (--charge is preferred instead.)
    --charge:d/e/i                Charge matching mode: d-default,
                                  e-exact, i-ignore
                                  --charge:i forces --implicitHMatching:i in case of
                                  duplicate search
    --isotope:d/e/i               Isotope matching mode: d-default,
                                  e-exact, i-ignore
    --radical:d/e/i               Radical matching mode: d-default,
                                  e-exact, i-ignore
    --valence:d/i                 Valence matching mode: d-default, i-ignore
    --vagueBond:n/h/1/2/3/4       Vague handling of bond types:
                                   n - off
                                   h - half: handling of certain 5-membered ambiguous
                                       aromatic rings like [C,N]1C=CC=C1 (default from version 15.9.14)
                                   1 - handling of certain 5-membered ambiguous
                                       aromatic rings like [C,N]1C=CC=C1 and 1-atom-long
                                       aromatic ring ligands and bridging bonds
                                       between two aromatic rings (default in versions prior to 15.9.14)
                                   2 - all single and double ring bonds
                                       match aromatic
                                   3 - all single and double bonds
                                       match aromatic 
                                   4 - ignore all bond types
    --completeHG:y/n              Sets whether a structure can match a homology
                                  group only if it forms an entire group (e.g.
                                  alkyl can't match on a cycloalkyl). Default: y
    --chekSpHyb                   Switch on sp hybridization checking.
    --mix:d/i                     Handling of com, mix and for brackets: d-default,
                                    i-ignore
    --polymer:d/i                 Handling of polymer brackets: d-default,
                                    i-ignore
    --endGroupMatching:y/n        Polymer end groups must match: y-yes,
                                    n-no (default: yes)
    --transformMonomer:y/n        Polymers in their source-based representation
                                  are transformed to structure-based: y-yes,
                                    n-no (default: yes)
    --phaseShift:y/n              Polymers match the phase shifted variant:
                                  y-yes, n-no (default: yes)
    --copolymerMatching:y/n       Polymers in copolymers can only be matched by
                                  copolymers: y-yes, n-no (default: no)
    --homologyNarrowTranslation:n/a/m  Query homology pseudo atoms are matching
                                  on the represented group or only on pseudo:
                                  n-none, a - all, m - marked atoms only
                                  (default: none)
    --homologyBroadTranslation:n/a/m  If specific atoms can match target
                                  homology atoms:
                                  n-none, a - all, m - marked atoms only
                                  (default: none)
    --doubleBondStereo:N/M/A      Double bond stereo Matching mode:None/Marked/All
                                  Default is M.
    --stereoSearchType:s/i/e/d/a  Sets the stereo search type.
                                  Possible values:
                                    s - stereo specific searching (default)
                                    i - ignore stereo
                                    e - exact stereo
                                    d - diastereomer search
                                    a - enantiomer search
    --stereoModel:l/g/c           Sets the used stereo model (for tetrahedral and
                                  double bond stereo). Possible values:
                                  l - local, g - global, c - comprehensive
    --ignoreTetrahedralStereo:y/n Option for ignoring tetrahedral stereo
                                  during searching: y-yes ignore, n-no
                                  (default: no)
    --ignoreDoubleBondStereo:y/n  Option for ignoring double bond stereo
                                  during searching: y-yes ignore, n-no
                                  (default: no)
    --ignoreCumuleneOrRingCisTransStereo:y/n   Option for ignoring cumulene or ring 
                                  cis-trans stereo during searching: 
                                  y-yes ignore, n-no (default: yes)
    --ignoreAxialStereo:y/n       Option for ignoring axial stereo
                                  during searching: y-yes ignore, n-no
                                  (default: yes)
    --ignoreSynAntiStereo:y/n     Option for ignoring syn-anti stereo
                                  during searching: y-yes ignore, n-no
                                  (default: yes)
    --reactionUnpairedMap:All/unpairedOnly   Option for matching unpaired maps in
                                             reaction search:
                                       All(default): match to any atom map,
                                       unPairedOnly: match to unpaired map only.
    --HCountMatching:G/E/A        Hydrogen count query property interpretation.
          Values:
            G    (greater or equal, mdl behavior) target atom must have H-s
                 greater or equal to query H-s, in excess of explicit H-s.
                 H0 means no extra H other than explicitly drawn.
            E    (equal, daylight behavior) target atom must have H-s equal to
                 H count number.
            A    automatically determine whether G or E should be used, from the
                 query source. (smiles and smarts source: E, all other: G).
    --implicitHMatching:d/y/n/i   Describes the matching of implicit and
                                  explicit hydrogens.
          Values:
            d   Default: its value is y in almost every case.
                There is only one exception: its value is n in case of duplicate
                search against a query table in a database.
            y   Implicit and explicit hydrogens can match. In case of duplicate
                 search, the sum of implicit and explicit hydrogens of the query atom
                 and the sum on the matched target atom must be equal.
            n   An explicit hydrogen matches only on another explicit hydrogen. The
                number of implicit hydrogens (of the matching atoms) is not checked.
            i   Implicit and explicit hydrogens are ignored. Hydrogens are excluded
                from the matching.
                For a more detailed explanation see:
                 Search options apidoc
    --ssrType:s/c   Describes the set of smallest rings to use for atom
                                  property calculations.
          Values:
            s   Smallest set of smallest rings (SSSR), may vary depending on atom orders.
            c   Complete set of smallest rings (CSSR).
    --keepQueryOrder              Does not rearrange the atoms of the query which
                                  is done to achieve best search performance.
    --markush:n/y                 Disable/enable special handling of
                                  Markush targets. Default is n.
                                  Enabling requires special license.
    --hitIndexType:m/i            For Markush targets returns hits for the
                                  original Markush diagram (m - default) or for the
                                  inner compiled representation (i)(See --allHits).
    --hitOrdering:n/g             Hit ordering type for undefined R-atom
                                  atom-group matches.
                                   Possible values:
                                   n - none (default),
                                   g - order hits by R-atom matches processed
                                       in the order of R-group numbers:
                                       1. heavy group, 2. H atom, 3. empty group
    --optimizeQueries:y/n         Tries to speed up search when query molecule
                                  contains special query features (atom lists,
                                  bond lists) Default is y.
    --distinctFirstAtomMatching:n/y   Disable/enable special findAll algorithm.
                                  If set, the hits must have different first atoms.
                                  Default is n.
    --attachedDataMatch           Describes whether attached data
                                  is compared.
          Values:
            i   Default: attached data is ignored when checking for a match.
            g   general: if attached data is present in the query, it must be
                present in the target as well.
            e   exact: existing attached data must match
                both in query and target.
    --attachedDataMatchPrefixes   Comma separated list of name prefixes
                                  (of attached data labels), that will be
                                  compared. When not set or set to empty
                                  string all attached data is checked.
                                  Effective only when attachedDataMatch
                                  is not set to 'i'.
    --timeoutLimitMilliseconds    The search times out on reaching this number
                                   of milliseconds. Setting to -1 means no
                                   timeout. (Default 120000)
    --exhaustiveModeLimit         Upon reaching this number of steps, the
                                   search switches to exhaustive mode from
                                   fast mode. Setting to -1 means never.
                                   (Default -1)
    
    --undefinedRAtom:g/gh/ghe/a/u     specify the matching of an undefined R-atom
                                      in the query. Effective only when
                                      --exactQueryAtomMatching is not set.
          Values:
            g   Default: Undefined R-atom matches a group of
                one or more connected atoms in target,
                including at least one heavy atom.
            gh  Undefined R-atom matches a group of
                one or more connected atoms in target,
                which can also be a single H atom.
            ghe Undefined R-atom matches a group of
                one or more connected atoms in target,
                which can also be a single H atom or the empty set
                (empty set match is allowed for isolated or
                one-attachment R-atoms only).
            a   Undefined R-atom matches any single atom in target.
            u   Undefined R-atom matches only an undefined R-atom in target.
    --bridgingRAllowed:n/y        Forbid/allow different R-atoms matching
                                  the same group. Default is n.
    --RLigandEqualityCheck:y/n    Switch on/off the requirement that R-atoms
                                  with the same R-group ID should match ligands
                                  with the same structure. Default is y.
    --maxResults:<n>              Limits the number of molecules returned.
    -f format  output format (default: smiles). Run jcsearch -H for details.
          -f :T<column-name> write the value of the specified column of
                             matching targets
          -f :Tname          write the molecule names of matching targets
          -f :M<column-name1:...:column-namen>
                             write the values of the specified columns
                       together with the structure of matching targets
    -o file    write output to file
    -s SMILES  read input from SMILES string
    -v         verbose
    -vv        very verbose, stack trace on error
    -0         skip coordinate calculation for SMILES input
    -d         use Daylight-type aromatization (Huckel-rule) instead of
               the standard one. (This flag overrides flags -S and --standardize!)
    -2[:[On][e]]  2D coordinate calculation (useful if the input is SMILES)
          -2      coordinate calculation with default options (O1)
          -2:O0   no optimization    -2:O1  optimize if needed
          -2:O2   optimize           -2:e   make double either (cis/trans) bonds
    -n         List non-hits. For use with multiple queries, see options --and
               and --or.
    --and      If two or more queries are present, all are required to match.
               (Default) For DB targets, only the first query is considered.
               If used together with option -n , a hit is returned if none of the
               query molecules match.
    --or       If more than one query is present, at least one is required to
               match.
               If used together with option -n , a hit is returned if at least
               one query molecule does not match.
    --allHits  Instead of checking the existence of matching, all matchings of
               the query molecule(s) are reported.
               Symbols used in hit arrays in place of specific query atoms:
                R    - R-group
                M    - multicenter
                U    - unmapable (e.g. polymer star atom)
                LP   - lone pair
                E    - R-atom matching the empty set
                EXCL - excluded query atom
    -e, --expression <expression | file>   A Chemical Terms filtering expression
                                           for filtering hits. For syntax, see:
                                           Filtering expression syntax
    -c, --config <file>                    User defined configuration XML file for
                                           Chemical Terms
    -F <SQL statement>                     SQL query for filtering. The result should
      or --filterQuery <SQL statement>     contain the cd_id values. For syntax, see
           filterQuery documentation
    --ignoreCTExceptions:n/y             If set to y, only syntactical exceptions
                                         will be thrown during search. Molecules
                                         that raise an exception during evaluation
                                         will be left out of the hit list.
                                         Default is n.
    -S, --standardize <file/string>      standardize query and target
                                         according to configuration file/string
    -g, --ignore-error                   continue with next molecule on error
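
    The interaction of -n with --and and --or, as described in the option list above, is easy to misread: the wording suggests that -n inverts each per-query result before the results are combined, rather than inverting the combined result. The following shell function is a hypothetical model of that reading, not part of jcsearch; it takes the combination mode, a flag for -n, and one 1/0 value per query saying whether that query matched the target.

```shell
# Hypothetical model of the documented --and / --or / -n semantics.
# usage: reported and|or NEGATE(0|1) MATCH...   (each MATCH is 1 or 0)
reported() {
  mode=$1 negate=$2
  shift 2
  all=1 any=0
  for r in "$@"; do
    # With -n, each per-query result is inverted before combining.
    if [ "$negate" -eq 1 ]; then r=$((1 - r)); fi
    if [ "$r" -eq 1 ]; then any=1; else all=0; fi
  done
  # --and needs every (possibly inverted) result, --or needs at least one.
  if [ "$mode" = and ]; then echo "$all"; else echo "$any"; fi
}

echo "--and,    both queries match:  $(reported and 0 1 1)"
echo "--and -n, no query matches:    $(reported and 1 0 0)"
echo "--and -n, one query matches:   $(reported and 1 1 0)"
echo "--or  -n, one query matches:   $(reported or 1 1 0)"
echo "--or  -n, both queries match:  $(reported or 1 1 1)"
```

    Under this model, --and -n reports a target only when none of the queries match, and --or -n reports it as soon as one query fails to match, which agrees with the descriptions of the two options.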

    Filtering expression syntax

    Option -e or --expression requires an additional parameter, a filtering expression formulated in ChemAxon's Chemical Terms language. (It can also be the name of a file containing the filtering expression.) Only targets (and hits in case of the --allHits option) satisfying the filtering expression are reported. Note that the filter expression applies to all query molecules if more than one are specified (in case the filter expression uses the query molecule at all).
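
    Since the parameter of -e may also name a file, a longer filter can be kept outside the command line. The sketch below stores the expression mass() >= 250 (taken from the examples in this document) in a temporary file and passes the file name to -e. The helper name filter_with_file and the file names query.mol and targets.sdf are placeholders, and the search itself is only started when jcsearch and both input files are present.

```shell
# Hypothetical helper: substructure search filtered by an expression file.
# usage: filter_with_file EXPRESSION_FILE QUERY TARGETS
filter_with_file() {
  jcsearch -e "$1" -q "$2" "$3"
}

# Store the Chemical Terms expression in a temporary file.
expr_file=$(mktemp)
printf '%s\n' 'mass() >= 250' > "$expr_file"

# Run only when jcsearch and the placeholder input files exist.
if command -v jcsearch >/dev/null 2>&1 && [ -f query.mol ] && [ -f targets.sdf ]; then
  filter_with_file "$expr_file" query.mol targets.sdf
fi
```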

    The expression syntax is described in the Chemical Terms Language Reference. Search specific functions contained in the search context provide access to the query and the target molecules, the search hit array and its elements:

    • mol(), target(): both refer to the search target molecule

    • query(): refers to the search query molecule

    • m(int i): refers to the query atom index with atom map i

    • hit(), h(): both refer to the search hit array

    • hit(int i), h(int i): both refer to the i-th element of the search hit array, that is, the target atom index matching the query atom with atom index i

    • hm(int i): refers to the target atom index matching the query atom with atom map i (shorthand for h(m(i)))
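
    The relationship hm(i) = h(m(i)) can be traced with a small stand-alone model. This is a sketch in plain shell, not Chemical Terms: the query is the one from the pKa example in this document, [H][O:1]C=[O:2], the query atom indexes are assumed to run from 0 in the order the atoms are written, and the target atom indexes in the hit array are made up for illustration.

```shell
# Made-up data for the query [H][O:1]C=[O:2] (query atom indexes 0..3).
map_to_query_index="1:1 2:3"   # atom map number : query atom index
hit_array="7 5 4 6"            # invented target atom index per query atom

# m MAP: query atom index that carries atom map MAP.
m() {
  for pair in $map_to_query_index; do
    if [ "${pair%%:*}" = "$1" ]; then echo "${pair#*:}"; fi
  done
}

# h INDEX: target atom index matched by query atom INDEX (0-based).
h() {
  i=0
  for t in $hit_array; do
    if [ "$i" -eq "$1" ]; then echo "$t"; fi
    i=$((i + 1))
  done
}

# hm MAP: shorthand for h(m(MAP)).
hm() {
  h "$(m "$1")"
}

echo "m(1)=$(m 1)  h(1)=$(h 1)  hm(1)=$(hm 1)  hm(2)=$(hm 2)"
```

    With these invented values, atom map 1 sits on query atom 1, which is matched by target atom 5, so hm(1) and h(1) agree.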

    The default input molecule is the target molecule (e.g. mass() is the same as mass(target()), both refer to the molecule mass of the target molecule).

    In most cases the function and plugin definitions provided by the built-in evaluator.xml are sufficient, but it is possible to specify a user-defined configuration XML file with the --config parameter. The user-defined configuration is added to the definitions contained in the built-in evaluator.xml. The syntax is described in the Chemical Terms Language Reference, which includes a set of search filter examples. The short reference tables give a summary of the functions and plugins provided by the built-in configuration.

    Examples

    1. Searching for chlorobenzene in a SMILES file and sending the results to the standard output in SMILES format:

       jcsearch -q "c1ccccc1Cl" -f smiles input.smi
    2. Searching for molecules that contain both chlorobenzene and bromine. Output: SMILES and molecule name (which is stored in input.smi, separated from the SMILES by spaces).

       jcsearch -q "c1ccccc1Cl" --and -q "Br" -f smiles:Tfield_0 input.smi
    3. Searching for chlorobenzene in an SDfile and writing the result (structures and all other data) into another SDfile:

       jcsearch -q "c1ccccc1Cl" -f sdf -o hits.sdf input.sdf
    4. Like the above, but reading the query from a molfile and displaying the results using mview:

       jcsearch -q clbenz.mol -f sdf input.sdf | mview -f ID -
    5. Like the above, but reading targets from a database table called molecules.

       jcsearch -q clbenz.mol -f sdf DB:molecules | mview -f ID -
    6. Listing the atoms whose partial charge is less than -0.3 in a specific molecule.

       jcsearch --allHits -e "charge(h(0)) < -0.3" -q '[*]' '[O-]C(=O)CCCCCC(=O)CCCC([O-])=O'
    7. Listing carboxylic groups whose acidic pKa value on the carboxylic OH is greater than 4.

       jcsearch --allHits -e "pka('acidic',hm(1)) > 4" -q "[H][O:1]C=[O:2]" target.mol
    8. Filtering target molecules by both molecule mass and substructure search:

       jcsearch -e "mass() >= 250" -q query.mol targets.sdf
    9. Similarity search. The dissimilarity threshold should be between 0 (very similar) and 1 (not similar):

       jcsearch -q "CC(C)(O)C#N" input.smi -t:i:0.4
    10. Duplicate search, ignoring hydrogens:

       jcsearch -t:d --implicitHMatching:i -q "Cc1c[nH]cn1" "Cc1cnc[nH]1"
    11. Duplicate search, ignoring charge (which in turn implies --implicitHMatching:i):

       jcsearch -t:d --charge:i -q "SC1=CC=CC=C1" "[SH2+]C1=CC=CC=C1"
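
    SMILES and SMARTS strings often contain square brackets, which a POSIX shell treats as filename patterns, so quoting them (as example 6 does) is the safer habit when jcsearch is started from a shell script. The sketch below does not call jcsearch at all: it only shows what the shell hands to a command when a file whose name happens to match the pattern exists in the working directory. The directory and file are created just for the demonstration.

```shell
# Create a scratch directory holding a file named like a SMILES string.
demo_dir=$(mktemp -d)
: > "$demo_dir/SC1=CC=CC=C1"

# Unquoted, [SH2+] is a bracket pattern and the word expands to the file name.
unquoted=$(cd "$demo_dir" && printf '%s' [SH2+]C1=CC=CC=C1)

# Quoted, the string reaches the command unchanged.
quoted=$(cd "$demo_dir" && printf '%s' '[SH2+]C1=CC=CC=C1')

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
rm -r "$demo_dir"
```

    When no matching file exists, most shells pass the unquoted word through unchanged, which is why the problem tends to appear only occasionally.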