Appendix

    Differences in matching Daylight and MDL formats

    We aim for compatibility with both MDL and Daylight structure searches. However, some query features have different meanings in the two systems, so the interpretation of these features depends on the query input format. Queries of type SMILES, SMARTS, cxsmiles and cxsmarts are matched the Daylight way; all other formats are matched the MDL way.
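
    The format rule above can be sketched as a small lookup. This is an illustration only, not the product's API: the function name matching_convention and the lower-case format strings are assumptions made for the example.

```python
# Illustrative sketch only; names here are hypothetical, not part of any real API.
# Formats that the text says are matched the Daylight way.
DAYLIGHT_FORMATS = {"smiles", "smarts", "cxsmiles", "cxsmarts"}


def matching_convention(query_format: str) -> str:
    """Return "daylight" or "mdl" for a query input format name."""
    fmt = query_format.strip().lower()
    # Every format not listed above falls back to the MDL interpretation.
    return "daylight" if fmt in DAYLIGHT_FORMATS else "mdl"
```

    For instance, matching_convention("SMARTS") gives "daylight", while a molfile format name such as "mol" gives "mdl".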

    The affected query features and their different matchings are detailed below.

    ANY and not list atoms

    In MDL terminology, ANY atoms never match hydrogens; this excludes plain H as well as deuterium, charged H, etc. In Daylight terminology, however, ANY matches isotopic and charged H, but not plain hydrogens.

    For not list atoms, if H (or #1) does not appear in the excluded list, Daylight behaves as described above: only isotopic and charged H are accepted. MDL, on the other hand, never accepts hydrogens for not lists. To avoid misinterpretation, we chose not to follow the MDL behavior here, even for MDL format input: for MDL format queries, all hydrogens match not lists. (Of course, if the H atom type is included in the not list, the not list will NOT match H.) See the examples below.

    Table 1. (yes = the query matches the target, no = it does not; all images are under images/download/attachments/5312901/)

    Query                      Target diff002.png   Target diff003.png   Target diff004.png
    MDL Query (molfile)
    diff001.png                yes                  no                   no
    diff005.png                yes                  yes                  yes
    diff016.png                yes                  no                   no
    Daylight Query (SMARTS)
    diff001.png                yes                  no                   yes
    diff005.png                yes                  no                   yes
    diff016.png                yes                  no                   no
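
    The hydrogen rules for ANY and not list atoms, as described in the text of this section, can be summarized in a short sketch. It is a simplified model under stated assumptions, not the actual search code: the TargetAtom class, the function names and the "mdl"/"daylight" strings are all hypothetical, and an isotope value of 0 is assumed to mean "no isotope specified".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetAtom:
    """Hypothetical minimal target atom; isotope 0 means unspecified."""
    atomic_number: int
    isotope: int = 0
    charge: int = 0

    @property
    def is_hydrogen(self) -> bool:
        return self.atomic_number == 1

    @property
    def is_plain_hydrogen(self) -> bool:
        # Plain H: neither isotopic nor charged.
        return self.is_hydrogen and self.isotope == 0 and self.charge == 0


def any_atom_matches(convention: str, atom: TargetAtom) -> bool:
    """ANY atom: MDL rejects every H, Daylight rejects only plain H."""
    if convention == "mdl":
        return not atom.is_hydrogen
    return not atom.is_plain_hydrogen


def not_list_matches(convention: str, excluded: set, atom: TargetAtom) -> bool:
    """Not list given as a set of excluded atomic numbers."""
    if atom.atomic_number in excluded:
        return False
    if convention == "mdl":
        # Behavior chosen for MDL format input: all hydrogens match
        # unless H itself is in the not list.
        return True
    # Daylight: isotopic and charged H are accepted, plain H is not.
    return not atom.is_plain_hydrogen
```

    With this model, a deuterium atom, TargetAtom(1, isotope=2), matches a Daylight ANY atom but not an MDL ANY atom, while a plain hydrogen matches an MDL format not list only when H is absent from the list.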

    H query property

    In MDL terminology, the query property H <number> means at least <number> hydrogens in addition to those explicitly drawn on the query. H0 is a special case: it means no hydrogens beyond those explicitly drawn. In Daylight terminology, on the other hand, H <number> means a total of exactly <number> hydrogens (explicit and implicit together).

    Table 2. (yes = the query matches the target, no = it does not; all images are under images/download/attachments/5312901/)

    Query                      Target diff015.png   Target diff013.png   Target diff014.png
    MDL Query (molfile)
    diff011.png                no                   yes                  yes
    diff012.png                yes                  no                   no
    Daylight Query (SMARTS)
    diff011.png                yes                  no                   no
    diff012.png                no                   no                   no
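
    The two readings of the H query property described in this section can be compared with a small sketch. It is a simplified model, not the actual implementation: the function name and parameters are hypothetical, target_total_h is assumed to be the total hydrogen count (explicit plus implicit) on the matched target atom, and query_explicit_h the number of hydrogens explicitly drawn on the query atom.

```python
def h_property_matches(convention: str, n: int, target_total_h: int,
                       query_explicit_h: int = 0) -> bool:
    """Evaluate the H<n> query property under the given convention."""
    if convention == "daylight":
        # Daylight: H<n> is the exact total H count.
        return target_total_h == n
    # MDL: count only the hydrogens beyond those drawn on the query.
    extra = target_total_h - query_explicit_h
    if n == 0:
        # H0: no hydrogens in addition to the explicitly drawn ones.
        return extra == 0
    # H<n>, n > 0: at least n additional hydrogens.
    return extra >= n
```

    For a target atom with three hydrogens and a query atom with none drawn, H2 matches under the MDL reading (at least two) but not under the Daylight reading (exactly two).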

    Double bond stereo matching mode

    This concerns cis-trans isomerism of double bonds. As described above, the search option setDoubleBondStereoMatchingMode() controls this; its default is DBS_MARKED. When DBS_MARKED is set, cis/trans is considered only at marked double bonds. (Marking is an MDL query feature, also called the stereo care flag, and is depicted as a square over the double bond.) Daylight terminology, however, has no marked double bonds; it uses the directional bonds / and \ instead. So that stereo SMARTS queries are evaluated correctly with the default search settings, the DBS_MARKED option takes directional bonds into account for Daylight format queries. (Note that Marvin has no special depiction for these SMARTS stereo bonds; however, non-stereo double bonds such as CC=CC are depicted with a wiggly bond ligand.)
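
    The DBS_MARKED decision described above can be pictured as follows. This is only a sketch of the rule as stated in the text, not the library's code: the function and its boolean parameters are hypothetical stand-ins for "the query double bond carries a stereo care flag" and "the query double bond has adjacent / or \ directional bonds".

```python
def considers_cis_trans(convention: str, stereo_care_flag: bool = False,
                        directional_bonds: bool = False) -> bool:
    """Under DBS_MARKED, decide whether cis/trans is checked at a double bond."""
    if convention == "mdl":
        # MDL format query: only marked (stereo care) double bonds count.
        return stereo_care_flag
    # Daylight format query: directional bonds play the role of the mark.
    return directional_bonds
```

    Under this reading, a SMARTS query double bond written with / and \ has its cis/trans arrangement checked, whereas one written without directional bonds is matched without regard to cis/trans.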

    'D' and 's' features

    By default, the SMARTS feature 'D' (degree) in the Daylight implementation does not follow its description ("explicit connections"): it ignores explicit H connections (but counts explicit H isotopes). These are the same semantics as those of the MDL feature 's' ("substitution count"), so in searches the two features have the same meaning.
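
    The shared counting rule for 'D' and 's' can be illustrated with a minimal sketch. The representation is an assumption made for the example: each explicit neighbor is a hypothetical (atomic_number, isotope) pair, with isotope 0 meaning that no isotope is specified.

```python
def substitution_count(neighbors) -> int:
    """Count explicit connections, skipping plain H but keeping H isotopes."""
    return sum(
        1
        for atomic_number, isotope in neighbors
        if not (atomic_number == 1 and isotope == 0)
    )
```

    For an atom drawn with one carbon, one plain hydrogen and one deuterium neighbor, that is [(6, 0), (1, 0), (1, 2)], the count is 2: the plain hydrogen is ignored and the deuterium is counted.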

    SMARTS feature matrix

    Supported SMARTS features

    Table 3.

    SMARTS notation            Description
    c, C                       Aromatic/Aliphatic atoms
    *                          Any atom
    a                          Aromatic
    A                          Aliphatic
    <n>                        Isotope
    H<n>                       Total H count
    R<n>                       Ring membership
    r<n>                       Ring size
    v<n>                       Valence
    X<n>                       Connectivity
    +/-                        Charge
    #n                         Atomic number
    @, @@                      Tetrahedral chirality
    @?                         Chiral or unspecified
    * / \ = # : ~ -,= -,:      Bond types
    [#6,#7,#8,#9]              Atom list
    [!#6!#14!#32!#50!#82]      Atom not list
    [C:1]                      Map
    O>>O                       Reaction SMARTS
    (C.C)                      Component level grouping
    /? \?                      Directional bond or unspecified
    D<n>                       Degree
    h<n>                       Implicit H count
    @                          Any ring bond
    ! & ; ,                    General logical expressions within atom and bond descriptions
    $()                        Recursive SMARTS

    NOT YET supported SMARTS features

    Table 4.

    SMARTS notation            Description
    @<c><n>                    Chirality class
    @<c><n>?                   Chirality class or unspecified

    Molfile (MDL) query feature matrix

    Supported Molfile (MDL) query features

    Table 5.

    Generic atoms: hetero(Q), Any(A)
    Atom list
    Atom not list
    No implicit hydrogens
    Valence (v<n>)
    Charge
    Isotope
    Radical
    Atom to atom map (reactions)
    Chiral atoms
    Chiral flag of molecules
    Enhanced stereo representation (ABS, AND<n>, OR<n>)
    Bond types: single, double, triple, aromatic, double cis or trans, single or double, single or aromatic, double or aromatic, any
    Stereo bond types: single up, single down, single up or down
    Double bond stereo care flag
    Reactions: starting materials, products
    Reaction stereo: inversion, retention
    Reacting center
    Atom alias
    Pseudo atoms
    LP atom type
    R-group queries: up to two connections per R-group
    R-logic: occurrence range, restH, if-then
    S-groups: Super atom (abbreviated group), multiple group, mixture, component, formulation
    Bond topology: in ring, in chain, none
    Unsaturated atom
    Ring bond count (RB)
    Substitution count
    Link atom
    Polymer and attached data S-group types

    NOT YET supported Molfile (MDL) features

    Table 6.

    3D special features
    Exact change flag (reaction)
    Beilstein generics