Atom lists proved to be useful tools for creating query structures with variable atoms. JChem provides a similar variability feature of functional groups or other substructures in queries for molecule or reaction targets (tables) through the use of R-groups.
If you are interested in searching combinatorial Markush library targets (tables) described by R-group notation, see the following section.
A description of the use of undefined R-atoms in query structures can be found in this section.
An R-group query structure consists of three components: a root structure (often referred to as the scaffold), a set of R-group definitions, and R-group conditions. The root structure contains the portion of the query structure that does not vary among the structures retrieved. R-groups are attached as substituents to the root, and their sites are marked with the symbols R1, R2, R3, etc. It is possible to attach multiple R-groups to one root, even to a single atom of the structure. One R-group can be attached multiple times to the same root, but this does not mean that all these attachments must be filled by the same definition (see the occurrence conditions below for further information). An R-group without definitions is called undefined.
R-group definitions are variable lists of ligands connected to specific positions of the root structure by their attachment points.
The following figure displays an R-group query structure (root structure + R-group definitions) without R-group conditions.
R-logic conditions can be set using either Marvin JS or Marvin Sketch. For Marvin JS, see the description of the necessary steps here.
For Marvin Sketch, the necessary steps are described below.
Conditions of an R-group search can be set in the R-logic window of Marvin Sketch (available from the menu: Structure > Attributes > R-logic...).
When View > Misc > R-logic is switched on in Marvin Sketch, the R-logic settings are displayed as well. By default, when R-groups are defined in the structure and Structure > Attributes > R-logic has its default settings, R1 *, R2 *, etc. are displayed.
Modifying the R-logic settings results in the following display.
The occurrence condition defines the number of R-group sites to be occupied. For example, the occurrence designation >0 for R1 specifies that the target molecule must contain at least one of the R1 substituents listed in the R1 group definition on its corresponding atom. If this condition is not specified (the default case), all occurrences are considered mandatory ligands. (However, different occurrences of the same R-group can be substituted with different definitions.)
Examples of valid R-group occurrence specifications:
3 : Exactly 3
>3 : More than 3
<3 : Less than 3
2-5 : 2 to 5
The occurrence can also be specified as a comma-separated list, in which the elements are in an OR relation. Example: "0,2-5,>6" means the specified R-group may occur zero, two to five, or more than six times (i.e., any occurrence but 1 and 6 is accepted).
Note: If the smallest number of the specified possible occurrences of an R-group is equal to the number of corresponding R-atoms actually drawn in the structure, then it is treated the same way as the unspecified condition: all substitutions are mandatory.
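The occurrence syntax above can be illustrated with a short Python sketch. It is not JChem code: the function names `parse_occurrence`, `occurrence_allowed`, and `all_sites_mandatory` are hypothetical, and the sketch only assumes the four forms listed above (`n`, `>n`, `<n`, `a-b`) combined with commas in an OR relation.

```python
import re

def parse_occurrence(spec):
    """Turn a spec such as "0,2-5,>6" into a list of predicates on a count.

    Hypothetical helper: it only models the syntax described in the text.
    """
    predicates = []
    for part in spec.split(","):
        part = part.strip()
        if m := re.fullmatch(r">(\d+)", part):        # ">3": more than 3
            n = int(m.group(1))
            predicates.append(lambda c, n=n: c > n)
        elif m := re.fullmatch(r"<(\d+)", part):      # "<3": less than 3
            n = int(m.group(1))
            predicates.append(lambda c, n=n: c < n)
        elif m := re.fullmatch(r"(\d+)-(\d+)", part): # "2-5": 2 to 5
            lo, hi = int(m.group(1)), int(m.group(2))
            predicates.append(lambda c, lo=lo, hi=hi: lo <= c <= hi)
        elif re.fullmatch(r"\d+", part):              # "3": exactly 3
            n = int(part)
            predicates.append(lambda c, n=n: c == n)
        else:
            raise ValueError(f"unrecognized occurrence element: {part!r}")
    return predicates

def occurrence_allowed(spec, count):
    """List elements are in an OR relation: one match is enough."""
    return any(p(count) for p in parse_occurrence(spec))

def all_sites_mandatory(spec, drawn_sites):
    """Model of the note above: if the smallest allowed occurrence equals the
    number of R-atoms drawn, every drawn site must be substituted."""
    allowed = [c for c in range(drawn_sites + 1) if occurrence_allowed(spec, c)]
    return bool(allowed) and min(allowed) == drawn_sites

if __name__ == "__main__":
    # Occurrences 1 and 6 are rejected by "0,2-5,>6"; the others are accepted.
    for count in range(9):
        print(count, occurrence_allowed("0,2-5,>6", count))
```

With two R1 atoms drawn, `all_sites_mandatory("2-5", 2)` is true in this model, whereas `all_sites_mandatory(">0", 2)` is false because a single occupied site already satisfies the condition.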
If the conditions for an R-group have to be satisfied only when the conditions of another R-group are satisfied, use the If/Then conditions. For example, If R1 then R2 means that if the conditions for R1 are satisfied, then the conditions for R2 must also be satisfied. If the conditions for R1 are not satisfied, the conditions for R2 are ignored. This If/Then condition implies that the molecule may be retrieved even though R1 is not satisfied.
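The If/Then rule behaves like a logical implication. The following minimal Python sketch shows this; the function name `if_then_satisfied` is hypothetical and the booleans stand in for "the conditions of this R-group are satisfied in the target", which a real search engine would have to determine itself.

```python
def if_then_satisfied(if_group_ok: bool, then_group_ok: bool) -> bool:
    """Model "If R1 then R2" as an implication.

    When R1's conditions fail, R2's conditions are ignored and the rule
    passes; when R1's conditions hold, R2's conditions must hold as well.
    """
    return (not if_group_ok) or then_group_ok

if __name__ == "__main__":
    # Print the full truth table for the two inputs.
    for r1_ok in (True, False):
        for r2_ok in (True, False):
            print(f"R1 ok={r1_ok}, R2 ok={r2_ok} -> {if_then_satisfied(r1_ok, r2_ok)}")
```

The only failing combination is R1 satisfied with R2 not satisfied, which matches the statement that a molecule may be retrieved even though R1 is not satisfied.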
If the RestH condition is set for an R-group, the hit molecules do not contain ligands on that atom other than hydrogen or those specified in the R-group. If the RestH condition is not set, the R-group sites can contain any additional non-hydrogen ligands as well.
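As a rough illustration of the RestH rule, the sketch below checks the ligands left over on a site atom after the root and the matched R-group definition have been accounted for. The function name `site_passes_resth` and the representation of ligands as element symbols are assumptions made for this example, not part of JChem.

```python
from typing import Iterable

def site_passes_resth(extra_ligands: Iterable[str], rest_h: bool) -> bool:
    """Check one R-group site against the RestH condition.

    extra_ligands: element symbols of the ligands on the site atom that are
    neither part of the root nor part of the matched R-group definition
    (hypothetical input, assumed to be computed elsewhere).
    rest_h: True if the RestH condition is set for the R-group.
    """
    if not rest_h:
        # Without RestH, additional non-hydrogen ligands are acceptable.
        return True
    # With RestH, everything left over on the site must be hydrogen.
    return all(symbol == "H" for symbol in extra_ligands)

if __name__ == "__main__":
    print(site_passes_resth(["H", "H"], rest_h=True))   # only hydrogens remain
    print(site_passes_resth(["H", "Cl"], rest_h=True))  # an extra chlorine remains
    print(site_passes_resth(["H", "Cl"], rest_h=False)) # RestH not set
```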
Table 1. R-group query structures
target | |||
---|---|---|---|
query | default R1 R2 | ||
R1 >0 R2 >0 | |||
R1 >1 R2 * | |||
R1 * R2 >0 | |||
R1 *, RestH R2 1 | |||
default | |||
if R1 then R3 | |||
if R2 then R3 |
R-group definitions may also contain other R-groups, that is, R-groups may be nested (to any depth). See an example below:
In nested R-groups, the occurrence conditions are interpreted the same way as in the root, but separately and locally for each definition of each R-group. See examples below:
Query | Hits | Non-hits | |
---|---|---|---|
The following restrictions apply to R-group queries.
R-logic conditions allowing optional occurrence are not supported for R-groups with multiple connections (attachment points).
R-groups can have more than two connections (attachment points) only if
the query does not have R-logic conditions;
the number of enumerates does not exceed 100.
RestH condition is not supported for nested R-atoms connected to an attachment point in the definition of another R-group.
Undefined R-atoms are not supported if the query has a defined R-group with R-logic conditions.
Undefined R-atoms cannot be directly connected to other R-atoms.
Undefined R-atoms cannot be in nested position.
See further restrictions affecting the use of undefined R-atoms.
Query properties are not supported on R-atoms.
rb* (ring bonds as drawn) query property is not supported on the atoms of R-group definitions. However, you can use an explicit number with the rb query property (e.g., rb2).
For a description of R-group decomposition, see this separate document.