R-group Decomposition User's Guide
Introduction
R-group decomposition is a special kind of substructure search that aims to find a central structure (the scaffold) and identify its ligands at certain attachment positions. The query molecule consists of the scaffold and the ligand attachment points, which are represented by R-groups. These R-groups are simple R-group atoms without R-group definitions. An example query structure is shown below:
[Figure: example query structure]
Note that there are two R1 atoms referring to symmetrical ligand positions. By default, this means that the matching ligands must be identical. You can change this behavior by setting the --RLigandEqualityCheck:n parameter; in that case, the same R-group ID only denotes symmetrical positions on the scaffold, and different ligands at these positions are also accepted.
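As a rough illustration of the default equality rule and its --RLigandEqualityCheck:n relaxation, the following Python sketch models a decomposition as a list of (R-group ID, ligand) pairs, with ligands stood in by canonical SMILES strings. The helper `ligands_consistent` is hypothetical and not part of the JChem API; it only mirrors the behavior described above.

```python
from collections import defaultdict


def ligands_consistent(assignment, ligand_equality_check=True):
    """Toy model of the R-group ID equality rule (hypothetical, not JChem code).

    assignment: list of (rgroup_id, ligand) pairs, one per R-atom of the query;
    a ligand is any hashable stand-in such as a canonical SMILES string.
    Returns True if the decomposition would be accepted.
    """
    if not ligand_equality_check:
        # --RLigandEqualityCheck:n: equal IDs only mark symmetrical positions
        return True
    ligands_by_id = defaultdict(set)
    for rgroup_id, ligand in assignment:
        ligands_by_id[rgroup_id].add(ligand)
    # Default: all R-atoms sharing an ID must carry one and the same ligand
    return all(len(ligands) == 1 for ligands in ligands_by_id.values())


hit = [(1, "C"), (1, "Cl"), (2, "[H]")]
print(ligands_consistent(hit))         # False: the two R1 ligands differ
print(ligands_consistent(hit, False))  # True: accepted when the check is off
```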
Ligand attachments are not allowed at implicit H atoms; that is, each decomposition corresponds to a full fragment match.
To add R-atoms in place of implicit H atoms, use the --query-modification query transformation option. Applying this to the query above, you get the following:
[Figure: query with R-atoms added in place of implicit H atoms]
R-atoms with any-bond attachments are automatically added if the query does not contain R-atoms at all.
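The effect of the query modification can be pictured on a toy graph model. The sketch below is an illustration under stated assumptions, not the JChem implementation: `QueryAtom`, `Query`, `add_r_atoms` and `prepare_query` are hypothetical names, bonds are plain tuples, and we assume new R-atoms are numbered after the largest existing R-group ID (the tool's actual numbering may differ). `Rs` attaches the new R-atoms by single bonds, `Ra` by any-bonds, and `Ra` is chosen automatically when the query has no R-atoms.

```python
from dataclasses import dataclass, field


@dataclass
class QueryAtom:
    symbol: str          # element symbol, or "R" for an R-atom
    implicit_h: int = 0  # number of implicit hydrogens
    rgroup_id: int = 0   # R-group ID for R-atoms, 0 otherwise


@dataclass
class Query:
    atoms: list = field(default_factory=list)
    bonds: list = field(default_factory=list)  # (atom_index, atom_index, bond_type)


def add_r_atoms(query, bond_type="single"):
    """Hypothetical sketch: replace each implicit H with a new R-atom."""
    # Assumption: new IDs continue after the largest existing R-group ID
    next_id = max((a.rgroup_id for a in query.atoms), default=0) + 1
    for idx in range(len(query.atoms)):  # range is fixed: original atoms only
        atom = query.atoms[idx]
        if atom.symbol == "R":
            continue
        for _ in range(atom.implicit_h):
            query.atoms.append(QueryAtom("R", rgroup_id=next_id))
            query.bonds.append((idx, len(query.atoms) - 1, bond_type))
            next_id += 1
        atom.implicit_h = 0
    return query


def prepare_query(query, modification=None):
    """Apply "Ra" (any-bond) or "Rs" (single-bond) R-atom addition."""
    has_r_atoms = any(a.symbol == "R" for a in query.atoms)
    if modification is None and not has_r_atoms:
        modification = "Ra"  # applied automatically to queries without R-atoms
    if modification == "Ra":
        return add_r_atoms(query, "any")
    if modification == "Rs":
        return add_r_atoms(query, "single")
    return query


q = prepare_query(Query([QueryAtom("C", 3)]))
print([a.rgroup_id for a in q.atoms])  # [0, 1, 2, 3]
print(q.bonds)  # [(0, 1, 'any'), (0, 2, 'any'), (0, 3, 'any')]
```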
Take the following targets:
[Figure: target structures]
Decompositions using the different query options (no modification, add R-groups) are shown below. By default, a decomposition is generated for the first hit only. To process all hits, set the --allHits option.
Standardization may be necessary; only aromatization is performed by default. Substructure search requires aromatized query and target structures and also assumes that the query and target molecules use the same functional group representation (e.g. for nitro groups; also consider tautomeric and mesomeric forms). Standardization can be specified in the --standardize option.
The following examples show some decomposition tables that can be obtained by running the rgdecomp command line tool or by using the R-group Decomposition API directly. If the output format is MRV, atom color codes are stored in atom sets; if the output format is SDF, the colors are defined in Colors.ini and the coloring data is stored in the molecule property "DMAP". To get a nice table output, we specify the number of MView table columns in the -c parameter of MView. Alternative decomposition output styles for the above query and targets are shown later. To run these examples, refer to the preparation instructions.
- Decompositions with original query:
Note that if the SDF output format is chosen, we keep the default any-atom attachment point markers, since at most two R-group attachments can be saved in SDF format.
You can also pipe the output of rgdecomp directly to mview under Linux/Unix systems:
Note that we have to set aromatization, since our molecules are in dearomatized form (SDF). To store the results in dearomatized form, we have to specify dearomatization in the output format: -f sdf:-a. By default, attachment points are denoted by newly added any-atoms, since these can be stored in any output format. Here we have chosen the R-group attachment representation instead by setting -a P. We use the -k option to keep the original coordinates, since our structures are already aligned. Alignment with symmetrical queries may give unexpected results.
[Figure: Decompositions with original query, R-group attachment points]
- Decompositions with R-grouped query, R-groups attached by single bonds:
```
rgdecomp -k -m Rs -q query.mol targets.sdf --bridgingRAllowed:y -a P -f mrv:-a -o resultR.mrv
mview --gridbag -c 6 -r 4 resultR.mrv
```

```
rgdecomp -k -m Rs -q query.mol targets.sdf --bridgingRAllowed:y -f sdf:-a -o resultR.sdf
mview --gridbag -t DMAP -p Colors.ini -c 6 -r 4 resultR.sdf
```

[Figure: Decompositions with R-grouped query, any-atom attachment markers]
[Figure: Decompositions with R-grouped query, R-groups attached by single bonds]
In the first case, R2 matches H ligands:
[Figure: decomposition where R2 matches H ligands]
while with hit ordering, R2 is forced to match the heavy-atom ligand:
[Figure: decomposition with hit ordering, R2 matching the heavy-atom ligand]
[Figure: Markush structure]
Note that the second target is not covered by this Markush structure, since it has no decomposition without bridging R-atom matching. (Bridging R-groups cannot be handled by enumeration.)
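Bridging R-atom matching (defined under Usage as two R-atoms matching the same group of target atoms) can be detected on a hit once each R-atom's matched target atoms are known. A minimal sketch, assuming a hit is given as a dict from query R-atom index to the target atom indices it covers; `has_bridging` and `accept_hit` are hypothetical helpers, and treating "the same group" as an identical, non-empty atom set is our simplification.

```python
def has_bridging(r_matches):
    """Return True if two R-atoms cover exactly the same non-empty atom set.

    r_matches: dict mapping query R-atom index -> iterable of target atom
    indices matched by that R-atom (hypothetical representation of a hit).
    """
    seen = set()
    for atoms in r_matches.values():
        group = frozenset(atoms)
        if not group:
            continue  # empty matches never count as bridging
        if group in seen:
            return True
        seen.add(group)
    return False


def accept_hit(r_matches, bridging_allowed=False):
    """Mirror of the default: reject bridging hits unless explicitly allowed."""
    return bridging_allowed or not has_bridging(r_matches)


# Example: both R-atoms cover the same three target atoms
bridged = {0: {10, 11, 12}, 1: {10, 11, 12}}
print(accept_hit(bridged))        # False by default
print(accept_hit(bridged, True))  # True, as with --bridgingRAllowed:y
```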
Usage
Prepare the usage of the script or batch file as described in Preparing and Running Batch Files and Shell Scripts.
Search options are identical to those of jcsearch.
In this section we describe the R-group decomposition specific command line options.
- Query and target standardization can be specified in the --standardize option: the standardization configuration is given either directly as a simple action string or as the path of a configuration XML file. Note that substructure search requires aromatized molecules; therefore, -S 'aromatize' is the default. You can skip standardization by setting -S ''.
- We can request query modification by setting the --query-modification option:
  - Ra: adds R-groups; allows and stores all scaffold-ligand any-bond attachments
  - Rs: adds R-groups; allows and stores all scaffold-ligand single-bond attachments
  If the query has no R-group nodes, the Ra modification is applied automatically.
- We can set the attachment symbols with the --attachment-symbol option:
  - N: none
  - P: R-group attachment
  - A: any-atom (default); an any-atom is attached to the attachment atom, representing the connection to the scaffold
  - M: atom map representing the corresponding R-group index
  - L: an atom label representing the corresponding R-group ID as 1, 2, 3, ...
  - R: an atom label representing the corresponding R-group ID as R1, R2, R3, ...
  Note that the default any-atom representation and atom maps can be exported in all molecule file formats, while R-group attachment is not available in SMILES, and atom labels are only supported in MRV.
- We can set the output format in the --format option. The output is:
  - a tab-separated SMILES table if the output format is omitted, including a target ID column if the ID field is specified in the --id parameter;
  - otherwise, a molecule file with a series of query, R-group, target and ligand molecules, which can be viewed as a colored molecule table when read into mview with appropriate options defining the color palette, the color symbol molecule property name, and the number of table columns.
  In both cases, the data included in the output can be specified in the --style option (set any combination of the following letters):
  - H: include query header
  - T: include targets
  - S: include scaffold
  The default is HT.
- When the query contains R-group nodes with the same R-group ID (e.g. two R1 atoms), these nodes represent identical ligand structures by default. If we set the --RLigandEqualityCheck:n option, different structures are allowed to match these nodes. In this case, identical R-group IDs only represent symmetrical attachment positions on the scaffold and have no implication for the matching target structures.
- We call it bridging R-atom matching when two R-atoms match the same group of target atoms. By default, bridging R-atom matching is not allowed; it can be enabled by setting the --bridgingRAllowed:y option.
- The R-atom matching behavior can be set in the --undefinedRAtom option. By default, R-atoms can match heavy atom groups or hydrogens ("gh"), but empty set matching is also allowed ("ghe") for R-atoms that are added automatically because the original query did not contain R-atoms. The usefulness of this is shown in the empty R-atom match example in the Examples section below.
- The R-atom matching behavior can be further specified in the --hitOrdering option. By default, the order of the returned hits is arbitrary when there are multiple hits. However, by setting --hitOrdering:g, heavy group matches are preferred in the order of R-group numbers. For example, if there are R1, R2 and R3 R-groups, the sorting algorithm tries to match R1 to a heavy group if possible, otherwise to an H atom, and finally to the empty set; then the same is done for R2 and then for R3. When R1 and R2 are in symmetrical positions and one of them matches a heavy group while the other one matches an H atom, it is guaranteed that R1 will be the one matching the heavy group. The unsorted and the sorted matchings can be compared in the empty R-atom match example in the Examples section below.
- Target and ligand alignment to the query can be disabled by setting the --keep-coordinates parameter (meaningful for 2D output formats). This can be useful when the target is already aligned or when the query is symmetrical, since in these cases alignment can give unexpected results.
- If the --Markush parameter is specified, the decompositions are transformed into R-group definitions and added to the query. Targets that match the query structure will be contained in the enumeration of the resulting Markush structure. For an example, see the Markush generation example above.
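The format restrictions in the note on --attachment-symbol can be collected into a small lookup, for example to validate options before a batch run. This is a hypothetical pre-flight check written from the rules stated in this guide, not a JChem function; the lowercase format names ("smiles", "sdf", "mrv") are assumptions, and the SDF limit encodes our reading of the earlier note that at most two R-group attachments can be saved in SDF.

```python
VALID_SYMBOLS = set("NPAMLR")


def check_attachment_symbol(symbol, fmt, n_attachment_points=0):
    """Return a list of problems for an attachment symbol / output format pair.

    Rules transcribed from this guide (hypothetical helper, not JChem code):
    any-atoms (A) and atom maps (M) work everywhere, R-group attachments (P)
    are unavailable in SMILES, and atom labels (L, R) are MRV-only.
    """
    if symbol not in VALID_SYMBOLS:
        return [f"unknown attachment symbol {symbol!r}"]
    fmt = fmt.lower()
    problems = []
    if symbol in ("L", "R") and fmt != "mrv":
        problems.append("atom labels are only supported in MRV")
    if symbol == "P":
        if fmt == "smiles":
            problems.append("R-group attachment is not available in SMILES")
        elif fmt == "sdf" and n_attachment_points > 2:
            # Assumption: our reading of "at most 2 R-group attachments" in SDF
            problems.append("SDF can store at most 2 R-group attachments")
    return problems


print(check_attachment_symbol("A", "smiles"))  # []
print(check_attachment_symbol("R", "sdf"))     # ['atom labels are only supported in MRV']
```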
By default, only one decomposition per target, corresponding to the first search hit, is presented in the output. If the rgdecomp command line option --allHits is specified, all possible decompositions are listed.
If the command line parameter --ignore-error is specified, import/export errors do not stop the processing: the error is written to the console and the molecule is skipped. By default, the program exits on molecule import/export errors.
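The default output selection (first hit only, or every hit with --allHits) and the --ignore-error policy can be summarized as a per-target loop. The sketch below is a toy model, not rgdecomp's code: `run_batch`, the `decompose` callback and `MoleculeIOError` are hypothetical, and re-raising the exception stands in for the program exiting.

```python
import sys


class MoleculeIOError(Exception):
    """Hypothetical stand-in for a molecule import/export error."""


def run_batch(targets, decompose, all_hits=False, ignore_error=False):
    """Toy model of a per-target decomposition loop (not JChem code).

    targets: iterable of (target_id, target) pairs; decompose(target) returns
    the decompositions in hit order and may raise MoleculeIOError.
    """
    results = []
    for target_id, target in targets:
        try:
            hits = decompose(target)
        except MoleculeIOError as err:
            if not ignore_error:
                raise  # default: stop at the first import/export error
            print(f"skipping {target_id}: {err}", file=sys.stderr)
            continue
        chosen = hits if all_hits else hits[:1]  # default: first hit only
        results.extend((target_id, d) for d in chosen)
    return results


def demo_decompose(target):
    # Hypothetical decomposer: fails on one input, otherwise returns two hits
    if target == "broken":
        raise MoleculeIOError("cannot parse molecule")
    return [f"{target}/hit1", f"{target}/hit2"]


targets = [("t1", "a"), ("t2", "broken"), ("t3", "b")]
print(run_batch(targets, demo_decompose, ignore_error=True))
# [('t1', 'a/hit1'), ('t3', 'b/hit1')]; t2 is reported on stderr
```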
Examples
To run these examples:
- The Java Virtual Machine version 1.6 or higher and JChem have to be installed on your system.
- The PATH environment variable has to be set as described in the Preparing and Running JChem's Batch Files and Shell Scripts manual.
- A command shell (under UNIX / Linux: your favorite shell; under Windows: a Cygwin shell or a Command Prompt) has to be run in the RGroupDecomposition_files subdirectory.
  In UNIX / Linux:
  In Windows:
In the following examples we use the query and targets from the introduction. You can run these examples and see the results yourself in the RGroupDecomposition_files subdirectory, where you can find the input files query.mol and targets.sdf.
- SMILES table output (no -f parameter is specified):
- The same, allowing bridging R-atom matching:
- SMILES table output with all decompositions listed, allowing the two R1 query nodes to match different ligands, allowing bridging R-atom matching, and displaying the target index in the ID column:
- The same, taking IDs from the ID molecule field:
- SMILES table output with R-atoms added in place of implicit H atoms in the query, representing attachments by atom maps, and including target, scaffold and ligands in the output:
- Molecule series output in MRV format with all decompositions, allowing the two R1 query nodes to match different ligands, showing results in MView:
  Note that we use the -k option to disable alignment, which can give strange results in case of symmetrical queries (try running this example without the -k option).
  You can also pipe the output of rgdecomp directly to mview under Linux/Unix systems:
  Note that by specifying the MRV output format in the -f parameter, we automatically switch to molecule series output as the default output style and also enable the storage of atom color data in atom sets, which are shown in different colors in MView. We specify the number of table columns in the MView option -c. The decompositions are shown below:
[Figure: All decompositions with --RLigandEqualityCheck:n]
Now take a query without R-atoms:
[Figure: Query without R-atoms]
Take the following targets:
[Figure: Targets]
With automatic R-atom addition allowing empty set matches, all of these targets will have decompositions:
The decompositions are shown below:
[Figure: Decompositions with adding R-atoms automatically]
In SMILES table form:
gives the following result:
Note that the undefined R-atom matching behavior can also be set explicitly in the --undefinedRAtom option.
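The letters in the --undefinedRAtom values can be read as a set of allowed match kinds: g for a heavy-atom group, h for a hydrogen and e for the empty set. A minimal sketch of that reading, assuming a match is given as the list of element symbols an R-atom covers; `match_kind` and `r_match_allowed` are hypothetical helpers, not JChem API.

```python
def match_kind(ligand_atoms):
    """Classify what an R-atom matched (hypothetical representation).

    ligand_atoms: element symbols of the target atoms covered by the R-atom.
    Returns 'e' for the empty set, 'h' for a single hydrogen, 'g' otherwise.
    """
    if not ligand_atoms:
        return "e"
    if list(ligand_atoms) == ["H"]:
        return "h"
    return "g"


def r_match_allowed(ligand_atoms, undefined_r_atom=None, auto_added=False):
    """Check a match against an --undefinedRAtom-style letter set.

    Without an explicit value, use the defaults described in this guide:
    "gh" for R-atoms of the original query, "ghe" for automatically added ones.
    """
    if undefined_r_atom is None:
        undefined_r_atom = "ghe" if auto_added else "gh"
    return match_kind(ligand_atoms) in undefined_r_atom


print(r_match_allowed([]))                   # False: empty match not in "gh"
print(r_match_allowed([], auto_added=True))  # True: "ghe" allows empty matches
```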
By setting the --hitOrdering:g option, the heavy group matches in R9 and R10 are moved to the symmetrical positions R1 and R2:
The decompositions are shown below:
[Figure: Decompositions with adding R-atoms automatically, hit ordering]
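The ordering rule of --hitOrdering:g described under Usage amounts to a lexicographic sort of the hits: compare the match kind of R1 first (heavy group before H before empty), then R2, and so on. The following sketch illustrates that reading; `order_hits` is hypothetical, and it assumes each hit is a dict from R-group number to match kind, with each number used once.

```python
# Preference order of match kinds under --hitOrdering:g (lower is better)
RANK = {"g": 0, "h": 1, "e": 2}


def order_hits(hits):
    """Sort hits so that R1 gets the best match kind, then R2, and so on.

    hits: list of dicts mapping R-group number -> 'g', 'h' or 'e'
    (hypothetical representation). sorted() is stable, so hits that tie keep
    their original, otherwise arbitrary order.
    """
    def key(hit):
        return tuple(RANK[hit[r]] for r in sorted(hit))

    return sorted(hits, key=key)


unsorted_hits = [
    {1: "h", 2: "h", 9: "g", 10: "g"},  # heavy groups found at R9 and R10
    {1: "g", 2: "g", 9: "h", 10: "h"},  # alternative hit: heavy groups at R1, R2
]
print(order_hits(unsorted_hits)[0])  # {1: 'g', 2: 'g', 9: 'h', 10: 'h'}
```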
In SMILES table form:
gives the following result:











