Homology groups represent sets of homologous substructures in Markush structures (e.g., alkyl, aryl, heterocycle). Read the user's guide about homology groups and editing their properties in MarvinSketch and in Marvin JS.
Currently, JChem search supports homology groups on the query and target side, but not on both sides at the same time. Various restrictive properties can also be specified for homology groups.
Homology groups are represented by pseudo atoms labeled with the common chemical names of these groups. Some groups have multiple alias names (abbreviations, alternative spellings). The names are case-insensitive and may contain inserted spaces.
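The name handling described above can be illustrated with a small sketch. It is not JChem code: the `normalize` and `canonical_group` helpers are hypothetical, and the alias table copies only a few rows of Table 1. It assumes that "case-insensitive, spaces may be inserted" amounts to lowercasing the label and dropping whitespace before the lookup.

```python
# Hypothetical helper: the alias table copies a few rows of Table 1;
# the lookup itself is an illustration, not JChem code.
ALIASES = {
    "Alkyl": ["CHK"],
    "CarbonChain": ["AcyclicCarbon", "CarbonTree"],
    "Carboalicyclyl": ["CYC", "Cycloalkyl"],
    "Fusedheterocyclyl": ["HEF", "Heteropolycyclyl", "FusedHetero"],
    "AnyGroup": ["XX", "Any"],
}


def normalize(label: str) -> str:
    """Lowercase the label and drop all whitespace (assumed matching rule)."""
    return "".join(label.split()).lower()


# Map every normalized name or alias to its canonical group name.
LOOKUP = {
    normalize(name): canonical
    for canonical, aliases in ALIASES.items()
    for name in [canonical, *aliases]
}


def canonical_group(label: str):
    """Return the canonical group name, or None if the label is unknown."""
    return LOOKUP.get(normalize(label))


print(canonical_group("carbon chain"))  # CarbonChain
print(canonical_group("chk"))           # Alkyl
```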
Homology groups fall into two major types according to how they are defined:
- Built-in homology groups are defined by specific structural properties of the group. These groups are not enumerated during the search; instead, appropriate substructures are recognized as fulfilling the requirements of the group. The number of covered structures is usually infinite, unless the number of atoms is limited. Examples of built-in groups are alkyl, aryl, and heterocycle.
- User-defined homology groups are explicitly defined, and only the listed substructures can match these homology groups. The definition is given in the form of an R-group definition, in which any generic Markush feature can be used. These definitions can be customized by the user, and they can be context-specific (e.g., the definition of a protecting group depends on which functional group it protects).
Table 1 shows the properties of the built-in homology groups. Each group describes a set of substructures having specific features. These features are shown in the table as "compulsory" parts. Some groups also allow optional parts that might be present in the substructure that matches the homology group.
Table 1. Built-in homology groups
Group name (alias names) | Description | Example | Note |
---|---|---|---|
Alkyl (CHK) | - only carbon and hydrogen atoms - at least one carbon atom - only single bonds - no ring bonds - optional: connection points at arbitrary positions | ![]() | |
Alkenyl (CHE) | - at least one double bond, no triple bonds - at least two carbon atoms - otherwise same as for Alkyl | ![]() | |
Alkynyl (CHY) | - at least one triple bond - at least two carbon atoms - optional: double bonds - otherwise same as for Alkyl | ![]() | |
CarbonChain (AcyclicCarbon, CarbonTree) | - any connected acyclic hydrocarbon (branched or unbranched) - optional: connection points at arbitrary positions | ![]() | renamed in version 22.18.0 (was CarbonTree) |
HeteroSubstitutedAlkyl (HSA) | - at least one heteroatom - at least one carbon atom - only single bonds - no ring bonds - each heteroatom is connected to a single carbon atom and (optionally) hydrogens - optional: connection points at arbitrary carbon atoms | ![]() | |
Haloalkyl | - each heteroatom is halogen - otherwise same as for HeteroSubstitutedAlkyl | ![]() | |
Hydroxyalkyl | - each heteroatom is oxygen - otherwise same as for HeteroSubstitutedAlkyl | ![]() | |
Cyclyl (AnyCyclyl, AnyRing) | - one or more connected rings without any restrictions - optional: connection points at arbitrary positions | ![]() | |
Aryl | - one or more connected rings - at least one ring should be aromatic - optional: double or triple bonds in aliphatic rings - optional: arbitrary number of connection points, but all must be on aromatic rings | ![]() | from version 17.21.0, restriction for external connections removed from version 25.1.0 |
Carbocyclyl | - only carbon and hydrogen atoms - otherwise same as for Cyclyl | ![]() | from version 20.20.0 |
Carboaryl (ARY) | - only carbon and hydrogen atoms - otherwise same as for Aryl | ![]() | |
Carboalicyclyl (CYC, Cycloalkyl) | - one or more connected aliphatic rings - only carbon and hydrogen atoms - optional: double or triple bonds - optional: connection points at arbitrary positions | ![]() | |
Heterocyclyl (Heterocycle) | - at least one heteroatom - at least one carbon atom - otherwise same as for Cyclyl | ![]() | from version 15.7.6 |
Heteromonocyclyl | - monocyclic ring - otherwise same as for Heterocyclyl | ![]() | from version 17.21.0 |
Fusedheterocyclyl (HEF, Heteropolycyclyl, FusedHetero) | - fused rings - otherwise same as for Heterocyclyl | ![]() | renamed in version 22.18.0 (was FusedHetero) |
Heteroaryl | - at least one heteroatom - at least one carbon atom - at least one heteroatom in aromatic ring - otherwise same as for Aryl | ![]() | from version 17.21.0 |
Heteromonoaryl (HEA) | - monocyclic ring - otherwise same as for Heteroaryl | ![]() | |
Fusedheteroaryl (Heteropolyaryl) | - fused rings - otherwise same as for Heteroaryl | ![]() | from version 17.21.0 |
Heteroalicyclyl | - one or more connected aliphatic rings - at least one heteroatom - at least one carbon atom - optional: double or triple bonds - optional: connection points at arbitrary positions | ![]() | from version 17.21.0 |
Heteromonoalicyclyl (HET) | - monocyclic ring - otherwise same as for Heteroalicyclyl | ![]() | |
Fusedheteroalicyclyl (Heteropolyalicyclyl) | - fused rings - otherwise same as for Heteroalicyclyl | ![]() | from version 17.21.0 |
RingSegment | - part of a ring where every atom has only two ring bonds - not a whole ring - optional: non-ring connections (substituents) | ![]() | |
Halogen (HAL) | - a single halogen atom | F, Cl, Br, I | |
Metal (MX) | - any metal atom | U, K, Fe, Na, Ni, Al, ... | |
AlkaliMetal (AMX) | - alkali or alkaline earth metal atom | Na, K, Ca, Mg, ... | |
TransitionMetal (TRM) | - transition metal atom excluding lanthanum | Fe, Ni, Zn, Co, Hg, W, ... | |
Lanthanide (LAN) | - lanthanide atom (including lanthanum) | Nd, Ce, Pr, ... | |
Actinide (ACT) | - actinide atom (including actinium) | U, Th, Pa, ... | |
OtherMetal (A35) | - group IIIa-Va metal atom | Al, Ga, ... | |
AnyAtom | - a single atom except for hydrogen | C, N, O, P, S, ... | |
AnyGroup (XX, Any), UnknownGroup (UNK, Unknown) | - any structure (excluding a single hydrogen atom) | | AnyGroup and UnknownGroup are equivalent since version 23.16.0 |
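To make the compulsory features of the chain groups in Table 1 concrete, here is a minimal sketch that classifies a small molecular graph as Alkyl, Alkenyl or Alkynyl. It is only an illustration of the rules in the table, not the matching code used by JChem. The graph representation (a list of heavy-atom symbols with implicit hydrogens, plus `(i, j, order)` bond tuples) and the `classify_chain` function are assumptions made for this example, and connection points are ignored.

```python
from collections import defaultdict


def _connected(n_atoms, bonds):
    """Return True if all atoms are reachable from atom 0."""
    adjacency = defaultdict(list)
    for i, j, _ in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    seen, stack = {0}, [0]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == n_atoms


def classify_chain(atoms, bonds):
    """Classify a fragment as 'Alkyl', 'Alkenyl', 'Alkynyl' or None.

    atoms: heavy-atom element symbols (hydrogens are implicit).
    bonds: (i, j, order) tuples with order 1, 2 or 3.
    """
    # Only carbon and hydrogen atoms, at least one carbon atom.
    if not atoms or any(symbol != "C" for symbol in atoms):
        return None
    # No ring bonds: a connected graph is acyclic exactly when it has
    # one bond fewer than atoms.
    if len(bonds) != len(atoms) - 1 or not _connected(len(atoms), bonds):
        return None
    orders = {order for _, _, order in bonds}
    if not orders <= {1, 2, 3}:
        return None
    # Any multiple bond implies at least two carbon atoms.
    if 3 in orders:
        return "Alkynyl"   # double bonds are optional here
    if 2 in orders:
        return "Alkenyl"   # at least one double bond, no triple bonds
    return "Alkyl"         # only single bonds


print(classify_chain(["C", "C", "C"], [(0, 1, 1), (1, 2, 1)]))             # Alkyl
print(classify_chain(["C", "C"], [(0, 1, 2)]))                             # Alkenyl
print(classify_chain(["C", "C", "C"], [(0, 1, 1), (1, 2, 1), (0, 2, 1)]))  # None
```

The last call returns `None` because cyclopropane contains ring bonds, which none of the three chain groups allow.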
Besides the built-in homology groups, users can also define custom groups. User-defined homology groups are represented by R-group definitions, and during search, the pseudo atoms of user-defined homology groups are translated to the corresponding R-group definitions.
These group definitions are customizable: the user can modify them or create new definitions. Group names are treated as case-insensitive, but on case-sensitive file systems the names of the definition files should be lowercase.
There is a special, predefined (user-defined) homology group that is readily available. It is called Protecting or PRT.
The definition file of the protecting groups contains several definitions, each for protecting a different functional group. The protected functional group is defined by the neighborhood of the R-atom. When the R-atom has the same neighborhood as the "protecting" pseudo atom, the group is replaced by the R-atom.
The conversion processes the group definitions in their order in the file. This means that more specific environments should be placed earlier. For example, a carboxyl protecting group definition should precede an alcohol definition, otherwise the alcohol definitions will be applied instead. Currently, they are located in the following order:
The system cannot handle protecting groups having more than one attachment point, or groups where the heavy atoms of the functional group should be changed by the substitution. The readily available definitions contain amine, carboxyl and hydroxyl protecting groups.
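The effect of definition order described above (the first matching definition wins, so more specific environments must come first) can be shown with a toy model. Everything here is hypothetical: the environment of the pseudo atom is reduced to a set of feature strings, a definition matches when all of its required features are present, and the list order only follows the carboxyl-before-alcohol rule stated above. It does not reproduce the actual definition file.

```python
# Hypothetical definitions: (name, features required around the pseudo atom).
# The carboxyl environment is a superset of the alcohol environment,
# so it is the more specific one and is listed first.
DEFINITIONS = [
    ("carboxyl", {"O", "C=O"}),
    ("alcohol", {"O"}),
    ("amine", {"N"}),
]


def first_matching_definition(environment, definitions):
    """Return the name of the first definition whose features all occur."""
    for name, required in definitions:
        if required <= environment:
            return name
    return None


carboxyl_environment = {"O", "C=O"}

# Specific definition first: the carboxyl definition is applied.
print(first_matching_definition(carboxyl_environment, DEFINITIONS))
# carboxyl

# Wrong order: the more general alcohol definition matches first.
print(first_matching_definition(carboxyl_environment, list(reversed(DEFINITIONS))))
# alcohol
```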
Some examples with different functional groups protected can be found in Table 2.
Table 2. Protecting group examples
Protecting group | Represented examples | ||
---|---|---|---|
![]() | ![]() | ![]() | ![]() |
![]() | ![]() | ![]() | ![]() |
![]() | ![]() | ![]() | ![]() |
To enable the enumeration of homology groups, the "Homology Enumeration" option of Markush enumeration has to be switched on. Otherwise, the homology groups are kept as pseudo atoms, which might be useful for showing that these structures cannot be fully enumerated.
For the built-in homology groups, a small set of example structures is used during enumeration. These examples are characteristic of the homology group and include both simple and large structures. They are provided as an R-group definition, similar to the definitions of user-defined homology groups.
We have to emphasize that these example structures are used only for enumeration and do not affect searching. As noted earlier, arbitrary structures fulfilling the requirements for the homology group will match such a target.
Enumeration definitions contain two attachment points by default. They identify the atoms to be connected to the first two neighbors of the group. If the homology group's pseudo atom has more than two connections, then further attachment points are added to the enumerated definitions. These additional attachment points are put on atoms that have free valence (in the order of atom numbering). If a definition does not have a sufficient number of appropriate atoms, then it is rejected (excluded from enumeration). When every definition of the homology group is rejected, an exception is thrown showing that the given homology group does not have any valid enumeration definition.
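The attachment point rule above can be sketched as follows. This is not the JChem implementation: the `Definition` record, the function names and the exception class are hypothetical. The sketch also assumes that each atom with free valence receives at most one additional attachment point and that `free_valence` already accounts for the two default attachments. The text does not specify either detail.

```python
from dataclasses import dataclass


class NoValidDefinitionError(Exception):
    """Raised when every enumeration definition of a group is rejected."""


@dataclass
class Definition:
    name: str
    attachments: list    # atom indices of the two default attachment points
    free_valence: list   # remaining free valence per atom, by atom index


def extend_attachments(definition, n_connections):
    """Return the attachment atoms for n_connections, or None if rejected."""
    points = list(definition.attachments)
    # Walk the atoms in the order of atom numbering and add one extra
    # attachment point on each atom that still has free valence.
    for atom, free in enumerate(definition.free_valence):
        if len(points) >= n_connections:
            break
        if free > 0:
            points.append(atom)
    return points if len(points) >= n_connections else None


def usable_definitions(group_name, definitions, n_connections):
    """Keep the definitions that can serve n_connections neighbors."""
    kept = {}
    for definition in definitions:
        points = extend_attachments(definition, n_connections)
        if points is not None:
            kept[definition.name] = points
    if not kept:
        raise NoValidDefinitionError(
            f"{group_name} has no valid enumeration definition "
            f"for {n_connections} connections"
        )
    return kept


examples = [
    # C-C with one attachment per carbon: two free valences remain on each.
    Definition("ethane-1,2-diyl", attachments=[0, 1], free_valence=[2, 2]),
    # C#C with one attachment per carbon: no free valence remains.
    Definition("ethyne-1,2-diyl", attachments=[0, 1], free_valence=[0, 0]),
]

print(usable_definitions("CarbonChain", examples, 3))
# {'ethane-1,2-diyl': [0, 1, 0]}
```

With three connections, the second definition is rejected because neither carbon atom can take another attachment point. If it were the only definition, the call would raise `NoValidDefinitionError`.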
Enumeration of user-defined homology groups uses the same customizable R-group definitions as searching. User-defined homology groups should have the same number of connections as in the definitions.
Specific properties can also be assigned to homology groups to restrict the set of structures they represent. For example, you might want to specify the size of an alkyl chain or whether it is branched. The homology groups have a special property editing dialog where you can set the relevant properties. They include the following (with the groups to which each may be applied):
- Deuterium and tritium count: for most homology groups. The value should be given as e.g. D0-4T3, meaning the group contains up to 4 deuterium atoms and exactly 3 tritium atoms.
- Text notes: for most homology groups (see details in the next section).
- Branching: for all chain groups (BRA for branched, STR for straight chain).
- Size: for all chain groups. Chains are marked as low (C1-6, LO), mid (C7-10, MID) or high (C11-, HI) according to the length of the chain.
- Saturation: for some cyclic groups. They can be marked as saturated or unsaturated.
- Ring type: for some cyclic groups. They can be marked as monocyclic (MON) or multicyclic (FU), or can be marked as 'not specified'.
Not specifying a property means that there is no restriction on that property.
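As an illustration of two of the properties listed above, the sketch below parses a deuterium/tritium count such as D0-4T3 and maps a chain length to its size category. The function names are hypothetical. The accepted syntax is an assumption inferred from the single example (D part first, then T part, each with a number or a range); the real dialog may accept other forms.

```python
import re

# Assumed syntax: optional "D<n>" or "D<n>-<m>", then optional "T<n>" or "T<n>-<m>".
_DT_PATTERN = re.compile(r"^(?:D(\d+)(?:-(\d+))?)?(?:T(\d+)(?:-(\d+))?)?$")


def _to_range(low, high):
    """Turn matched digits into an inclusive (min, max) pair, or None."""
    if low is None:
        return None
    low = int(low)
    return (low, int(high) if high is not None else low)


def parse_dt_count(text):
    """Parse e.g. 'D0-4T3' into {'D': (0, 4), 'T': (3, 3)}."""
    match = _DT_PATTERN.match(text)
    if not text or match is None:
        raise ValueError(f"unrecognized D/T count: {text!r}")
    d_low, d_high, t_low, t_high = match.groups()
    return {"D": _to_range(d_low, d_high), "T": _to_range(t_low, t_high)}


def size_category(n_carbons):
    """Map a chain length to LO (C1-6), MID (C7-10) or HI (C11-)."""
    if n_carbons < 1:
        raise ValueError("a chain needs at least one carbon atom")
    if n_carbons <= 6:
        return "LO"
    if n_carbons <= 10:
        return "MID"
    return "HI"


print(parse_dt_count("D0-4T3"))  # {'D': (0, 4), 'T': (3, 3)}
print(size_category(8))          # MID
```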
Table 5. Available properties of homology groups.
Category | Homology groups | Size | Branching | D/T count | Ring type | Saturation | Additional Text Notes |
---|---|---|---|---|---|---|---|
Chain groups | Alkyl, Alkenyl, Alkynyl, CarbonChain | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Chain groups | HeteroSubstitutedAlkyl, Haloalkyl, Hydroxyalkyl | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Cyclic groups | Aryl, Carboaryl | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Cyclic groups | Carboalicyclyl | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Cyclic groups | Heteroaryl | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Cyclic groups | Heteromonoaryl, Fusedheteroaryl | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Cyclic groups | Heteroalicyclyl | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Cyclic groups | Heteromonoalicyclyl, Fusedheteroalicyclyl | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Cyclic groups | Heteromonocyclyl, Fusedheterocyclyl | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Cyclic groups | Cyclyl, Carbocyclyl, Heterocyclyl | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Cyclic groups | RingSegment | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Atomic groups | Halogen, Metal, AlkaliMetal, TransitionMetal, Lanthanide, Actinide, OtherMetal, AnyAtom | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Special groups | AnyGroup, UnknownGroup | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Text notes use the following format: letters denoting different parameters, each followed by a number or a number range. The entries are separated by commas. The attachment atom type can also be specified.
Parameter | Description |
---|---|
Z | Number of single bonds (from version 22.20.0) |
E | Number of double bonds |
Y | Number of triple bonds |
A | Number of aromatic bonds (from version 22.20.0) |
C | Number of carbon atoms |
Heteroatom symbol | Number of heteroatoms of the specified element (e.g., N1-3) |
X | Number of heteroatoms not defined otherwise |
Q | Number of heteroatoms, also including the ones defined otherwise (from version 20.19.0) |
HAL | Number of occurrences of halogen atoms (e.g., HAL1-5) |
NR | Number of rings (according to SSSR) |
RA | Number of ring atoms |
FRC | Number of fused ring connections (from version 24.3.0) |
BRC | Number of bridged ring connections (according to SSSR). A single bridge counts as 1 bridged connection instead of 3. (from version 24.3.0) |
SRC | Number of spiro ring connections (from version 24.3.0) |
>Atomic symbol | One attachment to an atom of the specified element |
>>Atomic symbol | Multiple attachments to atoms of the specified element |
Example: N1-3,NR4,E1-2,>>C
Note: NR, FRC, BRC, SRC properties are available for cyclic homology groups and AnyGroup (UnknownGroup). If applied on AnyGroup, these properties also imply the query property "rb*" in order to avoid the homology group matching only a part of a ring system. If AnyGroup is drawn within a ring and has any of these four properties, then it's strongly suggested to apply "rb*" on the atoms of that ring.
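A minimal parser for the text notes format can be sketched as below. The grammar is an assumption inferred from the parameter table and the single example above (a letter code followed by a number or a `min-max` range, with `>` and `>>` prefixes for attachments). The `parse_text_notes` function is hypothetical and does not check whether a parameter is valid for a given homology group.

```python
import re

# A parameter code (letters) followed by a number or a "min-max" range.
_ENTRY_PATTERN = re.compile(r"^([A-Za-z]+)(\d+)(?:-(\d+))?$")


def parse_text_notes(notes):
    """Split text notes into count ranges and attachment specifications.

    Returns (counts, attachments), where counts maps a parameter code to an
    inclusive (min, max) pair and attachments is a list of
    (element symbol, "single" or "multiple") tuples.
    """
    counts = {}
    attachments = []
    for entry in (part.strip() for part in notes.split(",")):
        if entry.startswith(">>"):
            attachments.append((entry[2:], "multiple"))
        elif entry.startswith(">"):
            attachments.append((entry[1:], "single"))
        else:
            match = _ENTRY_PATTERN.match(entry)
            if match is None:
                raise ValueError(f"unrecognized text note entry: {entry!r}")
            code, low, high = match.groups()
            low = int(low)
            counts[code] = (low, int(high) if high is not None else low)
    return counts, attachments


counts, attachments = parse_text_notes("N1-3,NR4,E1-2,>>C")
print(counts)       # {'N': (1, 3), 'NR': (4, 4), 'E': (1, 2)}
print(attachments)  # [('C', 'multiple')]
```

Read against the parameter table, the example asks for 1 to 3 nitrogen atoms, exactly 4 rings, 1 to 2 double bonds, and multiple attachments to carbon atoms.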
Customized homology definition files should be put into the following directories:
- `<chemaxon_home>/homology/user_def_groups/`;
- `<chemaxon_home>/homology/enumeration_only/`.
You should create these directories if they do not exist. The default location of the user's chemaxon_home directory on different platforms:
- Windows: `%USERPROFILE%\chemaxon\` (typically: `c:\Users\<username>\chemaxon\`);
- Unix-like systems: `~/.chemaxon/`.
In order to define a new homology group, you should add its definition as an R-group in an MRV file using a new name that does not conflict with existing homology names and aliases. These user-defined homology groups are represented by the given definitions during search and enumeration as well. The definition file should be placed in `<chemaxon_home>/homology/user_def_groups/`. For example, a group called "nucleobase" could be defined like this:
The MRV file can contain multiple structures that describe context-sensitive definitions. You can check the original definition file of the Protecting homology group as an example (you can find it in the directory `chemaxon/enumeration/homology/user_def_groups/` within `com.chemaxon-enumeration.jar`).
If you would like to have different definitions for searching and enumeration of a user-defined group, then a separate file should be specified with the same file name in the `enumeration_only` directory as well (within `<chemaxon_home>/homology/`). In this case, the definitions from the `user_def_groups` directory will be used during searching, and the definitions from the `enumeration_only` directory will be used for enumeration.
In order to customize the enumeration of existing homology groups, you should copy the corresponding `.mrv.gz` definition file from the directory `chemaxon/enumeration/homology/enumeration_only/` within `com.chemaxon-enumeration.jar` into the directory `<chemaxon_home>/homology/enumeration_only/`. Then you can modify this file as you prefer, or you can provide a new definition the same way as described above for the creation of new user-defined homology groups. In either case, the MRV file should be compressed using gzip and the file name must be the same as the original file name within the JAR file. The MRV file can contain multiple structures that describe context-sensitive definitions as described above.
You can also override the definition of the predefined group Protecting (PRT), but in this case the file `protecting.mrv.gz` should be put into the directory `user_def_groups` instead of `enumeration_only` within `<chemaxon_home>/homology/`.
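The file placement rules in this section can be automated with a short script. The following is a sketch under stated assumptions, not an official tool. The `chemaxon_home` and `install_definition` functions are hypothetical and use only the Python standard library. The sketch lowercases file names, following the note above about case-sensitive file systems, and leaves gzip compression as an explicit option, since the text requires it for overridden built-in definitions and for `protecting.mrv.gz` but does not say so for new user-defined groups.

```python
import gzip
import os
import shutil
import sys
from pathlib import Path


def chemaxon_home(platform=sys.platform, env=os.environ, home=None):
    """Return the default per-user chemaxon_home directory described above."""
    if platform.startswith("win"):
        return Path(env["USERPROFILE"]) / "chemaxon"
    return (Path(home) if home is not None else Path.home()) / ".chemaxon"


def install_definition(mrv_file, target_dir_name, home_dir, compress=False):
    """Copy an MRV definition into <chemaxon_home>/homology/<target_dir_name>/.

    The target directory is created if it does not exist. With
    compress=True the copy is gzip-compressed and gets a ".gz" suffix.
    """
    if target_dir_name not in ("user_def_groups", "enumeration_only"):
        raise ValueError(f"unexpected target directory: {target_dir_name!r}")
    target_dir = Path(home_dir) / "homology" / target_dir_name
    target_dir.mkdir(parents=True, exist_ok=True)

    source = Path(mrv_file)
    file_name = source.name.lower()   # lowercase names for case-sensitive file systems
    if compress:
        destination = target_dir / (file_name + ".gz")
        with open(source, "rb") as plain, gzip.open(destination, "wb") as packed:
            shutil.copyfileobj(plain, packed)
    else:
        destination = target_dir / file_name
        shutil.copyfile(source, destination)
    return destination
```

For instance, `install_definition("protecting.mrv", "user_def_groups", chemaxon_home(), compress=True)` would write `protecting.mrv.gz` into the directory named in the last paragraph, assuming a modified `protecting.mrv` exists in the working directory.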