Markush DARC format - VMN

VMN import was discontinued as of version 14.7.7.0 and reintroduced in version 19.27 as a beta version. VMN export was introduced for the first time in version 19.27, also as a beta version.

VMN format

VMN files describe Markush structures in a Markush DARC compatible format.

When to use VMN format?

ChemAxon's VMN support is intended for users who want to work with ChemAxon products but still need an interface to tools that accept only Markush DARC compatible formats.

ChemAxon software provides a wider set of Markush and variable structure features than VMN can represent, so the VMN format is recommended only when no other option is available.

Import from VMN format

VMN is a binary format that may be accompanied by a human-readable AMN companion file. An AMN file is processed automatically if it is in the same folder as the corresponding VMN file and has the same file name (apart from the extension).

For example, if a something.vmn file and a something.amn file are in the same folder, then something.amn is processed automatically during the import of something.vmn.
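The companion-file rule above can be sketched in a few lines of Python. This is only an illustration of the lookup rule, not the importer's actual code; the helper name find_amn_companion is hypothetical, and how the real importer treats upper-case extensions is not stated here.

```python
from pathlib import Path
from typing import Optional


def find_amn_companion(vmn_path: str) -> Optional[Path]:
    """Return the AMN companion of a VMN file, or None if there is none.

    Hypothetical helper: it mirrors the rule described in the text
    (same folder, same base name, .amn extension) and nothing more.
    """
    # Swap the extension while keeping the folder and base name.
    candidate = Path(vmn_path).with_suffix(".amn")
    return candidate if candidate.is_file() else None
```

Calling find_amn_companion("data/something.vmn") returns the path data/something.amn when that file exists and None otherwise.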

Export to VMN format

Export works similarly to import. If the Markush structure contains information that has to be stored in an AMN file, the export process automatically generates an AMN file next to the generated VMN file.

Code: vmn

Extension: .vmn

Interpretation of VMN features

  • Groups: G0 is read in as the scaffold while G1, G2, ... are stored in corresponding R-groups R1, R2, ... The representation of attachments is described below.

  • Undefined attachment information is ignored.

  • Moieties on the scaffold are represented as repeating units with repetition ranges with no crossing bonds.

  • Atom attributes: we interpret the following VMN atom attributes:

    VMN attribute name        ChemAxon terminology
    AM - Abnormal mass        isotope
    AV - Abnormal valence     valence

  • Homology atom attributes: we store the following VMN homology atom attributes in Marvin atom properties:

    VMN attribute name               Marvin atom property name   Property values
    DT - Deuterium-Tritium counts    DTCOUNT                     D[deuterium count]T[tritium count] (e.g. D3T2)
    CR - Carbon ring attributes      BRANCHING                   BRA, STR
                                     SIZE                        LO, MID, HI, LO MID, MID HI, LO HI
                                     SATURATION                  SAT, UNS
                                     RINGTYPE                    MON, FU
    data in AMN                      TEXTNOTES                   AMN text referring to the atom (e.g. N0-4,S0-4)
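As an illustration of the DTCOUNT value format listed above, the following Python sketch splits a value such as D3T2 into its two counts. The function name parse_dtcount is hypothetical, and the sketch assumes that both the D and the T part are always present, which the description above suggests but does not state explicitly.

```python
import re
from typing import Tuple

# D[deuterium count]T[tritium count], e.g. D3T2
_DTCOUNT = re.compile(r"D(\d+)T(\d+)")


def parse_dtcount(value: str) -> Tuple[int, int]:
    """Return (deuterium count, tritium count) from a DTCOUNT value.

    Assumption: both parts are always present, as in the D3T2 example.
    """
    match = _DTCOUNT.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a DTCOUNT value: {value!r}")
    return int(match.group(1)), int(match.group(2))
```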

Structure shortcuts (abbreviated groups)

The following structure shortcuts (abbreviated groups) are supported:

C2, C3, ..., C50

ACE

BU

CN

CO1

CO2

COI

ET

IBU

IPR

MBE

NBU

NO2

NPR

OBE

PBE

PH

PO3

PO4

SBU

SO2

SO3

TBU

Amino acids (peptides)

The following standard amino acids (peptide abbreviated groups) are supported:

ALA

ARG

ASN

ASP

CYS

GLN

GLU

GLY

HIS

ILE

LEU

LYS

MET

PHE

PRO

SER

THR

TRY

TYR

VAL

The following non-standard peptides are also supported:

ABU - aminobutyric acid

ASU - aminosuberic acid

GLP - pyroglutamic acid

HCY - homocysteine

HSE - homoserine

NLE - norleucine

NVA - norvaline

ORN - ornithine

SAR - sarcosine

STA - statine

Note that VMN-defined peptide connection bonds are not currently handled; they are interpreted as single covalent bonds.

For more information on peptide representation, refer to the Sequences - peptide, DNA, RNA documentation.

Superatoms (homology pseudo atoms)

Superatoms representing homology groups are read in as pseudo atoms. The following homologies are interpreted by enumeration and search:

CHK

CHE

CHY

CYC

ARY

HET

HEA

HEF

UNK

MX

AMX

A35

TRM

LAN

ACT

HAL

ACY

PRT

XX

Multiple R-group attachments

[Figure vmn_1.png: representation of multiple R-group attachments]

Markush Compound Number

VMN files reserve a 12-byte segment in the header for the Markush Compound Number. It is an alphanumeric string with the following restrictions:

  • Maximum 12 characters

  • Can contain only capital letters, numbers and dashes

  • Some software may impose additional restrictions

On import, the Markush Compound Number is read as the title of the scaffold molecule; on export, the same value is written to the VMN file. If this value is not present, the exporter generates a Markush Compound Number.

The automatically generated Markush Compound Number has the format MMYY-mmmmm, where MM is the current month, YY is the current year, and mmmmm is the last 5 digits of the current epoch time in milliseconds.
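The restrictions and the generated format described above can be sketched as follows. This is an illustrative Python sketch, not the exporter's code: the function names are hypothetical, and zero-padding of the month and the use of the local time zone are assumptions that the description does not confirm.

```python
import re
import time
from typing import Optional

# At most 12 characters; capital letters, numbers and dashes only.
# Treating the empty string as invalid is an assumption.
_COMPOUND_NUMBER = re.compile(r"[A-Z0-9-]{1,12}")


def is_valid_compound_number(value: str) -> bool:
    """Check a value against the restrictions listed in the text."""
    return _COMPOUND_NUMBER.fullmatch(value) is not None


def generate_compound_number(epoch_millis: Optional[int] = None) -> str:
    """Build an MMYY-mmmmm value from an epoch timestamp in milliseconds.

    Assumptions: local time zone, zero-padded two-digit month and year.
    """
    if epoch_millis is None:
        epoch_millis = time.time_ns() // 1_000_000
    now = time.localtime(epoch_millis / 1000)
    last_five = epoch_millis % 100_000
    return f"{now.tm_mon:02d}{now.tm_year % 100:02d}-{last_five:05d}"
```

A generated value is 10 characters long, so it always fits within the 12-character limit.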

The Markush Compound Number can be set through the API by calling the setName method on the scaffold molecule. For details on how to access the scaffold molecule, see the Markush representation documentation; in the API, the scaffold molecule is called the root.

File Segment

VMN files reserve a 32-byte segment in the header for a value called the File Segment. It is an alphanumeric string with the following restrictions:

  • Maximum 32 characters

  • Can contain only capital letters, numbers and dashes

  • Some software may impose additional restrictions

The File Segment value is read as a molecule property of the scaffold molecule and this is the value exported to VMN as well. The property key is "FileSegment".

The File Segment value can be set through the API by using the properties method on the scaffold molecule. For details on how to access the scaffold molecule, see the Markush representation documentation; in the API, the scaffold molecule is called the root.

This value can also be edited in the UI; you can find the guide here. To select the proper component, make sure nothing is selected, right-click an empty spot on the canvas, and follow the guide from there. Alternatively, double-click an atom of the scaffold, which should select the entire scaffold. If your scaffold has multiple fragments, repeat this selection for each fragment while holding down Shift.

Limitations

Not supported variable structure features:
  • Position Variation Bonds

  • Link Nodes

  • Alias for R-groups

Not supported substructure-group features:
  • Polymer related groups (Monomer, Polymer, Copolymer, Graft, Crosslink, etc.)

  • Non-exact repetition

  • Mixtures

VMN specification related limitations:
  • Atoms with an atomic number of 104 or higher are not supported

  • The recommended maximum number of R-group definitions is 50
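The two VMN specification limits above lend themselves to a simple check before export. The Python sketch below is only an illustration under that reading; the function and constant names are hypothetical, and it works on plain atomic numbers rather than on any ChemAxon molecule object.

```python
from typing import Iterable, List

MAX_ATOMIC_NUMBER = 103       # atoms with atomic number 104 or higher are not supported
RECOMMENDED_MAX_RGROUPS = 50  # recommended maximum number of R-group definitions


def vmn_limit_warnings(atomic_numbers: Iterable[int], rgroup_count: int) -> List[str]:
    """Return human-readable warnings for the VMN specification limits."""
    warnings = []
    unsupported = sorted({z for z in atomic_numbers if z > MAX_ATOMIC_NUMBER})
    if unsupported:
        warnings.append(f"unsupported atomic numbers: {unsupported}")
    if rgroup_count > RECOMMENDED_MAX_RGROUPS:
        warnings.append(
            f"{rgroup_count} R-group definitions exceed the recommended "
            f"maximum of {RECOMMENDED_MAX_RGROUPS}"
        )
    return warnings
```

An empty list means that neither limit is exceeded.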

External references used during implementation

  1. Derwent World Patents Index, Markush DARC User Manual, The Thomson Corporation, 1993, 2008