Peptide sequence format

Peptides can be entered using one-letter or three-letter amino acid abbreviations.

A text file containing sequences should use only one type of abbreviation: either one-letter codes or three-letter codes throughout, but not both. Each sequence must occupy exactly one continuous line in the text file, without spaces. Abbreviations used:

 


3-letter  Ala Arg Asn Asp Asx Cys Gln Glu Glx Gly His Ile Leu Lys Met Phe Pro Pyl Sec Ser Thr Trp Tyr
1-letter  A   R   N   D   B   C   Q   E   Z   G   H   I   L   K   M   F   P   O   U   S   T   W   Y

Code: peptide

See also: Peptide import and export options



Example

Valid files
Code Block
PPPALPPKKR
APTMLPPASDFA


Code Block
ProProProAlaLeuProProLysLysArg
AlaProThrMetProProProLeuProPro


Invalid files
Code Block
PPPALPPKKR
AlaProThrMetProProProLeuProPro


Code Block
ProProProAlaLeuProProLysLysArg
AlaProThrMetPPPLPP
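
The rule illustrated by the valid and invalid files above can be checked with a short script. The following is a minimal sketch, not part of the product: the function names `classify_line` and `classify_file` are hypothetical, the code sets simply mirror the abbreviation table on this page, and custom amino acids are not handled.

```python
# Codes copied from the abbreviation table on this page (assumption: the
# importing tool may accept more codes, e.g. custom amino acids).
ONE_LETTER = set("ARNDBCQEZGHILKMFPOUSTWY")
THREE_LETTER = {
    "Ala", "Arg", "Asn", "Asp", "Asx", "Cys", "Gln", "Glu", "Glx", "Gly",
    "His", "Ile", "Leu", "Lys", "Met", "Phe", "Pro", "Pyl", "Sec", "Ser",
    "Thr", "Trp", "Tyr",
}


def classify_line(line):
    """Return 'one-letter', 'three-letter', or None for an invalid line."""
    if not line or any(ch.isspace() for ch in line):
        return None  # empty lines and lines containing spaces are rejected
    if all(ch in ONE_LETTER for ch in line):
        return "one-letter"
    chunks = [line[i:i + 3] for i in range(0, len(line), 3)]
    if len(line) % 3 == 0 and all(chunk in THREE_LETTER for chunk in chunks):
        return "three-letter"
    return None


def classify_file(lines):
    """Return the single abbreviation type used by all lines, else None."""
    kinds = {classify_line(line.rstrip("\r\n")) for line in lines}
    if len(kinds) == 1 and None not in kinds:
        return kinds.pop()
    return None  # mixed types, an invalid line, or no lines at all
```

Calling `classify_file` on the lines of a file returns `None` as soon as one line mixes the two notations or the file switches notation between lines.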



Custom amino acids

In addition to the amino acids listed above, a custom amino acid dictionary can be defined.

The custom_aminoacids.dict file is stored in the .chemaxon directory (UNIX) or in the user's chemaxon directory (MS Windows).


The usual format of the dictionary file is:

molName=L-Alanine    Ala    A    [CX4H3][C@HX4H1]([NX3])C=O |wD:1.1,(3.85,-1.33,;2.31,-1.33,;1.54,-2.67,;1.54,,;)|    3    4
molName=L-Cysteine    Cys    C    [NX3][C@@HH1]([CH2][SH1])C=O |wD:1.0,(1.54,-2.67,;2.31,-1.33,;3.85,-1.33,;4.62,-2.67,;1.54,,;)|    1    5    4


where the corresponding columns are:

name
    Not an obligatory field (introduced in Marvin 6.2).
    Example: molName=name

long (three-letter code) abbreviation
    A capital letter followed by two lowercase letters.
    Example: Ala

short (one-letter code) abbreviation
    A single capital letter or, for a custom amino acid, the letter X followed by further characters in parentheses. Allowed characters are the letters of the alphabet, digits and the dash character.
    Example: molName=Sarcosine    Sar    X(Sar) ....

SMARTS representation of the amino acid fragment without the terminal OH
    The SMARTS strings representing amino acid fragments specify the hydrogen counts, and sometimes the connection counts, to avoid ambiguity. For example, if only the string C[C@H](N)C=O were used for L-alanine in the first example, it would also match many other amino acids, because they contain this string as a substructure.
    No query bonds are allowed.

coordinates of the structure
    Molecular coordinates are needed for cleaning. If they are missing, Ctrl+2 creates the coordinates for the structure.
    Coordinates can be generated by Molconvert using the cxsmarts:c option.

the number of the backbone nitrogen in the SMARTS string
    Example: 3 for Ala in the first example

the number of the C-terminal carbon
    Example: 4 for Ala in the first example

the number of any other attachment point, if needed
    Example: 4 (the S atom) for L-cysteine in the second example
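
To see how the atom numbers relate to the SMARTS string, the atoms can be counted in order of appearance, starting at 1. The sketch below is only an illustration under that assumption: `atom_tokens` and `atom_at` are hypothetical helpers, the tokenizer is deliberately simplified (bracket atoms and a few organic-subset symbols), and it expects the SMARTS part alone, without the coordinate section between the `|` characters.

```python
import re

# Simplified atom tokenizer: a bracket atom, or a bare organic-subset symbol.
# Bonds, branches and ring closures are skipped. This is not a full SMARTS
# parser, only enough for the two dictionary examples on this page.
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def atom_tokens(smarts):
    """Return the atom expressions of a SMARTS string in order of appearance."""
    return ATOM.findall(smarts)


def atom_at(smarts, number):
    """Return the atom expression with the given 1-based atom number."""
    return atom_tokens(smarts)[number - 1]


ala = "[CX4H3][C@HX4H1]([NX3])C=O"
cys = "[NX3][C@@HH1]([CH2][SH1])C=O"

print(atom_at(ala, 3), atom_at(ala, 4))                   # backbone N, C-terminal C of Ala
print(atom_at(cys, 1), atom_at(cys, 5), atom_at(cys, 4))  # N, C and S of Cys
```

With the two example entries, the numbers 3 and 4 for Ala select the nitrogen and the carbonyl carbon, and the numbers 1, 5 and 4 for Cys select the nitrogen, the carbonyl carbon and the sulfur.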


The name and the coordinates are not obligatory fields.

The columns should be separated by tab characters.
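
The column layout described above can be read with a small parser. This is a hedged sketch rather than the tool's own reader: `AminoAcidEntry` and `parse_entry` are hypothetical names, the abbreviation patterns follow the column descriptions on this page, and the code assumes the coordinates either follow the SMARTS after a space (as in the examples) or sit in their own tab-separated field.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

LONG_CODE = re.compile(r"[A-Z][a-z]{2}")                # e.g. Ala
SHORT_CODE = re.compile(r"[A-Z]|X\([A-Za-z0-9-]+\)")    # e.g. A or X(Sar)


@dataclass
class AminoAcidEntry:
    name: Optional[str]          # optional molName= value
    long_code: str
    short_code: str
    smarts: str
    coordinates: Optional[str]   # optional |...| section
    n_atom: int                  # number of the backbone nitrogen
    c_atom: int                  # number of the C-terminal carbon
    other_atoms: List[int]       # further attachment points, if any


def parse_entry(line: str) -> AminoAcidEntry:
    """Parse one tab-separated line of a custom amino acid dictionary."""
    fields = [f.strip() for f in line.strip().split("\t") if f.strip()]
    name = None
    if fields and fields[0].startswith("molName="):
        name = fields.pop(0)[len("molName="):]
    if len(fields) < 5:
        raise ValueError("expected two codes, a SMARTS and two atom numbers")
    long_code, short_code, structure, *rest = fields
    if not LONG_CODE.fullmatch(long_code):
        raise ValueError(f"bad three-letter code: {long_code!r}")
    if not SHORT_CODE.fullmatch(short_code):
        raise ValueError(f"bad one-letter code: {short_code!r}")
    # Assumption: coordinates follow the SMARTS after a single space.
    smarts, _, coordinates = structure.partition(" ")
    if not coordinates and rest[0].startswith("|"):
        coordinates = rest.pop(0)  # coordinates given as a separate field
    if len(rest) < 2:
        raise ValueError("expected at least two atom numbers")
    numbers = [int(n) for n in rest]
    return AminoAcidEntry(name, long_code, short_code, smarts,
                          coordinates or None, numbers[0], numbers[1],
                          numbers[2:])
```

A line without the `molName=` field or without the coordinate section is parsed the same way, with `name` or `coordinates` set to `None`.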



Info


To describe an aromatic custom amino acid, both the aromatic and the Kekulé forms should be included in the custom_aminoacids.dict file, with the same short and long names.


DNA/RNA sequence format

DNA/RNA sequences can be entered using one-letter nucleic acid abbreviations. Each sequence must occupy exactly one continuous line in the text file, without spaces. Abbreviations used:

DNA  A C G T
RNA  A C G U



Code: dna, rna


Example

Valid files:
Code Block
ACGTACGT
ACCCCGTGGGT


Code Block
A-C-G-T-A-C-G-T
A-C-C-C-C-G-T-G-G-G-T


Code Block
dA-dC-dG-dT-dA-dC-dG-dT
dA-dC-dC-dC-dC-dG-dT-dG-dG-dG-dT
Invalid files
Code Block
acgtacgt
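
The line forms shown in the examples above can be checked with a few regular expressions. The sketch below is an illustration only: `is_valid_nucleotide_line` is a hypothetical helper, and it assumes that the dash-separated form also applies to RNA and that the `d` prefix is used for DNA only, neither of which is stated on this page.

```python
import re


def is_valid_nucleotide_line(line, alphabet="ACGT"):
    """Check one sequence line against the forms shown in the examples.

    alphabet is "ACGT" for DNA or "ACGU" for RNA. Lowercase letters and
    spaces are rejected.
    """
    base = "[" + alphabet + "]"
    patterns = [
        base + "+",                    # ACGTACGT
        base + "(?:-" + base + ")*",   # A-C-G-T
    ]
    if alphabet == "ACGT":
        # Assumption: the d prefix marks deoxyribonucleotides, so DNA only.
        patterns.append("d" + base + "(?:-d" + base + ")*")  # dA-dC-dG-dT
    return any(re.fullmatch(p, line) for p in patterns)
```

For example, `is_valid_nucleotide_line("acgtacgt")` is false because the letters are lowercase, while each line of the three valid DNA files above is accepted.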