The Biomolecule Toolkit supports several query sequence formats for filtering both natural and unnatural sequences.
Natural analogue sequences are entered as 1-letter codes; unnatural sequences are entered as multi-letter codes, with dots separating the building blocks.
Example for natural sequence query: SEQUENCE
Example for unnatural sequence query: S.dE.Q.U.dE.N.C.dE
Note: searches in Biomolecule Toolkit are case-insensitive (e.g. dS is equivalent to ds).
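The two entry formats can be illustrated with a small sketch. It is not the toolkit's own parser: the function name `tokenize_query` is hypothetical, and the sketch assumes that any query containing a dot is dot-separated and that any other query consists of 1-letter codes only.

```python
def tokenize_query(query: str) -> list[str]:
    """Split a query into building-block codes (hypothetical helper).

    Codes are lowercased so that comparisons are case-insensitive.
    """
    query = query.strip()
    if "." in query:
        # Dot-separated form: each token is one (possibly multi-letter) code.
        tokens = query.split(".")
    else:
        # Plain form: every character is a 1-letter code.
        tokens = list(query)
    if any(not token for token in tokens):
        raise ValueError(f"empty building block in query: {query!r}")
    return [token.lower() for token in tokens]


print(tokenize_query("SEQUENCE"))
print(tokenize_query("S.dE.Q.U.dE.N.C.dE"))
print(tokenize_query("A.dS") == tokenize_query("a.ds"))
```

Lowercasing at tokenization time is one simple way to obtain case-insensitive comparison; the toolkit may handle case differently internally.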
Available wildcards
‘N’ for nucleic acid sequences
‘X’ for amino acid sequences
‘*’ for one or more residues (1, 2, … n) in nucleic acid or amino acid sequences
Note: oligonucleotide backbone modifications cannot be queried currently.
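As a rough illustration of the wildcard semantics, the sketch below translates a 1-letter query into a regular expression. The names `wildcard_to_regex` and `matches` are hypothetical, and the sketch makes several assumptions that may not reflect the toolkit's behavior: the query must match the whole target sequence, ‘*’ stands for one or more residues, and multi-letter (dot-separated) codes are not handled.

```python
import re


def wildcard_to_regex(query: str, single: str = "X") -> re.Pattern:
    """Build a regex from a 1-letter query (hypothetical helper).

    `single` is the single-residue wildcard: "X" for amino acid
    sequences, "N" for nucleic acid sequences.
    """
    parts = []
    for ch in query:
        if ch == "*":
            parts.append(".+")  # assumption: one or more residues
        elif ch.upper() == single.upper():
            parts.append(".")   # exactly one residue of any kind
        else:
            parts.append(re.escape(ch))  # literal residue code
    # IGNORECASE mirrors the case-insensitive search described above.
    return re.compile("".join(parts), re.IGNORECASE)


def matches(query: str, target: str, single: str = "X") -> bool:
    # Assumption: the query has to cover the entire target sequence.
    return wildcard_to_regex(query, single).fullmatch(target) is not None


print(matches("HCAYXAMGNMAMCAQRTPY", "HCAYKAMGNMAMCAQRTPY"))
print(matches("HCAY*QRTPY", "HCAYKAMGNMAMCAQRTPY"))
print(matches("ACNT", "ACGT", single="N"))
```

Passing the wildcard letter as a parameter keeps ‘N’ a literal residue (asparagine) in amino acid queries and ‘X’ a literal in nucleic acid queries.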
Table 1. Single residue wildcards (amino acid sequences)
Query sequence | Target: HCAYKAMGNMAMCAQRTPY (wild type) | Target: HCAYAAMGNMAMCAQRTPY (mutation) | Target: HCAYKAMGNMAMCAQRTPYK (insertion) | Target: HCAYKAMGNMAMCAQRTPY (deletion)
---|---|---|---|---
HCAYXAMGNMAMCAQRTPY | | | |
Table 2. Multi-residue wildcards (amino acid sequences)
Query sequence | Target: HCAYKAMGNMAMCAQRTPY (wild type) | Target: HCA K KAMGNMAMCAQRTPY (mutation) | Target: HCAYKAMGN K MAMCAQRTPY (insertion) | Target: HCAYKAMGNMA MCAQRTPY (deletion) | Target: HCAYK AYK AMGNMAMCAQRTPY (duplication)
---|---|---|---|---|---
HCAY*QRTPY | | | | |
HCAY*MGNM*QRTPY | | | | |
HCAY*MA*QRTPY | | | | |
HCAXKAM*QRTPY | | | | |