Molconverter is a command line program in Marvin Beans and JChem that converts between various file types.

Usage

molconvert [options] outformat[:exportoptions] [files...]

The outformat argument must be the codename of one of the supported formats. Some examples, grouped by format type:

Document formats
mrv, cdx, cdxml, skc

Molecule file formats
mol, rgf, sdf, rdf, csmol, csrgf, cssdf, csrdf, csv, cml, smarts, cxsmarts, smiles, cxsmiles, abbrevgroup, peptide, sybylmol2, pdb, xyz, inchi, inchikey, name

Graphics formats
jpeg, msbmp, png, pov, svg, emf, tiff, eps

Compression and Encoding
gzip, base64

Alternatively, use  

molconvert [options] query-encoding [files...]

to query the automatically detected encodings of the specified molecule files.

From files in doc, docx, ppt, pptx, xls, xlsx, odt, pdf, xml, html or txt format, Molconvert is able to recognize the names of compounds and convert them to any of the above-mentioned output formats.

Options

Export options can be specified in the format string. The format descriptor and the options are separated by a colon, the options by commas.
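
The layout of the format string described above can be assembled programmatically. The following is a minimal Python sketch; `build_format_string` is a hypothetical helper, not part of Marvin or JChem. It only joins strings and does not check whether an option is valid for the chosen format.

```python
def build_format_string(codename, options=()):
    """Join a format codename and its export options as 'codename:opt1,opt2'."""
    # Drop empty entries so that no stray commas end up in the result.
    options = [opt for opt in options if opt]
    if not options:
        return codename
    # The descriptor and the options are separated by a colon,
    # the options themselves by commas.
    return codename + ":" + ",".join(options)


print(build_format_string("smiles"))                     # smiles
print(build_format_string("jpeg", ["w100", "#ffff00"]))  # jpeg:w100,#ffff00
```

The resulting string is what would be passed as the outformat argument, for example `molconvert "jpeg:w100,#ffff00" caffeine.mol -o caffeine.jpeg`. Quoting is advisable because `#` and other characters may be interpreted by the shell.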


-o file
Write output to the specified file instead of standard output.
-m
Produce multiple output files.
-e charset
Set the input character encoding. The encoding must be supported by Java.
-e [in]..[out]
Set the input (in) and/or output (out) character encodings. Examples: UTF-8, ASCII, Cp1250 (Windows Eastern European), Cp1252 (Windows Latin 1), ms932 (Windows Japanese).
-s string
Read molecule from the specified SMILES, SMARTS or peptide string (the format is recognized automatically).
-s string{format:options}
Read molecule from the string in the specified format (can be omitted), using the specified import options (can be omitted).
-f <string>
Specify the import format and options.
--smiles string
Read molecule from the specified SMILES string.
--smarts string
Read molecule from the specified SMARTS string.
--peptide string
Read molecule from the specified peptide string.
-g
Continue with the next molecule on error (default: exit on error).
-Y
Remove explicit H atoms.
-I <range>
Process input molecules whose molecule index (1-based) falls into the specified range (e.g. 5-8,15 refers to molecules 5,6,7,8,15).
-U
Fuse input molecules and output the union.
-R <file>[:<range>]
Fuse fragments to input molecule(s) from file, using the specified molecule index range. Range syntax: "-5,10-20,25,26,38-" (e.g. -R frags.mrv:20-).
-R<i> <file>[:<range>]
Fuse R<i> definition members to input molecule(s) from file in the specified index range (e.g. -R1 rdef1.mrv:5-8,19).
-R<i>:<1|2> <file>[:<range>]
Fuse R<i> definition members to input molecule(s) from file in the specified index range, filter molecules having 1 (or 2, respectively) attachment points (e.g. -R1:2 rdef1.mrv:-3,8-10).
-F
Remove small fragments, keep the largest.
-c "f1 OP value&f2 OP value..."
Filter by the values of fields in the case of SDF import. OP may be: =, <, >, <=, >=
--mol-fields-to-records
Convert molecule type fields to separate records.
-v
Verbose.
-vv
Very verbose (print stack trace on error).
-2[:options][:F<i1>,<i2>,...,<iN>]
Calculate 2D coordinates, with optional options for the coordinate calculation. If the F parameter is specified, performs a partial clean with fixed atom coordinates for atoms <i1>,<i2>,...,<iN> (1-based indexes).
-3[:options]
Calculate 3D coordinates, with optional options for the coordinate calculation.
-H3D
Help on options for 3D calculations. A detailed list is available under Clean 3D Options.
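
The range syntax used by -I and -R can be illustrated with a short Python sketch. `expand_range` is a hypothetical helper written for this page, not a Marvin API. It assumes that an open start ("-5") means "from the first molecule" and an open end ("38-") means "up to the last molecule"; the option descriptions above do not spell this out.

```python
def expand_range(spec, total):
    """Expand a 1-based range string such as '5-8,15' into a sorted index list.

    Assumption: '-5' starts at 1 and '38-' ends at 'total', the number of
    molecules available in the file.
    """
    indexes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text) if start_text else 1
            end = int(end_text) if end_text else total
        else:
            start = end = int(part)
        # Clamp to the valid 1..total interval before collecting the indexes.
        indexes.update(range(max(start, 1), min(end, total) + 1))
    return sorted(indexes)


print(expand_range("5-8,15", 20))   # [5, 6, 7, 8, 15]
print(expand_range("-3,8-10", 12))  # [1, 2, 3, 8, 9, 10]
```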

 

Import options can be specified between braces, in one of the following forms:

filename{options}
filename{MULTISET,options}    to merge molecules into one that contains multiple atom sets
filename{format:}    to skip automatic format recognition
filename{format:options}
filename{format:MULTISET,options}
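
The brace forms above follow one pattern, which the following Python sketch reproduces. `with_import_options` is a hypothetical helper and only builds the argument string; it does not validate the format name or the options.

```python
def with_import_options(filename, fmt=None, options=(), multiset=False):
    """Build 'filename{format:MULTISET,options}' from its optional parts."""
    parts = (["MULTISET"] if multiset else []) + list(options)
    body = ",".join(parts)
    if fmt is not None:
        # A trailing colon with no options ('xyz:') skips format recognition.
        body = fmt + ":" + body
    if not body:
        return filename
    return filename + "{" + body + "}"


print(with_import_options("foo.xyz", fmt="xyz"))           # foo.xyz{xyz:}
print(with_import_options("foo.xyz", options=["f1.4C4"]))  # foo.xyz{f1.4C4}
print(with_import_options("foo.xyz.gz", fmt="gzip:xyz",
                          options=["f1.4C4"], multiset=True))
# foo.xyz.gz{gzip:xyz:MULTISET,f1.4C4}
```

On the command line the whole argument should be quoted, since braces are special characters in many shells.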

You can also pass options to the Java VM when you run the application from the command line.

 

Options for file formats:

 mrv

a, +a, +a_gen
General aromatization. Example: "XXX:a"
a_bas
Basic aromatization. Example: "XXX:a_bas"
a_loose 
Loose aromatization. Example: "XXX:a_loose"
a_ambig
Ambiguous aromatization. Example: "XXX:a_ambig"
-a, -a_gen
General Dearomatization. Example: "XXX:-a"
-a_huckel
Huckel dearomatization. Example: "XXX:-a_huckel"
-a_huckel_ex
Huckel dearomatization, throwing exception in case of failure. Example: "XXX:-a_huckel_ex"
H, +H
Add explicit Hydrogen atoms. Example: "XXX:H"
-H
Remove explicit Hydrogen atoms. Example: "XXX:-H"

Here, XXX can be any molecule or image format like mrv, mol, smiles, cxsmiles, abbrevgroup, cml, jpeg, png or svg, but aromatization options have no effect on formats which do not store bond orders like cube, pdb and xyz.

 cdx, cdxml

a, +a, +a_gen
General aromatization. Example: "XXX:a"
a_bas
Basic aromatization. Example: "XXX:a_bas"
a_loose 
Loose aromatization. Example: "XXX:a_loose"
a_ambig
Ambiguous aromatization. Example: "XXX:a_ambig"
-a, -a_gen
General Dearomatization. Example: "XXX:-a"
-a_huckel
Huckel dearomatization. Example: "XXX:-a_huckel"
-a_huckel_ex
Huckel dearomatization, throwing exception in case of failure. Example: "XXX:-a_huckel_ex"
H, +H
Add explicit Hydrogen atoms. Example: "XXX:H"
-H
Remove explicit Hydrogen atoms. Example: "XXX:-H"

Here, XXX can be any molecule or image format like mrv, mol, smiles, cxsmiles, abbrevgroup, cml, jpeg, png or svg, but aromatization options have no effect on formats which do not store bond orders like cube, pdb and xyz.

 skc

a, +a, +a_gen
General aromatization. Example: "XXX:a"
a_bas
Basic aromatization. Example: "XXX:a_bas"
a_loose 
Loose aromatization. Example: "XXX:a_loose"
a_ambig
Ambiguous aromatization. Example: "XXX:a_ambig"
-a, -a_gen
General Dearomatization. Example: "XXX:-a"
-a_huckel
Huckel dearomatization. Example: "XXX:-a_huckel"
-a_huckel_ex
Huckel dearomatization, throwing exception in case of failure. Example: "XXX:-a_huckel_ex"
H, +H
Add explicit Hydrogen atoms. Example: "XXX:H"
-H
Remove explicit Hydrogen atoms. Example: "XXX:-H"

Here, XXX can be any molecule or image format like mrv, mol, smiles, cxsmiles, abbrevgroup, cml, jpeg, png or svg, but aromatization options have no effect on formats which do not store bond orders like cube, pdb and xyz.

 cml

Export options

The argument of MolConverter, MolExporter and the getMol/getM functions (of the applets and beans) is the format string. The format specification ("cml") is followed by ":" and the selected option(s) for CML export.

Codename
Explanation
a, +a, +a_gen
General aromatization.
a_bas
Basic aromatization.
a_loose 
Loose aromatization.
a_ambig
Ambiguous aromatization.
-a, -a_gen
General Dearomatization.
-a_huckel
Huckel dearomatization.
-a_huckel_ex
Huckel dearomatization, throwing exception in case of failure.
H, +H
Add explicit Hydrogen atoms.
-H
Remove explicit Hydrogen atoms.
A      

Atom attributes are stored in arrays. For 2D molecules, only the x, y coordinates are stored. This is a more compact form of storage than the default (using <atom> tags).

P

Create human readable output: put new XML elements in new lines and indent for embedded elements.

CN

Sets the precision of the exported coordinates: N is the number of decimal places, 0 < N ≤ 9

D

This option is important if the molecule has parity information and is zero-dimensional. By default, a clean method is invoked on the structure during export, and the generated coordinates and wedge information are exported into CML format, but NOT the parity information. With this option, coordinates and wedge information are not generated, but parity information is exported.
Attention: When a CML file containing parity information is imported into a Marvin version older than 5.8, the parity information is displayed incorrectly!

I

Ignore unexportable molecule properties. Without this option, the exporter throws an exception when it reaches an unexportable property.

BOM

Write the UTF-8 byte order mark (BOM), if the given or the system's encoding is UTF-8.

For example: cml:A or cml:C5.

 peptide

Import options

--peptide <string>
The string is a valid one-letter or three-letter sequence.

Convert a one-letter sequence to a molfile:

molconvert --peptide FFKMLL mol -o peptide.mol

Export options

peptide:3
three-letter sequence

  • convert SMILES representation to a three-letter sequence

    molconvert peptide:3 -s "C[C@H](N)C(O)=O"
  • convert one-letter sequence to a three-letter sequence

    molconvert --peptide GAG peptide:3

peptide:1
one-letter sequence

  • convert the SMILES string to a one-letter sequence

    molconvert peptide:1 -s "C[C@H](N)C(O)=O"



 inchi

Export options

Codename
Explanation
H, +H
Add explicit Hydrogen atoms.
-H
Remove explicit Hydrogen atoms.
Srel
Force relative stereo.
SAbs
Force absolute stereo
NEWPS
Narrow end of wedge points to stereocenter (default: both)
RecMet
Include reconnected metals results
FixedH
Mobile H Perception Off (Default: On)
AuxNone
Omit auxiliary information (default: Include)
NoADP
Disable Aggressive Deprotonation (for testing only)
Compress
Compressed output
DoNotAddH
Don't add H according to usual valences: all H are explicit
Key
Exports the InChIKey as well
Woff
Do not display warnings

 inchikey

Export options

Codename
Explanation
H, +H
Add explicit Hydrogen atoms.
-H
Remove explicit Hydrogen atoms.
Srel
Force relative stereo.
SAbs
Force absolute stereo
NEWPS
Narrow end of wedge points to stereocenter (default: both)
RecMet
Include reconnected metals results
FixedH
Mobile H Perception Off (Default: On)
AuxNone
Omit auxiliary information (default: Include)
NoADP
Disable Aggressive Deprotonation (for testing only)
Compress
Compressed output
DoNotAddH
Don't add H according to usual valences: all H are explicit
Key
Exports the InChIKey as well
Woff
Do not display warnings

 name

Import options


Codename
Explanation
ocr
Converts names containing OCR (optical character recognition) errors.
Example: convert the defective name "3-rnethyl-l-methoxynaphthalene" to SMILES

molconvert 'smiles:T*' -s '3-rnethyl-l-methoxynaphthalene' -f name:ocr
-systematic
Disable conversion of systematic names.
-common
Disable conversion of common names (such as aspirin).
-elements
Disable conversion of the names of chemical elements, for instance carbon or sodium. Even though "carbon" is not converted, "methane" still is, since it is a molecule name for CH4, not an element.
-ions
Disable conversion of atomic ion syntax, for instance "Ca2+".
-groups
Disable conversion of groups and fragments, such as "oxo" or "methyl".
-cas
Disable the conversion of CAS registry numbers.
-casNames
Disable the conversion of CAS names.
nameField=FIELD
Sets the field/property that stores the original name. By default, the molecule title is used.
dict=PATH
Specify the location of the custom dictionary. Example: name:dict=C:\Users\Me\MyDictionary.smi
webservice=URL
Enable the usage of a custom webservice at the given URL.

Some of these options are mainly useful when configuring which names Document to Structure recognizes.

To enable an option, a + sign can be used before the option name. For instance, both forms ocr and +ocr are accepted to enable this option.

 Graphic formats


a, +a, +a_gen
General aromatization
XXX:a
a_loose 
Loose aromatization
XXX:a_loose
a_ambig
Ambiguous aromatization
XXX:a_ambig
-a, -a_gen
General Dearomatization
XXX:-a
-a_huckel
Huckel dearomatization
XXX:-a_huckel
-a_huckel_ex
Huckel dearomatization, throwing exception in case of failure
XXX:-a_huckel_ex
H, +H
Add explicit Hydrogen atoms
XXX:H
-H
Remove explicit Hydrogen atoms
XXX:-H
+numbering
assigns atom numberings corresponding to the IUPAC name
XXX:+numbering
H_off
Do not show implicit Hydrogen labels.
XXX:H_off
H_hetero
Implicit Hydrogen labels on heteroatoms only.
XXX:H_hetero
H_heteroterm
Implicit Hydrogen labels on hetero- and terminal atoms (default).
XXX:H_heteroterm
H_all
Implicit Hydrogen labels on all atoms.
XXX:H_all
chiral_off
Switch off chirality support, do not show R/S labels (default).
XXX:chiral_off
chiral_selected
Show R/S if the chiral flag is set for the molecule.
XXX:chiral_selected
chiral_all
Show R/S for any molecule.
XXX:chiral_all
MP_LABEL_VISIBLE
Show M/P for any molecule.
XXX:mp
noRGroups
Do not show R-groups.
XXX:noRgroups
noRLogic
Do not show R-logic.
XXX:noRLogic
w..., h...
Image width and height in pixels. If only one of w and h is specified, the other takes the same value. If neither is specified, their values are calculated from scale. If scale is not specified either, the default size is 200x200.
XXX:w200,h200
scale...
Magnification. 1.54 Å (C-C bond length) is scale pixels.
maxscale...
Limits the magnification to prevent overscaling of small molecules. It is usually set to 28, which is the scale factor for 100% magnification.

atsiz...

Atom label font size in C-C bond length units. Default: 0.4

Note: atsiz*1.54 Å = atsiz*scale points


atomFont...

Atom label font type and size in pt.
 

atomFont:SansSerif-ITALIC-10
atomFont:Times New Roman-PLAIN-10
bondl...

Bond length in pt. Default: 28

bondl42.0
bondw...

Bond spacing in C-C bond length units. Default: 0.18

Note: bondw*1.54 Å = bondw*scale pixels


boldbondw...
Width of bold bond in pt. Default: 6
bondHashSpacing...
The spacing of the hash in hashed bonds in C-C bond length units.
wireThickness...
Bond thickness in wireframe mode. Default: 0.064
stickThickness...
The stick diameter for ball and stick mode. Default: 0.1
ballRadius...
Ball radius for ball and stick mode. Default: 0.5
#rrggbb
Background color. It also determines the brightness of the CPK palette (for atoms and bonds); lighter colors are chosen automatically for a dark background and vice versa. Default: "#ffffff"
#aarrggbb
Background color with alpha value. Use alpha=0 for a transparent background, e.g. "#00ffffff". Note that the alpha channel is not supported by all image formats. Default: "#ffffffff"
transbg
Sets the image background to transparent.
mono
Black & white.
cpk
Use CPK colors (default).
shapely
Use the shapely color scheme.
group
Use coloring based on residue sequence numbers.
setcolors:...
Use atom/bond set colors. Colors can be specified as a colon-separated list of values. Use "ak:#rrggbb" for atom set k, "bk:#rrggbb" for bond set k. The hashmark "#" can be omitted. Human-readable color names like "red", "green", "blue" can also be used.
wireframe
Wireframe rendering style (default for 2D).
wireknobs
Wireframe with knobs. Used until version 17.9; later versions fall back to wireframe.
ballstick
"Ball & stick" rendering style (default for 3D).
spacefill
Spacefill rendering style.
noantialias
Switch off antialiasing.
amap
Displays atom mapping.
anum
Displays atom numbers.
atomNumberingType...
Sets the type of atom numbering. Implies the anum parameter.
Possible values:
  • 1 (Atom numbers)
  • 2 (IUPAC numbering)

lp
Displays lone pairs.
lpexpl
Display the explicit lone pairs instead of the implicit lone pairs if lone pair display is switched on. See the lp parameter.
lonePairsAsLine
Display lone pairs as a line instead of the default two dots. This parameter has an effect only if the lp parameter is also specified.
downwedge_mdl
Down wedge orientation points downward (MDL). (default)
downwedge_daylight
Down wedge orientation points upward (Daylight).
anybond_auto
Draw any bonds with dashed lines in most cases. If all bonds are generated from atom coordinates, any bonds are displayed with solid lines. (default)
anybond_dashed
Draw any bonds with dashed lines.
anybond_solid
Draw any bonds with solid lines.
noatsym
Hide atom symbols in 3D mode.
valprop
Show valence property on atoms that have the valence property explicitly set.
ez
Show E/Z labels.
cv_on
Always show the atom labels of carbon atoms.
cv_off
Never show the atom labels of carbon atoms.
cv_inChain
Show the atom labels of carbon atoms at straight angles and at implicit Hydrogens.
bondLengthVisible
Display the length of bonds in Angstroms.
valenceErrorVisible
Display valence errors.
absLabelVisible
Set the Absolute label visibility to true.
ligandOrderVisibility_withDef
Active by default. Show ligand order on images only when the R-group definition is present.
ligandOrderVisibility_on
Show all ligand order on images for R-groups.
ligandOrderVisibility_off
Never show ligand order on images for R-groups.
aprop
Show explicitly set properties on atoms.
liganderr
Show ligand errors on R-groups.
coordBondStyle_solid
Display coordinate bond as a single bond.
coordBondStyle_arrow
Display coordinate bond as an arrow.
coordBondStyleAtMulticenter_hashed
Display coordinate bond as a dashed bond when it connects to a multicenter atom.
coordBondStyleAtMulticenter_solid
Display coordinate bond as a single bond when it connects to a multicenter atom.
chargeWithCircle
Display charge symbols in a circle.
oneLetterPeptideDisplay
Display peptides with their one-letter abbreviation instead of the three-letter abbreviation, which is the default.
disableAminoAcidBondColoring
Disable the amino acid bond coloring.
fogFactor...
Set the fog factor scale value (integer). Default value: 0, range: 0..100.
marginSize...
Set the margin width in pt. Default: 10
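
The unit notes for scale, atsiz and bondw in the table above reduce to a simple proportion: one C-C bond length (1.54 Å) is drawn as scale pixels. The following Python sketch works through that arithmetic. Both function names are hypothetical, and the scale of 28 is taken from the maxscale description as the value for 100% magnification.

```python
CC_BOND_ANGSTROM = 1.54  # C-C bond length, the unit used in the table above


def bond_units_to_pixels(value, scale):
    """Convert a size in C-C bond length units (e.g. atsiz, bondw) to pixels."""
    return value * scale


def angstrom_to_pixels(length, scale):
    """Convert a length in Angstroms, where 1.54 Angstroms map to 'scale' pixels."""
    return length / CC_BOND_ANGSTROM * scale


scale = 28  # assumption: scale factor for 100% magnification
print(round(bond_units_to_pixels(0.4, scale), 2))   # default atsiz  -> 11.2
print(round(bond_units_to_pixels(0.18, scale), 2))  # default bondw  -> 5.04
```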

 

2D defaults: H_heteroterm,w200,h200,#ffffffff,cpk,wireframe  

3D defaults: H_heteroterm,w200,h200,#ff000000,cpk,ballstick

Examples:

jpeg
Default settings: 200x200 pixels, white background (or black in 3D).
jpeg:w100,#ffff00
100x100 JPEG with yellow background.
jpeg:w100,h150
100x150 JPEG with default background.
png:aprop -s "C1-C10 alkyl" -o alkyl.png
PNG showing "C1-C10 alkyl".
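
The rules for the w and h options can be summarized in a few lines of Python. This is only a sketch of the documented behaviour; `resolve_image_size` is a hypothetical name, and the case where the size is derived from scale is not modelled because it depends on the molecule being drawn.

```python
def resolve_image_size(w=None, h=None, scale=None):
    """Return (width, height) in pixels, or None if the size depends on scale."""
    if w is not None and h is not None:
        return (w, h)
    if w is not None:
        return (w, w)  # only w given: h takes the same value
    if h is not None:
        return (h, h)  # only h given: w takes the same value
    if scale is not None:
        return None    # calculated from scale and the molecule; not modelled here
    return (200, 200)  # documented default size


print(resolve_image_size(w=100))         # (100, 100), as in jpeg:w100,#ffff00
print(resolve_image_size(w=100, h=150))  # (100, 150), as in jpeg:w100,h150
print(resolve_image_size())              # (200, 200)
```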



a, +a, +a_gen
General aromatization. Example: "XXX:a"
a_bas
Basic aromatization. Example: "XXX:a_bas"
a_loose 
Loose aromatization. Example: "XXX:a_loose"
a_ambig
Ambiguous aromatization. Example: "XXX:a_ambig"
-a, -a_gen
General Dearomatization. Example: "XXX:-a"
-a_huckel
Huckel dearomatization. Example: "XXX:-a_huckel"
-a_huckel_ex
Huckel dearomatization, throwing exception in case of failure. Example: "XXX:-a_huckel_ex"
H, +H
Add explicit Hydrogen atoms. Example: "XXX:H"
-H
Remove explicit Hydrogen atoms. Example: "XXX:-H"

Here, XXX can be any molecule or image format like mrv, mol, smiles, cxsmiles, abbrevgroup, cml, jpeg, png or svg, but aromatization options have no effect on formats which do not store bond orders like cube, pdb and xyz.

mol:V2
For exporting position variation bonds to MDL mol V2000.

 

Examples

  1. Printing the SMILES string of a molecule in a molfile

    molconvert smiles caffeine.mol
  2. Dearomatizing an aromatic molecule:

    molconvert smiles:-a -s "c1ccccc1"
  3. Aromatizing a molecule:

    molconvert smiles:a -s "C1=CC=CC=C1"

    (The default general aromatization is used.)

  4. Aromatizing a molecule using the basic algorithm:

    molconvert smiles:a_bas -s "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"
  5. Converting a SMILES file to MDL Molfile

    molconvert mol caffeine.smiles -o caffeine.mol
  6. Making an SDF from molfiles:

    molconvert sdf *.mol -o molecules.sdf


  7. Printing the encodings of SDfiles in the working directory:

    molconvert query-encoding *.sdf
  8. SMILES to Molfile with optimized 2D coordinate calculation, converting double bonds with unspecified cis/trans to "either"

    molconvert -2:2e mol caffeine.smiles -o caffeine.mol
  9. 2D coordinate calculation with optimization and fixed atom coordinates for atoms 1, 5, 6:

    molconvert -2:2:F1,5,6 mol caffeine.mol
  10. Import a file as XYZ, do not try to recognize the file format:

    molconvert smiles "foo.xyz{xyz:}"

    Note: This is just an example. XYZ and other formats known by Marvin are always recognized (send us a bug report otherwise), so the specification of the input format is usually not needed. It is only relevant if a user-defined import module is used.

  11. Import a file as XYZ, with bond-length cut-off = 1.4, and max. number of Carbon connections = 4, export to SMILES:

    molconvert smiles "foo.xyz{f1.4C4}"
  12. Import a file as Gzipped XYZ, with the same import options as in the previous example:

    molconvert smiles "foo.xyz.gz{gzip:xyz:f1.4C4}"
  13. Like the previous example but merge the molecules into one molecule that contains multiple atom sets. MDL molfile is exported.

    molconvert mol "foo.xyz.gz{gzip:xyz:MULTISET,f1.4C4}"
  14. Import an SDF and export a table containing selected molecules with columns: SMILES, ID, and logP:

    molconvert smiles -c "ID<=1000&logP>=-2&logP<=4" -T ID:logP foo.sdf
  15. Fuse R2 definition from file, filter fragments with 1 attachment point:

    molconvert mrv in.mrv -R2:1 rdef.mrv
  16. Fuse fragments from file (note that the input molecule to which the fragments are fused must also be specified):

    molconvert mrv in.mrv -R frags.mrv
  17. Generate all common names for a structure:

    molconvert "name:common,all" -s tylenol
  18. Generate the most popular common name for a structure (It fails if none is known.):

    molconvert name:common -s viagra
  19. Generate SMILES for the molecules whose names are mentioned in the file foo.html:

    molconvert smiles foo.html
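
When molconvert is called from a script, the command line can be assembled as an argument list rather than a single string, which avoids shell quoting problems with characters such as `&`, `<` and `>`. The sketch below is a minimal Python illustration: `field_filter` and `molconvert_args` are hypothetical helpers, the argument order follows the Usage synopsis, and the code only prints the command instead of running it.

```python
import shlex

ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">="}


def field_filter(conditions):
    """Join (field, operator, value) triples with '&' for the -c option."""
    parts = []
    for field, op, value in conditions:
        if op not in ALLOWED_OPERATORS:
            raise ValueError("unsupported operator: " + op)
        parts.append(f"{field}{op}{value}")
    return "&".join(parts)


def molconvert_args(outformat, inputs=(), output=None, options=()):
    """Build an argument list: molconvert [options] outformat [files...]."""
    args = ["molconvert", *options, outformat, *inputs]
    if output is not None:
        args += ["-o", output]
    return args


flt = field_filter([("ID", "<=", 1000), ("logP", ">=", -2), ("logP", "<=", 4)])
args = molconvert_args("smiles", ["foo.sdf"], options=["-c", flt])
print(shlex.join(args))  # shell-quoted form of the command, for inspection
```

The list returned by `molconvert_args` could be handed to `subprocess.run(args)` on a machine where molconvert is installed and on the PATH.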