Input and Output System - Import
Molecule import is the operation in which data sources, i.e. structures defined in various formats, are converted to Molecule objects so that Chemaxon applications can operate on them.
Sources of data
When structures are imported with Chemaxon tools, the data can come from the following sources:
- A structure file whose location is given as an absolute or relative path
- A structure file whose location is given as a URL
- Molecule source text (e.g. the pre-read content of a structure file) in various formats
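As a rough illustration of how the three kinds of sources relate, the plain-Java sketch below reduces each of them to an InputStream, the type accepted by the MolImporter(InputStream is, String opts) constructor described under Importing options. The class name SourceStreams, its helper methods, and the sample SMILES text are assumptions made for this example. No Chemaxon classes are used, so the sketch covers only the I/O side, not the conversion to Molecule objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SourceStreams {

    /** Source 1: a structure file given by an absolute or relative path. */
    public static InputStream fromPath(Path path) throws IOException {
        return Files.newInputStream(path);
    }

    /** Source 2: a structure file given by a URL. */
    public static InputStream fromUrl(URL url) throws IOException {
        return url.openStream();
    }

    /** Source 3: molecule source text that is already in memory. */
    public static InputStream fromText(String source) {
        return new ByteArrayInputStream(source.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads a stream to the end, closes it, and returns its content as text. */
    public static String readAll(InputStream in) throws IOException {
        try (InputStream stream = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String smiles = "CCO"; // hypothetical sample content (ethanol as SMILES)
        Path file = Files.createTempFile("structure", ".smiles");
        Files.write(file, smiles.getBytes(StandardCharsets.UTF_8));
        try {
            // The same content, reached through each of the three source kinds.
            System.out.println(readAll(fromPath(file)));
            System.out.println(readAll(fromUrl(file.toUri().toURL())));
            System.out.println(readAll(fromText(smiles)));
        } finally {
            Files.delete(file);
        }
    }
}
```

All three calls in main print the same text, which is the point of the sketch: whichever way the source is located, the importer ultimately consumes the same molecule source content.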
Basic import using the API
The most frequently used API for molecule import is defined in the chemaxon.formats.MolImporter class, which also offers many utility methods.
Importing from String
The simplest way to import a single molecule whose source is available as a String is to use the static importMol method of the MolImporter class.
For a complete working example with a simple GUI, please see ImportMoleculeSource.java.
Importing from InputStream
A single molecule whose source is available via an InputStream can be imported by constructing a MolImporter on the stream and reading the molecule from it.
Please note that the MolImporter needs to be closed explicitly after use.
For a complete source code example with a simple GUI, please see ImportFromStream.java.
Importing options
Additional MolImporter options allow the import behavior to be refined further. Options can be general or dependent on the file format. They can be set in the constructor MolImporter(InputStream is, String opts) or during import with the static method MolImporter.importMol(String s, String opts).
The most important option is the file format option, which specifies the format to read. If it is omitted, automatic format recognition detects the format. If import speed is an important factor, specifying the format option is strongly recommended.
General and file format dependent options are separated from the file format option by a colon and from each other by a comma. For example, the option string sdf:MULTISET,Usg combines the file format option sdf (MDL MOL SD file format), the general option MULTISET, which merges the molecules of the imported multi-structure file into one structure and can be applied to any file format, and the format-specific option Usg, which ungroups any S-group found in the structure file.
For a complete source code, please see ImportExportOptions.java.
Note that after importing SMILES, invoking the MoleculeGraph.clearCachedInfo method is recommended in order to remove cached information, which otherwise increases the size of the molecule.
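The layout of the option string described in this section (format option, then a colon, then comma-separated options) can be sketched in plain Java as follows. The class ImportOptionString and its methods are hypothetical helpers written for this illustration and are not part of the Chemaxon API. They follow only the separator rules stated above, so any further syntax the library may accept is not covered.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ImportOptionString {

    /**
     * Joins a format name and its options as "format:opt1,opt2".
     * With no options, the format name is returned on its own.
     */
    public static String build(String format, String... options) {
        if (format == null || format.isEmpty()) {
            throw new IllegalArgumentException("this sketch requires a format name");
        }
        if (options.length == 0) {
            return format;
        }
        return format + ":" + String.join(",", options);
    }

    /** Returns the format part, i.e. everything before the first colon. */
    public static String formatOf(String optionString) {
        int colon = optionString.indexOf(':');
        return colon < 0 ? optionString : optionString.substring(0, colon);
    }

    /** Returns the options after the first colon, or an empty list if there are none. */
    public static List<String> optionsOf(String optionString) {
        int colon = optionString.indexOf(':');
        if (colon < 0 || colon == optionString.length() - 1) {
            return Collections.emptyList();
        }
        return Arrays.asList(optionString.substring(colon + 1).split(","));
    }

    public static void main(String[] args) {
        String opts = build("sdf", "MULTISET", "Usg");
        System.out.println(opts);            // prints sdf:MULTISET,Usg
        System.out.println(formatOf(opts));  // prints sdf
        System.out.println(optionsOf(opts)); // prints [MULTISET, Usg]
        // With the Chemaxon library on the classpath, a string like this would be
        // passed as the "opts" argument of MolImporter(InputStream is, String opts)
        // or MolImporter.importMol(String s, String opts).
    }
}
```

Building the string through a helper like this keeps the separators in one place, which may be convenient when options are chosen at run time.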
Importing a multi-molecule file
Molecules can be imported from a multi-molecule file whose location is given as a URL.
For a complete working example with a simple GUI, please see ImportMultiMoleculeFile.java.
The imported molecules can also be iterated over with a "foreach" statement.
Please note that only one Iterator per MolImporter works at the moment. For a complete source code, please see ImportIterator.java.
Accessing a molecule directly
A molecule in a multi-molecule input can be accessed directly, for example by seeking to the second molecule.
For a complete source code, please see the seekRecord method in SeekingMolecule.java.
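To make the idea of direct access more concrete, the plain-Java sketch below indexes the records of an SD-format text once and then jumps straight to the second record. It does not use MolImporter and does not reproduce the seekRecord method; the class SdRecordIndex and its methods are hypothetical. The only format assumption is that each SD record ends with a line containing `$$$$`, and the record bodies in main are placeholders rather than valid molfiles.

```java
import java.util.ArrayList;
import java.util.List;

public class SdRecordIndex {

    private final String text;
    private final List<Integer> starts = new ArrayList<>(); // start offset of each record
    private final List<Integer> ends = new ArrayList<>();   // end offset (exclusive)

    /** Scans the text once and remembers where every terminated record begins and ends. */
    public SdRecordIndex(String sdText) {
        this.text = sdText;
        int recordStart = 0;
        int lineStart = 0;
        while (lineStart < text.length()) {
            int lineEnd = text.indexOf('\n', lineStart);
            if (lineEnd < 0) {
                lineEnd = text.length();
            }
            int next = Math.min(lineEnd + 1, text.length());
            // A "$$$$" line closes the current record (trim() also drops a trailing '\r').
            if (text.substring(lineStart, lineEnd).trim().equals("$$$$")) {
                starts.add(recordStart);
                ends.add(next);
                recordStart = next;
            }
            lineStart = next;
        }
        // Text after the last "$$$$" line is ignored in this sketch.
    }

    /** Number of indexed records. */
    public int size() {
        return starts.size();
    }

    /** Returns the record at the given 0-based position using the stored offsets. */
    public String record(int index) {
        return text.substring(starts.get(index), ends.get(index));
    }

    public static void main(String[] args) {
        String sd = "mol-1\n(placeholder body)\n$$$$\n"
                  + "mol-2\n(placeholder body)\n$$$$\n";
        SdRecordIndex index = new SdRecordIndex(sd);
        System.out.println(index.size());                   // prints 2
        System.out.println(index.record(1).split("\n")[0]); // prints mol-2
    }
}
```

Once the offsets are known, any record can be reached without reading the preceding ones again, which is the general benefit of seeking compared with sequential iteration.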
Importing with MRecordImporter
Molecules and documents can also be iterated over with the MRecordImporter class.
For a complete source code, please see the importMoleculeWithMRecordImporter method in ImportMultiMoleculeFile.java.
Importing with MRecordReader
Iterating over records only is possible with the MRecordReader class.
For a complete source code, please see ImportRecords.java.