Integrating your own format

    The API of Chemaxon IO allows you to implement your own format and to add it to the existing framework. The following steps are needed:

    • Creating file format metadata for your own format.

    • Creating exporter for your own format.

    • Creating importer for your own format.

    • Creating record reader for your own format.

    • Creating format recognizer for your own format.

    Complete example code is available to demonstrate these steps; the relevant source file is referenced at the end of each section below.

    Creating file format metadata for your own format

    Creating the metadata involves specifying the file extensions, importer, exporter, record reader, and format recognizer, along with format features such as whether the format stores atomic coordinates and whether it can hold multiple records.

    MYFORMAT = MFileFormat.createUserDefined(
            "My Format",                      // format description
            List.of("myformat"),              // format name
            List.of("myformat", "myf"),       // file extensions
            MyFormatRecordReader::new,        // record reader factory
            MyFormatImport::new,              // import module factory
            enc -> new myio.MyFormatExport(), // export module factory
            MyFormatRecognizer::new,          // recognizer factory
            Map.of("myformat", 10),           // priority specification
            MFileFormat.F_IMPORT | MFileFormat.F_EXPORT | MFileFormat.F_RECOGNIZER
                        | MFileFormat.F_MOLECULE | MFileFormat.F_COORDS
                        | MFileFormat.F_MULTIPLE_RECORDS_LEGAL
                        | MFileFormat.F_MULTIPLE_RECORDS_POSSIBLE
    );

    For the complete source code, see Init.java.

    After this step, the format is registered by:

    MFileFormatUtil.registerFormat(Init.MYFORMAT);

    Creating exporter for your own format

    To create your own format exporter, implement the convert() method of chemaxon.marvin.io.MolExportModule, the abstract base class of molecule export modules. The convert(Molecule) method can return a String, a byte array, or an image object. Optionally, you can implement the open(String) and close() methods as well. The open(String fmtopts) method opens the exporter stream and should call super.open(fmtopts) first. Some multi-molecule formats, such as RDfile, begin with a header; open must return this header, either as a String object or as a byte[] array. The close() method is called after the last convert() call to close the stream if needed.

    For the complete source code, see MyFormatExport.java.
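
    The open, convert, close call order described in this section can be illustrated with a minimal, library-free sketch. It does not use any Chemaxon classes: ExportLifecycleSketch, its "MYFORMAT 1.0" header, and the NAME/END record layout are hypothetical stand-ins chosen only for illustration, and molecules are represented by plain names. A real exporter would extend MolExportModule instead.

```java
import java.util.List;

/** Hypothetical stand-in for an export module; NOT a Chemaxon class. */
final class ExportLifecycleSketch {
    private boolean opened;

    /** Opens the exporter and returns the file header (hypothetical header text). */
    String open(String fmtopts) {
        opened = true;
        return "MYFORMAT 1.0\n";
    }

    /** Converts one "molecule" (here just a name) into its text record. */
    String convert(String name) {
        if (!opened) {
            throw new IllegalStateException("open() must be called before convert()");
        }
        return "NAME " + name + "\nEND\n";
    }

    /** Called once after the last convert(); nothing to release in this sketch. */
    void close() {
        opened = false;
    }

    /** Drives the full lifecycle: header first, then one record per item. */
    static String exportAll(List<String> names) {
        ExportLifecycleSketch exporter = new ExportLifecycleSketch();
        StringBuilder out = new StringBuilder();
        out.append(exporter.open(""));
        for (String name : names) {
            out.append(exporter.convert(name));
        }
        exporter.close();
        return out.toString();
    }
}
```

    Returning the header from open rather than writing it in the first convert call keeps every record identical in shape, which is what makes a multi-record file easy to split again later.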

    Creating importer for your own format

    The chemaxon.marvin.io.MolImportModule class is the base class of molecule import modules. Its two basic abstract methods are createMol(), which creates a new target molecule object, and readMol(Molecule mol), which reads a molecule from the file into that object.

    For the complete source code, see MyFormatImport.java.
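
    The division of labor between createMol() and readMol() can be sketched without the library as follows. Everything here is a hypothetical stand-in: ImportLoopSketch and its nested Mol class are not Chemaxon types, the NAME/END record layout is invented for the example, and the convention that readMol returns false once the input is exhausted is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for an import module; NOT a Chemaxon class. */
final class ImportLoopSketch {
    /** Hypothetical stand-in for a molecule object. */
    static final class Mol {
        String name;
    }

    private final Iterator<String> lines;

    ImportLoopSketch(String text) {
        // Split on any line break so the sketch needs no I/O classes.
        lines = Arrays.asList(text.split("\\R")).iterator();
    }

    /** Creates an empty target object, analogous to createMol(). */
    Mol createMol() {
        return new Mol();
    }

    /**
     * Fills the target from the next record, analogous to readMol(Molecule).
     * Returns false when no complete record is left (assumed convention).
     */
    boolean readMol(Mol mol) {
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.startsWith("NAME ")) {
                mol.name = line.substring(5).trim();
            } else if (line.equals("END")) {
                return true; // record terminator reached
            }
        }
        return false; // trailing text without END is ignored
    }

    /** Typical caller loop: create a target, read into it, repeat. */
    static List<String> readAllNames(String text) {
        ImportLoopSketch importer = new ImportLoopSketch(text);
        List<String> names = new ArrayList<>();
        Mol mol = importer.createMol();
        while (importer.readMol(mol)) {
            names.add(mol.name);
            mol = importer.createMol();
        }
        return names;
    }
}
```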

    Creating record reader for your own format

    A record is the string or byte representation of a single molecule, together with its properties, in a multi-molecule file. Reading records is faster than reading into molecule objects, and it makes it possible to pre-read properties without building the molecules. The base interface for record readers is chemaxon.marvin.io.MRecordReader. The utility class chemaxon.marvin.io.MPropHandler helps with property pre-reading.

    For the complete source code, see MyFormatRecordReader.java.
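
    To make the idea of record reading concrete, the sketch below splits a multi-record text into raw records and remembers the line on which each record starts, then scans one record for a property without constructing any molecule object. It is library-free and purely illustrative: RecordSplitSketch, Rec, the END terminator line, and the NAME property line are all hypothetical and do not reflect the MRecordReader or MPropHandler APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record splitter; NOT a Chemaxon class. */
final class RecordSplitSketch {
    /** Raw text of one record plus the 1-based line where it starts. */
    static final class Rec {
        final int startLine;
        final String text;

        Rec(int startLine, String text) {
            this.startLine = startLine;
            this.text = text;
        }
    }

    /** Splits the input into records terminated by a line equal to "END". */
    static List<Rec> split(String input) {
        List<Rec> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int lineNo = 0;
        int startLine = 1;
        for (String line : input.split("\\R")) {
            lineNo++;
            if (current.length() == 0) {
                startLine = lineNo; // first line of a new record
            }
            current.append(line).append('\n');
            if (line.equals("END")) {
                records.add(new Rec(startLine, current.toString()));
                current.setLength(0);
            }
        }
        return records; // trailing text without END is dropped in this sketch
    }

    /** Pre-reads the NAME property from the raw record text, or returns null. */
    static String nameOf(Rec rec) {
        for (String line : rec.text.split("\\R")) {
            if (line.startsWith("NAME ")) {
                return line.substring(5).trim();
            }
        }
        return null;
    }
}
```

    Keeping the record as raw text is what makes this cheap: a caller can filter or index thousands of records by property and parse only the few it actually needs.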

    Creating format recognizer for your own format

    Implementing format recognition for your own format lets the framework detect the input format from the contents of a file or string, so you can import structures without explicitly specifying the input format. To use format recognition, extend the chemaxon.formats.recognizer.Recognizer abstract class. This step is optional.

    For the complete source code, see MyFormatRecognizer.java.
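
    Content-based recognition can be pictured as feeding the input to a recognizer one line at a time until it has seen enough to decide. The sketch below shows that pattern in plain Java. LineRecognizerSketch, its method names, and the rule that a file is "My Format" when its first non-blank line starts with MYFORMAT are all assumptions made for illustration; they are not the Recognizer API.

```java
/** Hypothetical line-by-line recognizer; NOT a Chemaxon class. */
final class LineRecognizerSketch {
    private boolean decided;
    private boolean matches;

    /** Examines one line; the first non-blank line settles the decision. */
    void feedLine(String line) {
        if (decided || line.trim().isEmpty()) {
            return; // already decided, or nothing to learn from a blank line
        }
        matches = line.startsWith("MYFORMAT"); // hypothetical signature
        decided = true;
    }

    /** True while no decision has been reached yet. */
    boolean needsMore() {
        return !decided;
    }

    /** True if the input seen so far looks like the hypothetical format. */
    boolean matches() {
        return matches;
    }

    /** Feeds lines until the recognizer stops asking for more. */
    static boolean recognize(String input) {
        LineRecognizerSketch recognizer = new LineRecognizerSketch();
        for (String line : input.split("\\R")) {
            if (!recognizer.needsMore()) {
                break;
            }
            recognizer.feedLine(line);
        }
        return recognizer.matches();
    }
}
```

    Stopping as soon as a decision is reached matters in practice, because recognition typically runs before import and should not have to scan a large file to the end.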

    Behavior of IO classes in the existing framework:

    • chemaxon.formats.recognizer.Recognizer: Base class of file format recognizers. It narrows down the possible formats by checking the input line by line.

    • chemaxon.marvin.io.MolExportModule: Abstract base class of molecule export modules.

    • chemaxon.marvin.io.MolImportModule: Abstract base class of molecule import modules.

    • chemaxon.marvin.io.MRecord: Representation of a record, where a record is the string or byte representation of a single molecule with properties in a multi-molecule file.

    • chemaxon.marvin.io.MRecordReader: Interface for record reading. AbstractMRecordReader is a basic implementation that is further extended for the specific import types.

    • chemaxon.marvin.io.formats.AbstractMRecordReader: Abstract record reader class. It is able to read lines and to create line number mappings for the records.

    • chemaxon.formats.MolConverter: Converts between molecule file formats, allows simple conversion, splitting and merging structures.

    • chemaxon.formats.MolImporter: Molecule file importer. The input file format is guessed automatically or specified as an import option to the constructor. For more information on supported formats, please visit File Formats.

    • chemaxon.formats.MolExporter: Molecule exporter class. The output file format can be specified as an argument to the constructor of this class. For more information on supported formats, please visit File Formats.

    • chemaxon.formats.MFileFormat: Collection of file format descriptors.

    • chemaxon.formats.MFileFormatUtil: File format related utility functions like creating specific format export modules, registering and handling new file formats.

    • chemaxon.formats.MolInputStream: Molecule input stream that has the ability to determine the input file format.

    • chemaxon.marvin.io.MDocSource: Abstract molecule and document reader/importer class.

    • chemaxon.marvin.io.MRecordImporter: Record and structure reader used by MolImporter internally.

    • chemaxon.marvin.io.PositionedInputStream: Positioned input stream that has the ability to set / get the stream position as well as put back some parts already read.

    • chemaxon.formats.MolFormatException: Exception thrown when molecule file format detection fails or when other problems occur during import.

    • chemaxon.marvin.io.MolExportException: Molecule export exception for export errors.

    • chemaxon.marvin.io.MRecordParseException: Exception for record reading errors.