Document to Structure User Guide

    Document to Structure processes PDF, HTML, XML, text files and office file formats: DOC, DOCX, PPT, PPTX, XLS, XLSX, ODT. It recognizes and converts the chemical names (IUPAC, CAS, common and drug names), SMILES and InChI found in the document into chemical structures.

    D2s conversion relies on the Name to Structure converter. For the supported names and current limitations, see the Name to Structure Home documentation. You can extend document to structure conversion by creating a custom dictionary file.

    D2s can be used via the API, the command line application (MolConverter), or MarvinView. Text mining can also be automated by using d2s integrated into KNIME or Pipeline Pilot.

    OCR and syntax correction

    Chemaxon's d2s toolkit can correct several simple OCR and syntax errors. For instance, given the incorrect name "3-rnethyl-l-me-thoxynaphthalene", it automatically corrects the name to "3-methyl-1-methoxynaphthalene" and generates the corresponding structure.
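
    To make the kind of correction described above concrete, here is a toy Python sketch. It is not Chemaxon's algorithm, which this guide does not describe. The function name naive_ocr_fix and its three rules are assumptions chosen only to reproduce the example above. A real corrector would have to validate each candidate against a name parser, since these blanket rules would damage many valid names.

```python
import re


def naive_ocr_fix(name: str) -> str:
    """Toy clean-up of an OCR-damaged chemical name (illustrative only)."""
    # Rule 1: the letter pair "rn" is a common OCR misreading of "m".
    fixed = name.replace("rn", "m")
    # Rule 2: a lowercase "l" that is not preceded by a letter and is
    # followed by a hyphen sits where a locant belongs, so read it as "1".
    fixed = re.sub(r"(?<![A-Za-z])l(?=-)", "1", fixed)
    # Rule 3: a hyphen between two lowercase letters is treated as a
    # line-break artefact and dropped. This must run after rule 2,
    # otherwise the hyphens around the misread locant would be removed.
    fixed = re.sub(r"(?<=[a-z])-(?=[a-z])", "", fixed)
    return fixed


print(naive_ocr_fix("3-rnethyl-l-me-thoxynaphthalene"))
# prints: 3-methyl-1-methoxynaphthalene
```

    The order of the rules matters: the misread locant has to be restored before stray hyphens are removed.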

    Document to Structure Conversion in MarvinView

    Open a PDF file containing chemical names. MarvinView displays all the structures corresponding to the recognized names. The structures can then be saved, copied and pasted, or opened in the MarvinSketch editor.

    Document to Structure Conversion from the Command Line

    You can use the MolConverter command line tool for d2s conversion. Example:

    • Converting the chemical names found in "test.pdf" to a MOL file:

       molconvert mol test.pdf -o test.mol
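
    For batch use, the same command can be driven from a script. The following Python sketch assumes that molconvert is on the PATH and that the license is installed. The helper names build_molconvert_command and convert_documents are hypothetical, and the only command line form used is the one shown above.

```python
import shutil
import subprocess
from pathlib import Path


def build_molconvert_command(document, output, fmt="mol"):
    """Return the argument list for: molconvert <fmt> <document> -o <output>."""
    return ["molconvert", fmt, str(document), "-o", str(output)]


def convert_documents(folder, fmt="mol"):
    """Run molconvert on every PDF in `folder`; return the output paths."""
    # Assumption: the molconvert launcher is reachable through PATH.
    if shutil.which("molconvert") is None:
        raise FileNotFoundError("molconvert was not found on PATH")
    outputs = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # Write each result next to its source, e.g. test.pdf -> test.mol.
        target = pdf.with_suffix("." + fmt)
        # check=True raises CalledProcessError if molconvert exits non-zero.
        subprocess.run(build_molconvert_command(pdf, target, fmt), check=True)
        outputs.append(target)
    return outputs
```

    Calling convert_documents("papers") would then write one MOL file next to each PDF in a folder named "papers".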

    Structure conversion from OLE objects

    D2s converts chemical structures from OLE objects embedded in office documents. Such objects are created by various chemical sketchers, including Marvin, ChemDraw, ISIS/Draw, Symyx Draw, and Accelrys Draw.

    Chemical image recognition

    For structures represented as images in PDF or office documents, d2s can make use of several Image to Structure tools (also called Optical Structure Recognition or Chemical OCR). When such a tool is installed and successfully recognizes an image, the chemical structure becomes part of the d2s output; it can be visualized, edited, indexed and searched just like any other structure.

    Currently, the supported Image to Structure tools are:

    See the configuration instructions to learn how to make these tools available to d2s.

    Note that structures stored as vector graphics rather than bitmaps are not converted unless the osraRendered format option is used.

    License information

    You need the "Document to Structure" license.