CSV

Basic information about the format

Import options
- Headers
  - Automatically recognized molecule headers
  - User defined header
- Headless import
- Override column names
- Molecule format

Export options
- Define Molecule column name
- Define headless export
- Define export format
- Define exported column header names
- Control escaping of special characters

Format code: csv

Basic information about the format

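CSV stands for "comma-separated values". It is a very simple format: each line holds one record, and the fields of a record are separated by commas. A field that itself contains a comma is enclosed in double quotes, as in the note of the second molecule below.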

id,mol,registering_user,note
1,C,anonymous@chemicalize.com,this is a rather common element
2,[H],h.canvenids@chemicalize.com,"I bet this is more common, how could you miss it?"
3,[He],pjc_janssen@chemicalize.com,This is boring: it does not react with anything!

The file contains 3 molecules. The mol column holds the structure, and each molecule has the following additional information:

- ID
- registering_user
- note

The structures are written in SMILES. After import, we get the following structures and properties:

A simple Carbon, with:
- ID = 1
- registering_user = anonymous@chemicalize.com
- note = this is a rather common element

A simple Hydrogen, with:
- ID = 2
- registering_user = h.canvenids@chemicalize.com
- note = I bet this is more common, how could you miss it?

A simple Helium, with:
- ID = 3
- registering_user = pjc_janssen@chemicalize.com
- note = This is boring: it does not react with anything!
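To make the mapping concrete, here is a minimal Python sketch (not Chemaxon's MolImporter) that splits each record of the example into a structure string and a property dictionary. The helper name `read_molecule_csv` is hypothetical, and the structure is kept as an uninterpreted SMILES string rather than parsed into a molecule. Python's standard `csv` module takes care of the quoted note that contains a comma.

```python
import csv
import io


def read_molecule_csv(text, struc_column="mol"):
    """Split each CSV record into (structure string, property dict).

    Hypothetical helper: the structure column is removed from the row and
    every remaining column becomes a property keyed by its header.
    """
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        structure = row.pop(struc_column)
        records.append((structure, row))
    return records


# The example file from above.
DATA = (
    "id,mol,registering_user,note\n"
    "1,C,anonymous@chemicalize.com,this is a rather common element\n"
    '2,[H],h.canvenids@chemicalize.com,"I bet this is more common, how could you miss it?"\n'
    "3,[He],pjc_janssen@chemicalize.com,This is boring: it does not react with anything!\n"
)

for structure, props in read_molecule_csv(DATA):
    print(structure, props)
```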

The user can also specify during import which header holds the molecule. For example, this file:


id,CHEMICAL_DATA,name
1,c1ccccc1CC(N)C,amphetamine
2,c1ccccc1,benzene

Can be imported with the following settings:

`csv:strucCHEMICAL_DATA`

With this option, MolImporter recognizes that the CHEMICAL_DATA field holds the structure.

Import options

Headers

Automatically recognized molecule headers

The molecule column can contain any Chemaxon-supported format, but each structure must be written on a single line. The following column headers are automatically recognized as the molecule column:

- mol
- molecule
- structure
- struc
- smiles
- cxsmiles
- smarts
- cxsmarts
- inchi
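Automatic detection can be pictured as looking up each header in this list. The sketch below is hypothetical. It assumes case-insensitive matching (the Override column names example renames a column to MOL and still relies on automatic recognition) and that the first matching column wins when several match; neither rule is stated on this page.

```python
# Header names from the list above.
RECOGNIZED_HEADERS = {
    "mol", "molecule", "structure", "struc", "smiles",
    "cxsmiles", "smarts", "cxsmarts", "inchi",
}


def find_structure_column(header):
    """Return the zero-based index of the first recognized molecule header, or None.

    Assumption: matching ignores case and surrounding whitespace.
    """
    for index, name in enumerate(header):
        if name.strip().lower() in RECOGNIZED_HEADERS:
            return index
    return None


print(find_structure_column(["id", "mol", "registering_user", "note"]))  # 1
print(find_structure_column(["id", "CHEMICAL_DATA", "name"]))  # None
```

A header such as CHEMICAL_DATA is not in the list, which is why the next section's `struc` option is needed for that file.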

User defined header

The user can define which header identifies the molecule column when importing structures. This is done with the `struc` parameter, followed directly by the header name.

For example this file:


id,CHEMICAL_DATA,name
1,c1ccccc1CC(N)C,amphetamine
2,c1ccccc1,benzene

Can be imported with the following settings:

`csv:strucCHEMICAL_DATA`

With this option, MolImporter recognizes that the CHEMICAL_DATA field holds the structure.
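Options are written directly after their keyword (as in `strucCHEMICAL_DATA`) and separated by commas after the `csv:` prefix. The following hypothetical Python parser, which is not part of any Chemaxon API, shows one way to read such strings. It covers the `struc`, `headless`, `format`, `style` and `f` options described on this page.

```python
def parse_csv_options(option_string):
    """Parse a 'csv:...' option string into a dict (hypothetical sketch).

    Keywords are glued to their values and separated by commas. Longer
    keywords are tested before the one-letter 'f' prefix, so that
    'formatsmiles' is not mistaken for a column name.
    """
    prefix = "csv:"
    if not option_string.startswith(prefix):
        raise ValueError("expected an option string starting with 'csv:'")
    options = {"headless": False, "struc": None, "format": None,
               "style": None, "names": []}
    body = option_string[len(prefix):]
    for token in filter(None, body.split(",")):
        if token == "headless":
            options["headless"] = True
        elif token.startswith("struc"):
            # A header name, or a zero-based index when headless is set.
            options["struc"] = token[len("struc"):]
        elif token.startswith("format"):
            options["format"] = token[len("format"):]
        elif token.startswith("style"):
            options["style"] = token[len("style"):]
        elif token.startswith("f"):
            # Column (re)names, collected in the order they are given.
            options["names"].append(token[1:])
        else:
            raise ValueError(f"unknown option: {token}")
    return options


print(parse_csv_options("csv:strucCHEMICAL_DATA")["struc"])  # CHEMICAL_DATA
print(parse_csv_options("csv:fMOL,fTIME")["names"])  # ['MOL', 'TIME']
```

Checking the longer keywords first is a design choice of this sketch; with this rule a column renamed to something beginning with "ormat" would still be ambiguous.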

Headless import

CSV molecules can also be imported from a file without a header. In this case the CSV importer must be told that all rows are data (with the `headless` keyword) and which column holds the chemical structure. The structure column is given by its zero-based index with the `struc` keyword. For example, the following file

7,12,4,ccCCcc,rt,gh,jk
23,1,56,COO,rf,gg,kk

Can be imported as:

`csv:headless,struc3`

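This imports the following structures. The remaining fields are numbered consecutively as column_0, column_1, and so on, with the structure column skipped: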

ccCCcc (as SMILES) with the following properties:
- column_0 = 7
- column_1 = 12
- column_2 = 4
- column_3 = rt
- column_4 = gh
- column_5 = jk

COO (as SMILES) with the following properties:
- column_0 = 23
- column_1 = 1
- column_2 = 56
- column_3 = rf
- column_4 = gg
- column_5 = kk
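The column numbering in this example can be reproduced with the following hypothetical Python sketch (not Chemaxon code). It assumes, as the example output suggests, that the structure field is removed before the remaining fields are numbered from column_0.

```python
import csv
import io


def read_headless(text, struc_index):
    """Headless import sketch: every row is data; the structure is at struc_index."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # ignore blank lines
        structure = row[struc_index]
        # Drop the structure field, then number the rest from column_0.
        rest = row[:struc_index] + row[struc_index + 1:]
        props = {f"column_{i}": value for i, value in enumerate(rest)}
        records.append((structure, props))
    return records


DATA = "7,12,4,ccCCcc,rt,gh,jk\n23,1,56,COO,rf,gg,kk\n"
for structure, props in read_headless(DATA, 3):  # same index as struc3
    print(structure, props)
```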

Override column names

During import, the user can override the column names. The new names must be given in column order; each name is prefixed with `f`, and the names are separated by commas. For example, this file:


result,hour
S.[He],11:15:00
[He],11:10:00

can be imported as:

S.[He]
- TIME = 11:15:00

[He]
- TIME = 11:10:00

with the following parameters:

`csv:fMOL,fTIME`

In the above example the renamed headers include an automatically recognized header name (MOL), so the molecule column did not have to be specified. Otherwise, the molecule column can be set with the `struc` keyword, as described in the Headers section.
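The renaming step can be sketched in Python as follows. This is hypothetical code, not Chemaxon's importer: the new names replace the file's header in column order, and automatic recognition then runs on the new names. Case-insensitive matching, and keeping the original header for any column beyond the supplied names, are assumptions of this sketch.

```python
import csv
import io

RECOGNIZED_HEADERS = {"mol", "molecule", "structure", "struc", "smiles",
                      "cxsmiles", "smarts", "cxsmarts", "inchi"}


def read_with_renamed_columns(text, new_names):
    """Rename the columns in order, then split each row into structure and properties."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Assumption: columns without a new name keep their original header.
    names = list(new_names) + header[len(new_names):]
    matches = [i for i, name in enumerate(names)
               if name.lower() in RECOGNIZED_HEADERS]
    if not matches:
        raise ValueError("no recognized molecule header after renaming")
    struc_index = matches[0]
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        props = {name: value
                 for i, (name, value) in enumerate(zip(names, row))
                 if i != struc_index}
        records.append((row[struc_index], props))
    return records


DATA = "result,hour\nS.[He],11:15:00\n[He],11:10:00\n"
for structure, props in read_with_renamed_columns(DATA, ["MOL", "TIME"]):
    print(structure, props)
```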

Molecule format

The user can specify the format of the molecules in the molecule column with the `format` keyword. For example, for names, SMILES or SMARTS use:

- `csv:formatname`
- `csv:formatsmiles`
- `csv:formatsmarts`

and so on for other formats.

Export options

Define Molecule column name

The user can set the name of the molecule column with the `struc` keyword, for example:

`csv:strucMY_MOL_COLUMN`

Define headless export

The user can export molecules without a header line with the `headless` keyword, for example:

`csv:headless`
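On export, the same kind of options shape the output. This hypothetical Python sketch (not Chemaxon's MolExporter) writes (structure, properties) pairs: the `struc_name` argument plays the role of the `struc` option and `headless` drops the header line. Writing the molecule column first and ordering property columns by first appearance are assumptions of the sketch.

```python
import csv
import io


def write_molecule_csv(records, struc_name="mol", headless=False):
    """Export sketch: records are (structure string, property dict) pairs."""
    prop_names = []
    for _, props in records:
        for name in props:
            if name not in prop_names:
                prop_names.append(name)  # keep first-seen column order
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if not headless:
        writer.writerow([struc_name] + prop_names)
    for structure, props in records:
        # Missing properties are written as empty fields.
        writer.writerow([structure] + [props.get(name, "") for name in prop_names])
    return out.getvalue()


records = [("c1ccccc1", {"id": "2", "name": "benzene"})]
print(write_molecule_csv(records, struc_name="MY_MOL_COLUMN"), end="")
print(write_molecule_csv(records, headless=True), end="")
```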

Define export format

The user can define which format is used for the exported molecules with the `format` keyword, for example:

`csv:formatsmarts`

Define exported column header names

The names of the exported columns can also be defined; every name must be prefixed with `f`, for example:

`csv:fname,fmol,fuser`

Control escaping of special characters

The escaping strategy used for special characters can be controlled to some extent. Escaping is applied to the structure field, the property column headers and the property column fields. The strategy is selected with the `style` keyword; the style name is case-insensitive. Possible styles: Chemaxon, Default, Excel, InformixUnload, InformixUnloadCsv, MongoDBCsv, MongoDBTsv, MySQL, Oracle, PostgreSQLCsv, PostgreSQLText, RFC4180, TDF. If no style option is given, the Chemaxon style is used.

`csv:styleRFC4180`
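To illustrate what an escaping strategy means (this is not Chemaxon's implementation of any style), the sketch below uses Python's standard `csv` module to write one record in the RFC 4180 manner: a field containing a comma or a double quote is enclosed in quotes, embedded quotes are doubled, and records end with CRLF. The helper `to_csv_line` is hypothetical. Quoting every field is shown as a second, equally valid strategy.

```python
import csv
import io


def to_csv_line(fields, quoting=csv.QUOTE_MINIMAL):
    """Write one record with Python's csv module and return it as text."""
    out = io.StringIO()
    csv.writer(out, quoting=quoting, lineterminator="\r\n").writerow(fields)
    return out.getvalue()


fields = ["2", "[H]", 'I bet "this" is more common, how could you miss it?']
# Minimal quoting: only the field with a comma/quote is quoted; quotes are doubled.
print(repr(to_csv_line(fields)))
# Quote every field: a different strategy that RFC 4180 also allows.
print(repr(to_csv_line(fields, csv.QUOTE_ALL)))
```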