Citus Distributed JChem PostgreSQL Cartridge

JChem PostgreSQL Cartridge on a distributed PostgreSQL Citus database

The following software versions were used for testing:

##### Install PostgreSQL and Citus on your master and worker nodes and create the Citus extension on each node (master and workers) as described in the Citus documentation.
##### Follow the JChem PostgreSQL Cartridge Manual to set up the cartridge on each node:
1. Install the Chemaxon jchem-psql package on each node.
2. Create the Chemaxon and hstore extensions on each node.
3. Put your valid Chemaxon license file in the /etc/chemaxon/ folder on each node.
##### Initialize the jchem-psql service on each node with
```
sudo service jchem-psql init
```
##### Start the jchem-psql service on each node
```
sudo service jchem-psql start
```
##### Configure workers on the master node

Create the file pg_worker_list.conf in the master node’s PostgreSQL data directory (the directory declared as data_directory in the postgresql.conf file) and add each worker’s hostname and PostgreSQL port to this file, for example:
```
worker-host1    5432
worker-host2    5421
(more workers)
```
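As a rough sketch of this step, the shell function below writes one hostname and port per line into pg_worker_list.conf. The function name write_worker_list and the host:port argument format are illustrative assumptions, not part of Citus or the cartridge. Note that it overwrites any existing worker list in the given directory.

```shell
# Hypothetical helper (not part of Citus): write pg_worker_list.conf
# into the directory given as the first argument.
# Remaining arguments are workers in the assumed form host:port.
write_worker_list() {
    target_dir="$1"
    shift
    # Start from an empty file; an existing worker list is overwritten.
    : > "$target_dir/pg_worker_list.conf"
    for worker in "$@"; do
        host="${worker%%:*}"   # text before the first colon
        port="${worker##*:}"   # text after the last colon
        printf '%s\t%s\n' "$host" "$port" >> "$target_dir/pg_worker_list.conf"
    done
}
```

Calling write_worker_list with the master's data directory followed by worker-host1:5432 and worker-host2:5421 would reproduce the two worker lines of the listing above. The master's PostgreSQL service may need a restart or reload before it picks up the new list.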

###### Check worker nodes

SELECT * FROM master_get_active_worker_nodes();

###### Prepare a CSV-format SMILES file with an ID before each molecule from an ordinary SMILES file

In a command line shell:

cat -n nci-pubchem_1m_unique.smiles | sed -e 's/^[     ]*//' | sed -e 's/^[0-9]*/&,/' | sed -e 's/[     ]*//g' > nci1m_with_id.smiles
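The pipeline above numbers each line, strips the padding that cat -n adds, puts a comma after the number, and removes all remaining spaces and tabs. A single awk call can do the same; the sketch below assumes an awk that understands \t inside a bracket expression (gawk and mawk do), and the function name number_smiles is only illustrative.

```shell
# Hypothetical helper: print "<line number>,<line without spaces or tabs>"
# for every line of the SMILES file given as the first argument.
number_smiles() {
    awk '{ gsub(/[ \t]/, ""); print NR "," $0 }' "$1"
}
```

For example, number_smiles nci-pubchem_1m_unique.smiles > nci1m_with_id.smiles would produce the same file name as the command above. Like the sed version, this removes whitespace inside a line as well, so any molecule name following the SMILES string gets fused onto it.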

###### Use the distributed table as any ordinary table for search, for example:

CREATE INDEX mol_table_index ON mol_table USING chemindex(mol);  
SELECT id from mol_table WHERE 'c1ccccc1N' |<| mol;

The limitations below are not caused by the JChem PostgreSQL Cartridge.

Import is possible only with a limited set of PostgreSQL methods. With SQL, only single inserts can be performed; bulk inserts can be performed with a command-line tool or with COPY, described here, which performs much better.
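A bulk import through COPY might look like the sketch below, which wraps psql's client-side \copy command. The function name import_smiles_csv is hypothetical, and the column list (id, mol) is an assumption about how mol_table is laid out; adjust both the columns and the connection options to your setup.

```shell
# Hypothetical helper: load a CSV file of "id,SMILES" rows into mol_table.
# $1 = database name on the master node, $2 = path to the CSV file.
# Assumes mol_table has columns named id and mol, in that order.
import_smiles_csv() {
    db="$1"
    csv="$2"
    # \copy reads the file on the client side and streams it to the server.
    psql -d "$db" -c "\\copy mol_table (id, mol) FROM '$csv' WITH (FORMAT csv)"
}
```

For instance, import_smiles_csv mydb nci1m_with_id.smiles would be run on a machine that can reach the master node, with mydb standing in for your database name.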

No subselects are allowed in a modification statement (e.g. INSERT, DELETE, UPDATE). For example, INSERT INTO table2 SELECT * FROM table1 WHERE 'C' |<| mol is not supported.

Only distributed tables can be joined in one SELECT statement. Joining a distributed table with a non-distributed table is not supported.

EXPLAIN plans are not available in Citus version 5.0, but they are available in version 5.1.

Since the Chemaxon PostgreSQL Cartridge does not contain an equality operator for the Molecule type, tables cannot be distributed by hashing the Molecule type column. Tables containing molecules have to be sharded by another column.
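To illustrate sharding by another column, the sketch below creates a molecule table and hash-distributes it on an integer id column. It assumes the Citus 5.x functions master_create_distributed_table and master_create_worker_shards (newer Citus releases use different functions), a molecule type named 'sample' configured in the cartridge, and illustrative values of 16 shards with replication factor 1. The shell function name is hypothetical.

```shell
# Hypothetical helper: create mol_table on the master node and distribute it
# by hashing the id column instead of the molecule column.
# $1 = database name on the master node.
create_distributed_mol_table() {
    psql -d "$1" -v ON_ERROR_STOP=1 <<'SQL'
-- 'sample' is an assumed molecule type name from the cartridge configuration.
CREATE TABLE mol_table (id int, mol Molecule('sample'));
-- Shard on id; the Molecule column cannot be used because it has no equality operator.
SELECT master_create_distributed_table('mol_table', 'id', 'hash');
-- 16 shards and replication factor 1 are illustrative values only.
SELECT master_create_worker_shards('mol_table', 16, 1);
SQL
}
```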