JChem Neo4j Cartridge¶

JChem Neo4j Cartridge has been discontinued. The last released version is 20.7.0.

Installation
- Software requirements
- Download
- Install on Linux
  - Choose one of the installers
  - Neo4j plugin
  - Licenses
  - Start neo4j backend application
- Install on Windows
  - Choose one of the installers
  - Neo4j plugin
  - Licenses
  - Start neo4j backend application
- Configuration
- Logging

API Usage
- Create a db
- Drop a db
- Get a list of dbs
- Create trigger
- Drop trigger
- Get all triggers
- Clean up your db
- Create chemical nodes
- Delete node
- Index your nodes
- Remove nodes from database
- Substructure search with hit count limit
- Similarity search with hit count limit
- Duplicate search with hit count limit
- Utility methods

Examples
- Simple test
- Import your nodes by copying a csv file
- Chemical Search

Installation¶

Software requirements¶

Windows (64-bit), Linux

Java 11

Neo4j 4.4+

Download¶

The download package is available upon request.
To request it, contact Chemaxon through https://chemaxon.freshdesk.com/ or support@chemaxon.com.

Install on Linux¶

Choose one of the installers¶

The rpm and deb packages unpack JNC to /opt/jnc.

The sh installer provides an interactive installer with either a graphical user interface or a command-line interface (the latter can be forced with the -c switch). By default, this installer unpacks to /usr/local/jnc.

Alternatively, simply unpack the tar.gz archive to a directory of your choice.

This document refers to the installation directory as JNC_DIR.

Neo4j plugin¶

Copy the neo4j plugin to the neo4j plugin directory (copy <JNC_DIR>/plugin/neo4j-jchem-cartridge.jar to <NEO4J_INSTALL_DIR>/plugins/). The sh installer does this automatically. Then restart neo4j.

Licenses¶

Install your Chemaxon JChem Neo4j license (and the Standardizer license as well, if you wish to use custom standardization) according to this document.

If you have installed the jnc application as root, you need to copy the license file content to /root/.chemaxon/license.cxl.

Start neo4j backend application¶

If you have installed the backend application as root, you need to start the service as root as well. In this case, use sudo to execute the following programs.

You can use the <JNC_DIR>/bin/run-jnc script to run the cartridge backend as a simple program.

You can use the <JNC_DIR>/bin/jnc-service script to run the cartridge backend as a service (it takes the commands start|stop|restart|status).

If any error occurs, the log files are available in the <JNC_DIR>/logs directory.

Install on Windows¶

Choose one of the installers¶

The exe installer provides a graphical user interface and installs to <Program Files>/jnc.

The zip archive can simply be extracted to any directory you choose.

This document refers to the installation directory as JNC_DIR.

Neo4j plugin¶

Copy the neo4j plugin to the neo4j plugin directory (copy <JNC_DIR>/plugin/neo4j-jchem-cartridge.jar to <NEO4J_INSTALL_DIR>/plugins/). The exe installer does this automatically. Then restart neo4j.

Licenses¶

Install your Chemaxon license according to this document.

Start neo4j backend application¶

You can use <JNC_DIR>\bin\run-jnc.exe to run the cartridge backend as a simple program.

You can use <JNC_DIR>\bin\jnc-service.exe to run the cartridge backend as a service. To do this, first install it with the --install switch (you can uninstall it with --uninstall); then you can start it with --start and stop it with --stop. To check its state, use the --status switch; to run it in the foreground, use --run.

Configuration¶

1. Configuring runnables

There are two runnable files (jnc-service and run-jnc). Each has its own .vmoptions file where you can set parameters for the Java Virtual Machine. Example content:

-Xmx4g
-Xms1g
-server
-XX:+UseConcMarkSweepGC

Adding the above lines makes the Java Virtual Machine run in server mode, use the Concurrent Mark Sweep algorithm for garbage collection, and use 1 GB as the minimum heap size and 4 GB as the maximum heap size.

Find more information here.

2. Settings of the server

You can set options for the server in config/application.properties like the port to use, and where to look for settings, etc. You can find general settings here.

You can also set the following Chemaxon-specific settings:

initOnStart : Determines whether the database is initialised when the application starts. Available values: INIT (only new databases can be created), OPEN (only existing databases can be opened), AUTO (existing DBs are opened, non-existing ones are created).

updateMode : Takes effect only if the version number has changed. Available values: EXIT (exit if the version has changed), DROP (drop old data if the version has changed), REINDEX (keep old data and reindex it so that it works with the new version), FORCE_REINDEX (keep old data and force reindexing at the start of the service).

jws.db.settings : path to the settings file where additional data are stored

db.configPath : path to the storage settings
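
As a rough illustration of the allowed values listed above, the following Python sketch checks the text of a properties file for invalid initOnStart and updateMode values. It assumes that the keys appear in config/application.properties exactly as named above and that values are case-sensitive; both points are assumptions, and the sample values are hypothetical.

```python
# Allowed values as listed in this section.
ALLOWED = {
    "initOnStart": {"INIT", "OPEN", "AUTO"},
    "updateMode": {"EXIT", "DROP", "REINDEX", "FORCE_REINDEX"},
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_settings(props):
    """Return a list of problems found in the Chemaxon-specific settings."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in props and props[key] not in allowed:
            problems.append(f"{key}={props[key]} (expected one of {sorted(allowed)})")
    return problems


# Hypothetical file content, with a deliberate typo in updateMode.
example = """
# hypothetical values for illustration
initOnStart=AUTO
updateMode=REINDEXX
"""
print(check_settings(parse_properties(example)))
```

The parser only handles plain key=value lines, which is enough for a quick sanity check before starting the service.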

3. Settings in the database

A jnc.properties JSON file is created in the graph.db database folder. Some options can only be changed manually in the file; others can be manipulated through API functions. Be aware that if you copy the database files, the jnc.properties file travels with them because it is in the same directory. Depending on your workflow, this can be an advantage or a disadvantage.

Logging¶

Some logs are available in the neo4j "logs" directory.

The dbms logging level can be configured in the conf/neo4j.conf file. Dbms logging provides information for monitoring purposes.
Setting it to level DEBUG gives you detailed information about the processes running in the database.

`dbms.logs.debug.level=DEBUG`

One particularly useful piece of information is the list of node IDs that were not indexed and are therefore not used during chemical search.

In the following log message we can see that the node with id 4 is not part of the index while nodes with ids 1,2,3,5 are indexed successfully. (Please note that the fields failedIds and successfulIds are providing information about neo4j database node IDs.)

2022-05-04 ... DEBUG [...] createdb called with db name test, type sample
2022-05-04 ... DEBUG [...] {"failedIds":[4],"failedIndexes":[3],"duplicatedIndexes":[],"successfulIds":[1,2,3,5],"successfulIndexes":[0,1,2,4],"duplicatedIds":[],"insertSuccess":[true,true,true,false,true]}

You can retrieve the problematic node from the database based on the provided id using the following Cypher statement:

`MATCH (n) WHERE id(n) = 4 RETURN n;`
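
When many nodes fail, it can be convenient to pull the ids out of the log line programmatically. The following Python sketch assumes the debug message ends with a single JSON object, as in the example above; the helper names are made up for illustration and are not part of the cartridge.

```python
import json


def parse_index_report(log_line):
    """Extract the JSON payload from a debug log line like the one shown above."""
    start = log_line.index("{")  # the payload starts at the first opening brace
    return json.loads(log_line[start:])


def lookup_statement(node_ids):
    """Build one Cypher statement that returns all nodes with the given ids."""
    ids = ", ".join(str(int(i)) for i in node_ids)
    return f"MATCH (n) WHERE id(n) IN [{ids}] RETURN n;"


line = ('2022-05-04 ... DEBUG [...] {"failedIds":[4],"failedIndexes":[3],'
        '"duplicatedIndexes":[],"successfulIds":[1,2,3,5],'
        '"successfulIndexes":[0,1,2,4],"duplicatedIds":[],'
        '"insertSuccess":[true,true,true,false,true]}')

report = parse_index_report(line)
print(lookup_statement(report["failedIds"]))
```

For the example log line this prints a statement equivalent to the single-id lookup shown above, using `IN [4]` instead of `= 4`.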

The storage server logs to the <JNC_DIR>/logs/ folder.

To enable debug logging related to index creation and search, uncomment the following line in <JNC_DIR>/config/application.properties:

`logging.level.com.chemaxon.jchem=DEBUG`

API Usage¶

To use the chemical search functionality, you need to create a database using the jchem plugin. The database is called test in all of the following examples, but you can of course use a custom name.

Create a db¶

`call jchem.createdb('test','sample')`

where test is the db name and sample is the db type. Db type defines the business rules and it is (should be) defined in the <JNC_DIR>/config/application.properties file.

An example sample.type is provided. This type can be copied to a different name and modified if necessary (be sure to provide different typeId for each type).

Drop a db¶

`call jchem.dropdb('test')`

Get a list of dbs¶

`call jchem.dbs()`

Returns a map in which each key is the name of a db and each value is its type.

Create trigger¶

`call jchem.createTrigger('test', 'molecule', 'molString')`

where test is the db name, molecule is the node label and molString is the property.

This creates a trigger that indexes the molString property of nodes with the molecule label into the test database.

Drop trigger¶

`call jchem.dropTrigger('test', 'molecule', 'molString')`

where test is the db name, molecule is the node label and molString is the property.

This drops the trigger that indexes the molString property of nodes with the molecule label into the test database.

Get all triggers¶

`call jchem.triggers()`

This call lists all the registered triggers.

Clean up your db¶

Delete all molecule nodes.

`MATCH (n:molecule) DETACH DELETE n`

Create chemical nodes¶

`create (n:molecule { molString : 'CCC'})`

Delete node¶

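`match (n:molecule { molString : 'CCC'}) delete n`

If you get a memory exception, limit the number of nodes deleted per statement and repeat the statement until no matching nodes remain:

`match (n:molecule { molString : 'CCC'}) WITH n LIMIT 100000 delete n;`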

Index your nodes¶

The molString property contains the structures.

If you have created triggers (see create trigger API) for the specific database name and molString for the structure property, then adding nodes to the db is automatically triggered, so you won't need this step.

Otherwise you need to do this call to add molecules to the db:

`match (n) with collect(n) as nodes call jchem.addBatch('db_name',nodes,'molString') yield responseCode return responseCode`

You may face a situation where not all compound nodes have the property to be indexed. In this case, you need to select only the nodes that have this property. For example, to index all Compound nodes that have a molString property, call:

`match (n:Compound) WHERE EXISTS(n.molString) with collect(n) as nodes call jchem.addBatch('db_name',nodes,'molString') yield responseCode return responseCode`

Remove nodes from database¶

Delete molecules from the database:

`match (n) with collect(n) as nodes call jchem.deleteBatch('db_name',nodes) yield responseCode return responseCode`

Substructure search with hit count limit¶

`call jchem.search('db_name','query_string',hit_limit)`

E.g. search for benzene and get the 10 most relevant hits:

`call jchem.search('test','c1ccccc1',10)`

Similarity search with hit count limit¶

`call jchem.search('db_name','query_string',hit_limit, 'sim', sim_threshold)`

E.g. search for the 10 compounds most similar to cyclohexane that are above the 0.5 similarity threshold. The hits are ordered by similarity:

`call jchem.search('test','C1CCCCC1',10, 'sim', 0.5)`

Duplicate search with hit count limit¶

`call jchem.search('db_name','query_string', hit_limit, 'dup') yield node return node`
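
When search statements are assembled in client code, the three call forms above differ only in their trailing arguments. The following Python sketch builds the statement text for each form; the helper names are hypothetical and not part of the cartridge. It also escapes backslashes and single quotes, since SMILES strings can contain backslashes, which are escape characters inside Cypher string literals.

```python
def cypher_string(value):
    """Quote a value as a Cypher single-quoted string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def search_call(db, query, hit_limit, mode=None, threshold=None):
    """Build a jchem.search statement in one of the three forms shown above."""
    args = [cypher_string(db), cypher_string(query), str(int(hit_limit))]
    if mode == "sim":
        if threshold is None:
            raise ValueError("similarity search needs a threshold")
        args += ["'sim'", repr(float(threshold))]
    elif mode == "dup":
        args.append("'dup'")
    elif mode is not None:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"call jchem.search({', '.join(args)}) yield node return node"


print(search_call("test", "c1ccccc1", 10))               # substructure search
print(search_call("test", "C1CCCCC1", 10, "sim", 0.5))   # similarity search
print(search_call("test", "F/C=C\\F", 10, "dup"))        # duplicate search
```

If the statements are sent through a Neo4j driver, passing the query structure as a query parameter instead of embedding it in the statement text avoids the quoting step altogether.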

Utility methods¶

Get the current settings:

`call jchem.settings()`

Check if a given molecule source is valid (importable):

`match (n:molecule) call jchem.canImport(n, 'molString') yield node, booleanValue return node.molString, booleanValue`

Filter nodes with invalid molecule source:

`match (n:molecule) with collect(n) as nodes call jchem.filterIncorrectStructures(nodes, 'molString') yield node return node`

Examples¶

Simple test¶

Create 3 nodes:

create (n:molecule { molString : 'CCC'})
create (n:molecule { molString : 'C1CCCCC1'})
create (n:molecule { molString : 'c1ccccc1'})

Create db:

`call jchem.createdb('test','sample')`

Add nodes to db:

`match (n) with collect(n) as nodes call jchem.addBatch('test',nodes,'molString') yield responseCode return responseCode`

Search:

`call jchem.search('test','c1ccccc1',10)`

Delete nodes from index:

`match (n) with collect(n) as nodes call jchem.deleteBatch('test',nodes) yield responseCode return responseCode`

Import your nodes by copying a csv file¶

Assume you have example1.csv in the neo4j import directory (/var/lib/neo4j/import/) containing:

CC1=CC(=O)C=CC1=O,1
S(SC1=NC2=CC=CC=C2S1)C3=NC4=C(S3)C=CC=C4,2
OC1=C(Cl)C=C(C=C1[N+](o)=O)[N+](o)=O,3
[O-][N+](o)C1=CNC(=N)S1,4
NC1=CC2=C(C=C1)C(=O)C3=C(C=CC=C3)C2=O,5

You can load the csv file as:

`LOAD CSV FROM 'file:///example1.csv' AS line CREATE (n:molecule { molString: line[0] })`

Or if you have an example2_headers.csv file (in the neo4j import directory as well) with molString and cd_id headers like:

cd_id,molString
1,CCC
2,c1ccccc1
3,C1CCCCC1

You can load the csv file with header as:

LOAD CSV WITH HEADERS FROM 'file:///example2_headers.csv' AS row
CREATE (n:molecule)
SET n = row, n.cd_id = toInteger(row.cd_id), n.molString = row.molString
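
If the structures come from another program, a headered file in the layout expected by the statement above can be generated with a few lines of Python. This is only a sketch: the function name is hypothetical, and it writes to an in-memory buffer rather than to the neo4j import directory.

```python
import csv
import io


def write_molecule_csv(rows, out):
    """Write (cd_id, molString) pairs with the cd_id,molString header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["cd_id", "molString"])
    for cd_id, mol in rows:
        writer.writerow([int(cd_id), mol])


buffer = io.StringIO()
write_molecule_csv([(1, "CCC"), (2, "c1ccccc1"), (3, "C1CCCCC1")], buffer)
print(buffer.getvalue(), end="")
```

To produce a real file, pass a file object opened with `newline=""` instead of the buffer.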

Create db, add and delete nodes:

call jchem.createdb('test','sample')
match (n) with collect(n) as nodes call jchem.addBatch('test',nodes,'molString') yield responseCode return responseCode
match (n) with collect(n) as nodes call jchem.deleteBatch('test',nodes) yield responseCode return responseCode

Chemical Search¶

Substructure search for three consecutive carbon atoms with a maximum of 10 hits:

`call jchem.search('test','CCC',10)`

Returns

"node"
{"cd_id":1,"molString":"CCC"}
{"cd_id":3,"molString":"C1CCCCC1"}
{"molString":"CC1=CC(=O)C=CC1=O"}
{"molString":"NC1=CC2=C(C=C1)C(=O)C3=C(C=CC=C3)C2=O"}

Similarity search with cyclohexane and limiting the hit count:

`call jchem.search('test','C1CCCCC1',10, 'sim', 0.1)`

Returns

"node"
{"cd_id":3,"molString":"C1CCCCC1"}
{"cd_id":1,"molString":"CCC"}
{"cd_id":2,"molString":"c1ccccc1"}
{"molString":"CC1=CC(=O)C=CC1=O"}

The previously provided very simple chemical structure set is not enough for the following examples. Please use a larger dataset that contains molecule pairs with similarity values over 0.9, 0.8, and 0.5. If you have more than 100K molecule nodes, run these statements in batches (50-100K nodes at once) to avoid memory problems.

Delete all relationships

`MATCH ()-[r]->() DELETE r;`

Add relationship between nodes with different similarities with only one relationship between nodes

match (n) call jchem.search('test',n.molString,0,'sim',0.9) yield node where n<>node AND not exists ((n)-[]->(node)) create (n)-[r:SIM_90]->(node)
match (n) call jchem.search('test',n.molString,0,'sim',0.8) yield node where n<>node AND not exists ((n)-[]->(node)) create (n)-[r:SIM_80]->(node)
match (n) call jchem.search('test',n.molString,0,'sim',0.5) yield node where n<>node AND not exists ((n)-[]->(node)) create (n)-[r:SIM_50]->(node)

Add relationship between nodes with different similarities with multiple relationships between nodes

match (n) call jchem.search('test',n.molString,0,'sim',0.9) yield node where n<>node create (n)-[r:SIM_90]->(node)
match (n) call jchem.search('test',n.molString,0,'sim',0.8) yield node where n<>node create (n)-[r:SIM_80]->(node)
match (n) call jchem.search('test',n.molString,0,'sim',0.5) yield node where n<>node create (n)-[r:SIM_50]->(node)
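
The batching advice given earlier in this section can be sketched as a small Python generator that emits one statement per batch. It is an illustration under stated assumptions, not a tested recipe: it pages through the nodes with `order by id(n) skip ... limit ...`, which is one possible way to split the work, and it reuses the single-relationship form of the statements above. The function name and the default batch size are hypothetical.

```python
def similarity_batches(db, threshold, total_nodes, batch_size=50_000):
    """Yield one Cypher statement per batch of nodes, ordered by internal node id."""
    rel_type = f"SIM_{round(threshold * 100)}"  # e.g. 0.9 -> SIM_90
    for skip in range(0, total_nodes, batch_size):
        # db is interpolated as-is; this sketch assumes a plain name such as 'test'.
        yield (
            f"match (n) with n order by id(n) skip {skip} limit {batch_size} "
            f"call jchem.search('{db}',n.molString,0,'sim',{threshold}) yield node "
            f"where n<>node AND not exists ((n)-[]->(node)) "
            f"create (n)-[r:{rel_type}]->(node)"
        )


statements = list(similarity_batches("test", 0.9, total_nodes=120_000))
print(len(statements))
print(statements[0])
```

Each generated statement is meant to be run as its own transaction, so that only one batch of nodes is processed at a time.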

Find the most relevant substructure hits over 0.9 similarity

`call jchem.search('test','C1CCCCC1',5) yield node with node MATCH (node)-[r:SIM_90]-(n) RETURN n,r,node;`