JChem Neo4j Cartridge¶
JChem Neo4j Cartridge has been discontinued. The last released version is 20.7.0.
Installation¶
Software requirements¶
- Windows (64-bit), Linux
- Java 8
Download¶
Download the package appropriate for your operating system from the JChem Search Engines download page.
Install on Linux¶
Choose one of the installers¶
- The rpm and deb packages unpack JNC to /opt/jnc
- The sh installer provides an interactive installer with a graphical user interface or with a command line interface (the latter can be forced with the -c switch); by default this installer unpacks to /usr/local/jnc
- Or simply unpack the tar.gz to a desired directory
We will refer to the chosen directory as JNC_DIR in this document.
Neo4j plugin¶
Copy the Neo4j plugin to the Neo4j plugin directory (copy <JNC_DIR>/plugin/neo4j-jchem-cartridge.jar to <NEO4J_INSTALL_DIR>/plugins/). The sh installer does this automatically. Restart Neo4j.
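For a manual install (rpm, deb or tar.gz), the copy step can be scripted. The following is a minimal sketch, not part of the product: the function name install_jnc_plugin is hypothetical, and the two directory arguments stand for your own JNC_DIR and NEO4J_INSTALL_DIR.

```shell
#!/bin/sh
# Copy the cartridge plugin jar into the Neo4j plugins directory.
# $1 = JNC_DIR, $2 = NEO4J_INSTALL_DIR (both are placeholders for your paths).
install_jnc_plugin() {
    jnc_dir=$1
    neo4j_dir=$2
    jar="$jnc_dir/plugin/neo4j-jchem-cartridge.jar"
    # Fail early if the jar is not where the documentation says it should be.
    if [ ! -f "$jar" ]; then
        echo "plugin jar not found: $jar" >&2
        return 1
    fi
    mkdir -p "$neo4j_dir/plugins" && cp "$jar" "$neo4j_dir/plugins/"
}

# Example call (the Neo4j path is an assumption; adjust to your system):
# install_jnc_plugin /opt/jnc /var/lib/neo4j
# Afterwards restart Neo4j, e.g. with "neo4j restart" or your service manager.
```

The function only copies the file; restarting Neo4j is left as a separate, explicit step because the restart command depends on how Neo4j was installed.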
Licenses¶
Copy your Chemaxon JChem Neo4j license (and if you wish to use custom standardization then the Standardizer license as well) to
If you have installed the jnc application as root, then you need to copy the license file content to /root/.chemaxon/license.cxl
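As a sketch of the root case described above, the license can be put in place with a small helper. The function name install_license and the source path in the example call are hypothetical; only the target location /root/.chemaxon/license.cxl comes from this document.

```shell
#!/bin/sh
# Copy a license file to <home>/.chemaxon/license.cxl.
# $1 = path to your license file, $2 = home directory (defaults to /root).
install_license() {
    license_src=$1
    target_home=${2:-/root}
    mkdir -p "$target_home/.chemaxon" &&
        cp "$license_src" "$target_home/.chemaxon/license.cxl"
}

# Example call, run as root (the source path is a placeholder):
# install_license /path/to/license.cxl
```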
Start neo4j backend application¶
If you have installed the backend application as root then you need to start the service as root as well. In this case use sudo to execute the following programs.
- You can use the <JNC_DIR>/bin/run-jnc script to run the cartridge backend as a simple program
- You can use the <JNC_DIR>/bin/jnc-service script to run the cartridge backend as a service (the commands it takes: start|stop|restart|status)
If any error occurs, the log files are available in the <JNC_DIR>/logs directory.
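The service commands and the log location can be wrapped in two small helpers. This is a minimal sketch: the function names jnc_service and jnc_recent_logs are hypothetical, and the only things taken from this document are the jnc-service script, its four commands and the logs directory.

```shell
#!/bin/sh
# Run the jnc-service script with one of its documented commands.
# $1 = JNC_DIR, $2 = start|stop|restart|status
jnc_service() {
    jnc_dir=$1
    action=$2
    case "$action" in
        start|stop|restart|status)
            "$jnc_dir/bin/jnc-service" "$action" ;;
        *)
            echo "usage: jnc_service <JNC_DIR> start|stop|restart|status" >&2
            return 2 ;;
    esac
}

# List up to five of the most recently modified files in <JNC_DIR>/logs.
jnc_recent_logs() {
    ls -t "$1/logs" | head -n 5
}

# Example calls (assuming an install under /opt/jnc):
# jnc_service /opt/jnc start
# jnc_service /opt/jnc status
# jnc_recent_logs /opt/jnc
# If the backend was installed as root, call the script itself with sudo,
# e.g.: sudo /opt/jnc/bin/jnc-service start
```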
Install on Windows¶
Choose one of the installers¶
- The exe installer provides a graphical installer that installs to <Program Files>/jnc
- The zip is a simple archive that you can unzip wherever you want
We will refer to the chosen directory as JNC_DIR in this document.
Neo4j plugin¶
Copy the Neo4j plugin to the Neo4j plugin directory (copy <JNC_DIR>/plugin/neo4j-jchem-cartridge.jar to <NEO4J_INSTALL_DIR>/plugins/). The exe installer does this automatically. Restart Neo4j.
Licenses¶
Copy your Chemaxon license to
Start neo4j backend application¶
- You can use <JNC_DIR>\bin\run-jnc.exe to run the cartridge backend as a simple program
- You can use <JNC_DIR>\bin\jnc-service.exe to run the cartridge backend as a service. For this you first need to install it with the --install switch (you can uninstall it with --uninstall), then you can start it with --start (and stop it with --stop). To check its state, use the --status switch; to run it in the foreground, use --run.
Configuration¶
1. Configuring runnables
2. Settings of the server
3. Settings in the database
Logging¶
Some logs are available in the Neo4j "logs" directory.
The storage server logs to the <JNC_DIR>/logs/ folder.
To enable debug logging related to index creation and search, uncomment the following line in <JNC_DIR>/config/application.properties
API Usage¶
To use the chemical search functionality, you need to create a database using the jchem plugin. The database is called test in all of the following examples, but you can of course use a custom name.
Create a db¶
where test is the db name and sample is the db type. The db type defines the business rules and should be defined in the <JNC_DIR>/config/application.properties file.
An example sample.type is provided. This type can be copied to a different name and modified if necessary (be sure to provide a different typeId for each type).
Drop a db¶
Get a list of dbs¶
Returns a map where each key is the name of a db and each value is the type of that db.
Create trigger¶
where test is the db name, molecule is the node label and molString is the property.
This creates a trigger for nodes with the molecule label that indexes the molString property into the test database.
Drop trigger¶
where test is the db name, molecule is the node label and molString is the property.
This drops the trigger registered for nodes with the molecule label that indexes the molString property into the test database.
Get all triggers¶
This call lists all the registered triggers.
Clean up your db¶
Delete all molecule nodes.
Create chemical nodes¶
Delete node¶
If you get a memory exception
Index your nodes¶
The molString property contains the structures.
If you have created a trigger (see the Create trigger API) for the specific database name with molString as the structure property, then nodes are added to the db automatically when they are created, so you won't need this step.
Otherwise you need to make this call to add molecules to the db:
You may face a situation where not all compound nodes have the property to index. In this case you need to select only those nodes which have the property to be indexed. So to index all Compound nodes which have a molString property, call:
Remove nodes from database¶
Delete molecules from the database:
Substructure search with hit count limit¶
E.g. search for benzene and get the first 10 most relevant hits:
Similarity search with hit count limit¶
E.g. search for the 10 compounds most similar to cyclohexane that are above the 0.5 similarity threshold. The hits are ordered by similarity:
Duplicate search with hit count limit¶
Utility methods¶
Get the current settings:
Check if a given molecule source is valid (importable):
Filter nodes with invalid molecule source:
Examples¶
Simple test¶
Create 3 nodes:
Create db:
Add nodes to db:
Search:
Delete nodes from index:
Import your nodes by copying a csv file¶
Assume you have example1.csv in the neo4j import directory (/var/lib/neo4j/import/) containing:
You can load the csv file as:
Or if you have an example2_headers.csv file (in the neo4j import directory as well) with molString and cd_id headers like:
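As an illustration only, a file with these two headers could be generated as follows. The function name write_example_csv is hypothetical, and the rows are an assumption: they are taken from the nodes carrying a cd_id in the search results shown under Chemical Search, not from an original listing.

```shell
#!/bin/sh
# Write a small CSV with molString and cd_id headers into the given directory.
# $1 = Neo4j import directory (e.g. /var/lib/neo4j/import)
# The three rows are assumed sample data, not an official example file.
write_example_csv() {
    import_dir=$1
    cat > "$import_dir/example2_headers.csv" <<'EOF'
molString,cd_id
CCC,1
c1ccccc1,2
C1CCCCC1,3
EOF
}

# Example call:
# write_example_csv /var/lib/neo4j/import
```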
You can load the csv file with header as:
Create db, add and delete nodes:
Chemical Search¶
Substructure search for three consecutive carbon atoms with a maximum of 10 hits:
Returns
| "node" |
|---|
| {"cd_id":1,"molString":"CCC"} |
| {"cd_id":3,"molString":"C1CCCCC1"} |
| {"molString":"CC1=CC(=O)C=CC1=O"} |
| {"molString":"NC1=CC2=C(C=C1)C(=O)C3=C(C=CC=C3)C2=O"} |
Similarity search with cyclohexane and limiting the hit count:
Returns
| "node" |
|---|
| {"cd_id":3,"molString":"C1CCCCC1"} |
| {"cd_id":1,"molString":"CCC"} |
| {"cd_id":2,"molString":"c1ccccc1"} |
| {"molString":"CC1=CC(=O)C=CC1=O"} |
The very simple chemical structure set provided above is not enough for the following examples. Please use a larger dataset which contains molecule pairs with similarity values over 0.9, 0.8 and 0.5. If you have more than 100K molecule nodes, run these statements in batches (50-100K nodes at once) to avoid memory problems.
Delete all relationships
Add relationship between nodes with different similarities with only one relationship between nodes
Add relationship between nodes with different similarities with multiple relationships between nodes
Find the most relevant substructure hits over 0.9 similarity