Installation and Administration

    This manual serves as Administration Guide of JChem PostgreSQL Cartridge (JPC). Here you can find the description of software requirements, installation and configuration steps and running of JPC. See also Getting started guide for easy setup and use cases.

    System requirement

    Software

    • CentOS, RedHat, Debian or Ubuntu operating system x86_64 version. Find information for cases of other linux systems

    • postgreSQL 9.4 relational database (preferred download page) for JPC versions up to 2.0

    • postgreSQL 9.5 relational database for JPC versions from 2.1 to 2.10

    See help to Upgrade of PostgreSQL database together with JPC

    • postgreSQL 10 relational database for JPC versions from 3.0 to 4.4

    • EDB Postgres Advanced Server 10 relational database for JPC versions starting from 4.3 Contact us for downloading the corresponding compatible version.

    • postgreSQL 12 relational database for JPC versions starting from 5.0

    • EDB Postgres Advanced Server 12 relational database for JPC versions starting from 5.0 Contact us for downloading the corresponding compatible version.

    • postgreSQL 12, 13, 14 or 15 relational database for JPC versions starting from 22.22.0

    • availability of hstore extension of postgresql (e.g., contrib package)

    • Java17 runtime environment (starting from version 23.11). We support AdoptOpenJDK 17 as well. In the case of AdoptOpenJDK libfontconfig package must be included.

    Hardware

    The memory need of JChem PostgreSQL Cartridge strongly depends on the format of the chemical structures to be stored. In the next table, the approximate memory need of 10 M PubChem molecules in SD file format and in SMILES format are compared. The presented memory setup is optimized for execution speed. Lower amount of memory may be used, in this case performance drop is expected. Further details can be found in the Performance tuning section.

    Indexed column containing10 M PubChem molecules in Service jchem-psql[GB] Postgres shared buffer[GB] Total[GB]
    SD file format 9 21 30
    SMILES format 9 2 11

    Installation and Setup

    Install PostgreSQL Cartridge

    Download the latest version of PostgreSQL Cartridge from here. In order to install the latest version (as root), change the x.y (version information) in the following statement to the current one:

    In CentOS or RedHat:

    sudo yum localinstall jchem-psql/jchem-psql-x.y.x86_64.rpm

    In Debian:

    sudo dpkg -i jchem-psql-x.y.x86_64.deb

    See also Getting started guide for PostgreSQL and Java8 installation.

    Before the first use

    1. Install your license file

      Copy a valid Chemaxon license to /etc/chemaxon/license.cxl or set its location in /etc/default/jchem-psql file. Required license in the license file is 'Postgres Cartridge'.

      The jchem-psql user should have read access to the license file.

    2. Configurations

      1. configure PostgreSQL server

      2. configure JChem PSQL-server

      3. configure JVM heap memory setup with the help of the memory calculator

    3. Define Molecule types

    4. Initialize index directory

      
      sudo service jchem-psql init


    ​ > {primary} ​ > In case of already having obsolete index data of large tables, deleting the contents of the index directory speed up initialization: sudo rm -rf /var/lib/jchem-psql/store/* ​ > ​ > If you encounter any problem in your java installation, set your JAVA_HOME in the file /etc/default/jchem-psql. ​ >


    ​ Initialization does not start the service, you must start it manually.

    1. First start

      
      sudo service jchem-psql start

      The service can also be started by sudo service jchem-psql manual-start, however, in this case the service won't be integrated into systemd, and cannot be stopped or restarted, only started after killing the process.

    Create extension in a database

    Each chemical database should have the following extensions created separately.

    1. Create a new user and database

      sudo su postgres
      createuser testuser
      createdb testdb -O testuser
    2. Install the extensions (you must be postgres user or any other user who has create extension privilege)

      psql testdb
      testdb> CREATE EXTENSION chemaxon_type;
      testdb> CREATE EXTENSION hstore;
      testdb> CREATE EXTENSION chemaxon_framework;

      {primary} If you experience any error during the CREATE EXTENSION steps, please check this page which gives hints relating non-standard PostgreSQL setup.

    3. Checking JChem PostgreSQL Cartridge installation

      If jchem-psql service is running, the following query can be executed without any problem:

      testdb> SELECT 'C'::Molecule('sample') |<| 'CC'::Molecule;

      This select statement executes a substructure search, with a Carbon atom as the query and ethane as the target structure. You must receive true (t) as output. See details of Molecule type below.

    Service

    The service can be started/stopped using:

    sudo service jchem-psql start
    sudo service jchem-psql stop

    ENABLED=1 (default) setting in /etc/default/jchem-psql file makes the service start on system boot and start manually by start.

    If you want the service not to start on system boot, set ENABLED=0 in the /etc/default/jchem-psql file. In that case, start must be run.

    sudo service jchem-psql start

    Configure JChem-psql server

    The following configuration parameters are available on PostgreSQL side:

    Parameter name Default value Description
    chemaxon.hit_retrieval_minimum_batch_size 1 Minimum number of hits in a batch between jchem-psql server and postgresql database. Higher value generates higher initial latency, lower value creates communication overhead.
    chemaxon.hit_retrieval_maximum_batch_size 5000 Maximum number of hits in a batch between jchem-psql server and postgresql database. Higher value generates higher initial latency, lower value creates communication overhead.
    chemaxon.index_creation_batch_size 5000 (up to version 2.6)25000 (from version 2.7) Number of records in a batch between jchem-psql server and postgresql database during index creation.Higher value generates higher memory footprint, but better throughput for traditional hard disk. Cancelling the index creation is slower in case of higher values.
    chemaxon.search_wall_time_limit 600000(10 minutes) The maximum wall time available for a search operation in milliseconds.
    shared_buffers (typically) 128MB Sets the amount of memory the database server uses for shared memory buffers. This parameter can be set for performance tuning in big tables . Requires restart of postgres service.

    They - with the exception of shared_buffers - can be edited as an ordinary Postgres configuration parameter:

    • in the postgresql.conf file (e.g., /etc/postgresql/9.4/main/postgresql.conf)

    • can be overwritten in the current session:

      
      SET chemaxon.hit_retrieval_maximum_batch_size to 1000;
    • to see the current value:

      
      SHOW chemaxon.hit_retrieval_maximum_batch_size;

    The following configuration parameters are available on JChem-psql server side:

    Parameter name Description
    com.chemaxon.jchem.psql.runtime.molecule.cachedObjectCount
    From version 4.3 on: com.chemaxon.jchem.psql.molecule.cachedObjectCount
    Number of molecules in cache. Only the screened objects are searched for, so this cache can be left at a small size. Only the number of the objects can be set, so the total size of used memory depends on the current structures and can vary from time to time.
    com.chemaxon.jchem.psql.runtime.fingerprint.cachedObjectCount
    From version 4.3 on: com.chemaxon.jchem.psql.fingerprint.cachedObjectCount
    Number of fingerprints in cache. All of them are needed at every search, so it is advised always to keep them in memory for the frequently used tables.
    com.chemaxon.jchem.psql.runtime.label.cachedObjectCount
    From version 4.3 on: com.chemaxon.jchem.psql.label.cachedObjectCount
    Number of molecule labels in cache.
    com.chemaxon.jchem.psql.indexDir Directory storing the indexes needed by the cartridge.
    com.chemaxon.jchem.psql.userTypeDir Directory storing the molecule type files.
    com.chemaxon.jchem.psql.runtime.maxOpenSessionCount
    From version 4.3 on: com.chemaxon.jchem.psql.maxOpenSessionCount
    Limits the number of open service sessions at same time (0 means unlimited, default is 100). Each PSQL command which uses jchem-psql service opens a new service session and closes it at the end. Service session requests above the set value of the open sessions are rejected with a message. Be aware that the JDBC driver doesn't close the sessions at closing the statement if autocommit is turned off. In this case call commit explicitly to close the session. Available from version 2.0.
    com.chemaxon.jchem.psql.runtime.threadCount
    From version 4.3 on: com.chemaxon.jchem.psql.threadCount
    Parallel worker thread count at search queries. Default value is the number of processors. Available from version 3.0.

    These parameters can be configured in the /etc/chemaxon/jchem-psql.conf file.

    Changing these parameters except maxOpenSessionCount requires new initialization of jchem-psql service. Please note that this will purge all jchem-psql index data hence the indexes need to be dropped and recreated again.

    Healthcheck

    During operating the server sometimes it is useful to ensure if the server is still up. You can do this, by calling a dedicated healthcheck endpoint: <host-name>:<healthcheck-port>/health with http GET. If you would like to enable this endpoint, you should set com.chemaxon.jchem.psql.healthcheck.allowed true, and give a free port with com.chemaxon.jchem.psql.healthcheck.port. If the port is not given or it is not free, the endpoint will not be available. The response of this endpoint is a simple json, where the values are either "UP" or "DOWN" for the key "status". Optionally you can overwrite the default timeout for the call by setting the value for com.chemaxon.jchem.psql.healthcheck.defaultTimeoutInMillisec. If the call doesn't return in the given timeout period, the response will automatically be "DOWN". You can even overwrite this value in every http call as a query parameter eg:<host-name>:<healthcheck-port>/health?timeout=500

    Install the cartridge on separate PostgreSQL database server and application server

    Available from version 4.4.

    Software requirements for the PostgreSQL database server

    • PostgreSQL relational database (or EDB Postgres Advanced Server relational database, contact us for downloading the corresponding compatible version). See version compatibility requirements.

    • Availability of hstore extension for PostgreSQL (e.g., contrib package).

    Software requirements for the application server

    • CentOS, RedHat, Debian or Ubuntu operating system x86_64 version.

    Find information for cases of other linux systems

    • Java8 runtime environment. We support AdoptOpenJDK 8 as well. In the case of AdoptOpenJDK libfontconfig package must be included.

    Download

    Installer to download for the PostgreSQL database server:

    jchem-psql-db-server-x.y.x86_64.rpm

    or

    jchem-psql-db-server-x.y.x86_64.deb

    Installer to download for the application server:

    jchem-psql-app-server-x.y.x86_64.rpm

    or

    jchem-psql-app-server-x.y.x86_64.deb

    Installation on the PostgreSQL database server

    Run the installer

    sudo yum localinstall jchem-psql-db-server-x.y.x86_64.rpm
    or
    sudo dpkg -i jchem-psql-db-server-x.y.x86_64.deb 

    Create user and database

    sudo su postgres
    createuser testuser
    createdb testdb -O testuser

    Ceate extenstions (you must be postgres user or any other user who has create extension privilege)

    psql testdb
    testdb> CREATE EXTENSION chemaxon_type;
    testdb> CREATE EXTENSION hstore;
    testdb> CREATE EXTENSION chemaxon_framework;

    Installation on the application server

    Run the installer

    sudo yum localinstall jchem-psql-app-server-x.y.x86_64.rpm
    or
    sudo dpkg -i jchem-psql-app-server-x.y.x86_64.deb 

    Configuration on the PostgreSQL server

    Edit the postgresql.conf file, add

    chemaxon.server_host= IP address or name of the application server machine

    If not the default port (9090) is used for the application server, then add this line as well:

    chemaxon.server_port= port of jchem-psql service of the appilcation server machine

    After setting these values the PostgreSQL server has to be restarted to take effect.

    Configuration on the application server

    Edit the jchem-psql.conffile and allow access to the service port from the PostgreSQL server, e.g. by setting access from any machine:

    com.chemaxon.jchem.psql.host=0.0.0.0

    The service port can also be set (if not the default is used), but it is not required:

    com.chemaxon.jchem.psql.port=<your port number>

    The default port is 9090.

    Affter the configuration you must place the license file, initialize the service and start the service as written in the Before the first use section above.

    High Availability (HA)

    From version 5.1, JPC can be installed in a way that enables running in HA mode.

    See the JPC HA installation guide.

    Molecule types

    Molecule types define the interpretation mode of the chemical structures. Molecule types are based on the extension chemaxon_type. The definition of these types has to be stored in /etc/chemaxon/types/ folder as a <molecule_type_name>.type file. Type file names cannot contain capital letters. You can add, modify, delete molecule type files according to your needs.

    We provide a sample type, see file /etc/chemaxon/types/sample.type.

    The following settings can be defined in a <molecule_type_name>.type file:

    • Version of the type descriptor (at the moment, only '1' is accepted)

    • Type ID: positive unique identifier among types

    • Tautomer mode: OFF, GENERIC, CANONIC_GENERIC_HYBRID (or) NORMAL_CANONIC_GENERIC_HYBRID, NORMAL_CANONIC_NORMAL_GENERIC_HYBRID

      CANONIC_GENERIC_HYBRID is available from version 20.12 and is deprecated in version 23.12
      NORMAL_CANONIC_GENERIC_HYBRID available from version 23.12 with the same fuctionality as CANONIC_GENERIC_HYBRID
      NORMAL_CANONIC_NORMAL_GENERIC_HYBRID is available from version 23.12
      See details of tautomer modes here.

    • canonicalTautomerHeavyAtomLimit: default value 100, available from version 22.22 See details here.

    • stereoAssumption: ABSOLUTE, RELATIVE (Available from version 20.12.)

      This parameter affects the tetrahedral stereo interpretation mode of structures without chiral flag. See table here. The upper row represents ABSOLUTE, the lower row represents RELATIVE.

    • Standardizer action string or Standardizer file containing standardization requirements

    Standardizer action string must follow the syntax of command line standardizer actions:

    Example :

    standardizerAction = aromatize:b..addexplicitH..replaceatoms:queryatom='C':replaceatom='N'

    Standardizer configuration file can be created as described in Creating a Configuration Standardizer page.

    Example :

    standardizerFile = /etc/chemaxon/types/customStandardizerConfiguration.xml

    {primary} It is very important that the structures compared in the searches be in aromatized form. So, please do not forget to add aromatize standardizer action as the last action of the standardizer action string or configuration file.

    {primary} The types stored in /etc/chemaxon/types/ are loaded when the jchem-psql service is initialized. Therefore, if you add a new molecule type or change an existing one, the following steps have to be executed (using the current version of JChem Postgres Cartridge).

    {primary} In the case of already existing chemindex or sortedchemindex type indexes, you have to list them by the following SQL in order to know which indexes should be reindexed in Step 8 below.

    SELECT schemaname, tablename, indexname, indexdef
    FROM pg_indexes
    WHERE indexdef like '%USING%chemindex%'
    ORDER BY tablename, indexname, indexdef;
    1. Create/modify/delete molecule type file(s) according to needs

    2. Stop the service: sudo service jchem-psql stop

    3. Delete the content of the index directory: sudo rm -rf /var/lib/jchem-psql/store/*

    4. Initialize the service: sudo service jchem-psql init

    5. Start the service: sudo service jchem-psql start

    6. Change to postgres user: sudo su postgres

    7. Login to your database: psql

    8. Cast the type of the existing molecule type column if the given type file was modified:
      drop index <indexname>
      update table_name set structure_column_name = structure_column_name::text::molecule('molecule_type_name');
      create index ...

    9. Reindex the already existing chemindex and sortedchemindex type indexes: reindex index <indexname>

    {primary} The query structure and the target structure(s) must always have the same molecule type. There is an auto-casting implemented; this means, that in the case of comparing two molecules, it is enough to define the molecule type of either of them. The following three statements have the same meaning:

    SELECT 'C'::Molecule('sample') |<| 'CC'::Molecule('sample');
    SELECT 'C'::Molecule('sample') |<| 'CC'::Molecule;
    SELECT 'C'::Molecule |<| 'CC'::Molecule('sample');

    Invalid select statements

    • missing molecule type definition
    SELECT 'C'::Molecule |<| 'CC'::Molecule;
    • different molecule type definition
    SELECT 'C'::Molecule('type1') |<| 'CC'::Molecule('type2');

    In the case of database search, the molecule type of the structure column relates to the query structure as well.

    How to create and use custom standardizer actions

    The aim is to create such a working customStandardizerConfiguration.xml which contains - beside the built-in standardizer actions provided by Chemaxon - custom standardizer action(s) as well.

    • Create custom standardizer action

      • Based on the documentation available here you can create a custom standardizer action in Java programming language. As a result of your development, you need to get a jar file which contains the necessary action.

      • Open the Standardizer Action Manager within the Standardizer application (available in JChem Base), then set the above created jar file and apply the Accept, then the Export buttons to create the standardizer action factory configuration file: userdefinedstandardizers.xml

      • Use the same Standardizer application to set up your customStandardizerConfiguration.xml. The left list in the Selecting Standardizer actions step must contain the custom action created above.

      • Copy the above created standardizer action factory configuration file userdefinedstandardizers.xml to jchem-psql user’s home .chemaxon folder:

        sudo mkdir /var/lib/jchem-psql/.chemaxon

        sudo cp ~/.chemaxon/userdefinedstandardizers.xml /var/lib/jchem-psql/.chemaxon/

    • Use custom standardizer action

      • For using customStandardizerConfiguration.xml follow the steps in section Molecule types

    Upgrade

    There are four different levels of the JPC upgrade process needed depending on the changes between the from and up versions of JPC. The highest level is when the PostgreSQL database must also be upgraded, and the lowest level is when only restarting jchem-psql service is needed.

    JChem PostgreSQL Cartridge History of Changes informs about the necessary upgrade method for upgrading from the previous version up to the actual version.

    If you upgrade from an older version than the previous one, the highest level of upgrades mentioned between your old version and the actual version will be necessary.

    Upgrade of PostgreSQL database together with JPC (Level 1)

    This method must be applied if not only JChem PostgreSQL Cartridge, but PostgreSQL database upgrade is also needed ( e. g., from JPC 5.0 postgreSQL 12 is required).

    See the detailed steps of the process: Upgrade of PostgreSQL database together with JPC.

    This update method can be automatized.

    Upgrade when domain indexes must be dropped (Level 2)

    This method is needed when - because of the changes between the two versions - the old indexes must be dropped. You can execute the upgrade with or without keeping the existing molecule type data.

    This update method can be automatized

    Upgrade with keeping existing molecule type data

    1. Drop the sortedchemindexes and chemindexes

      Before dropping these indexes, it might be helpful to collect information about the existing indexes:

      Example to collect index definitions:

      SELECT schemaname, tablename, indexname, indexdef
      FROM pg_indexes
      WHERE indexdef like '%USING%chemindex%'
      ORDER BY tablename, indexname, indexdef;
    2. Stop the service

      sudo service jchem-psql stop
    3. Install the new version

    4. Initialize the service (this step deletes the old indexes)

      sudo service jchem-psql init

      {primary} Only the index contents are deleted, the molecule records stored in the PostgreSQL database stay untouched.

    5. Start the service

      sudo service jchem-psql start
    6. Update the extensions

        sudo su postgres
        psql 
      
        ## ALTER EXTENSION chemaxon_type UPDATE;
        ## ALTER EXTENSION chemaxon_framework UPDATE;
    7. Update molecule type columns of the tables

        update table_name set structure_column_name = structure_column_name::text::molecule('molecule_type_name');
    8. Create indexes

      create index myindex on mytable using chemindex(mol); 
      //or
      create index myindex on mytable using sortedchemindex(mol);

    Upgrade without keeping existing molecule type data

    If not only JChem PostgreSQL Cartridge, but postgreSQL database upgrade is also needed ( e. g., from JPC 2.1 postgreSQL 9.5 is required), find an example of how to Upgrade of PostgreSQL database together with JPC.

    1. Before installing the new version, drop the following extensions

      DROP EXTENSION chemaxon_framework CASCADE; 
      DROP EXTENSION chemaxon_type CASCADE;

      {warning} If you drop the chemaxon_type extension, it will implicitly drop all chemical columns.

    2. Stop the service

      sudo service jchem-psql stop
    3. Delete the content of the index directory

      sudo rm -rf /var/lib/jchem-psql/store/*
    4. Install the new version

    Upgrade requiring only reindex (Level 3)

    This method can also be executed with or without keeping the existing molecule type data, but Upgrade with keeping existing molecule type data is recommended.

    Upgrade with keeping existing molecule type data

    1. Stop the service

      sudo service jchem-psql stop
    2. Install the new version

    3. Initialize the service (this step deletes the old indexes)

      sudo service jchem-psql init

      {primary} Only the index contents are deleted, the molecule records stored in the PostgreSQL database stay untouched.

    4. Start the service

      sudo service jchem-psql start
    5. Upgrade your existing postgresql database

      1. Update the extensions

          sudo su postgres
          psql 
        
          ## ALTER EXTENSION chemaxon_type UPDATE;
          ## ALTER EXTENSION chemaxon_framework UPDATE;

        {primary} Between some versions the changes are too complex to keep the index definitions throughout the upgrade process. In these cases you have to drop the indexes before altering the chemaxon_framework extension. If you get an error message similar to "ERROR: cannot drop operator class chemindex_int_ops for access method chemindex because other objects depend on it DETAIL: index upg_ind depends on operator class chemindex_int_ops for access method chemindex HINT: Use DROP ... CASCADE to drop the dependent objects too.".

        Which indexes should be dropped? There are two suggested methods:

        1. Drop the index referred in the DETAIL part of the message and repeat the update step until it is completed successfully:
        ## ALTER EXTENSION chemaxon_framework UPDATE;

        b. Collect the existing index definitions with one query and execute the drop (and later the create) index scripts according to the returned rows:

        SELECT schemaname, tablename, indexname, 
        indexdef||';' create_index_script,
        'DROP INDEX '||schemaname||'.'||indexname||';' drop_index_script
        FROM pg_indexes
        WHERE indexdef like '%USING%chemindex%'
        ORDER BY tablename, indexname, indexdef;
      2. Reindex the chemindex and sortedchemindex type indexes

        To take the advantage of all the performance enhancements you have to update the tables storing chemical structures. The indexes should be dropped before the upgrade and recreated thereafter.

        update table_name set structure_column_name = structure_column_name::text::molecule('molecule_type_name');

        Example to collect index definitions:

        SELECT schemaname, tablename, indexname, indexdef
        FROM pg_indexes
        WHERE indexdef like '%USING%chemindex%'
        ORDER BY tablename, indexname, indexdef;

        Example to update column and reindex:

        drop index myindex;
        update mytable set mol = mol::text::molecule('sample');
        commit;
        create index myindex on mytable using chemindex(mol); 
        //or
        create index myindex on mytable using sortedchemindex(mol);

        Even if the above table update is skipped for some reason, reindexing all the chemindex and sortedchemindex type indexes is necessary to use the indexed tables:

          psql
          ## reindex index <indexname>;

        {warning} If you miss this step, all indexes of type chemindex and sortedchemindex will not function properly.

    Upgrade requiring only service restart (Level 4)

    The following steps needs to be executed

    1. Stop the service

      sudo service jchem-psql stop
    2. Install the new version

    3. Start the service

      sudo service jchem-psql start
    4. Update the extensions

        sudo su postgres
        psql 
      
      ## ALTER EXTENSION chemaxon_type UPDATE;
        ## ALTER EXTENSION chemaxon_framework UPDATE;

    Uninstall

    The present chemaxon_framework extension must be dropped:

    DROP EXTENSION chemaxon_framework; 
    DROP EXTENSION chemaxon_type;

    {warning} If you drop the chemaxon_type type extension, it will implicitly drop all chemical columns.

    The installed jchem-psql-x.y.x86_64 package can be removed.

    Logging

    Available from version 3.0.

    Apache Log4j Java-based utility serves for logging JChem PostgreSQL Cartridge.

    The configuration file is /etc/chemaxon/jpc-log4j.xml Default jpc-log4j.xml :

    <Configuration status="WARN">
      <Appenders>
        <RollingFile name="InfoFile" fileName="/var/log/jchem-psql/info.log"
                     filePattern="logs/$${date:yyyy-MM}/jchem-psql-%d{yyyy-MM-dd}-%i.log">
                <PatternLayout pattern="%d{MMM dd, HH:mm:ss.SSS} [%t] %-5level %logger{36} %msg%n"/>
                <Policies>
                    <TimeBasedTriggeringPolicy />
                    <SizeBasedTriggeringPolicy size="20 MB"/>
                </Policies>
        </RollingFile>
        <RollingFile name="ErrorFile" fileName="/var/log/jchem-psql/error.log"
                     filePattern="logs/$${date:yyyy-MM}/jchem-psql-%d{yyyy-MM-dd}-%i.log">
                <PatternLayout pattern="%d{MMM dd, HH:mm:ss.SSS} [%t] %-5level %logger{36} %msg%n"/>
                <Policies>
                    <TimeBasedTriggeringPolicy />
                    <SizeBasedTriggeringPolicy size="20 MB"/>
                </Policies>
        </RollingFile>
      </Appenders>
      <Loggers>
        <Root level="info">
          <AppenderRef ref="InfoFile" level="info"/>
          <AppenderRef ref="ErrorFile" level="error"/>
        </Root>
      </Loggers>
    </Configuration>

    By default, rolling logging is applied (every day or after 20 MB a new folder starts to collect the log files) on INFO and on ERROR level. If you want to modify the logging configuration, see Apache Log4j guideline.

    Archiving

    To archive the content of the following directories might be useful:

    • /var/lib/jchem-psql

    This directory contains the indexing data. If this folder is restored, the same indexes will be usable again without reindexing.

    • /var/log/jchem-psql

    This directory contains the log files. If jchem-psql server is restarted, the old log files will be removed. So it may be important to save these files before restarting the service to find out why it stopped.

    • /etc/chemaxon

    This directory contains the user settings, e.g., custom molecule types or custom memory settings.

    Performance tuning

    Performance of the cartridge highly depends on the cache, JVM memory and PostgreSQL settings.

    Here you find a guide on how to configure

    • Cache size settings

    • JVM memory settings

    • PostgreSQL settings Parallel query execution

    The performance of the first search can be improved by the Load cache method.

    Cache sizes

    {info} Xmx and Cache Size Calculator

    JChem PostgreSQL Cartridge Cache and Memory Calculator is provided for calculating cache parameters and JVM memory settings according to the size of your database.

    The table below shows the cache memory needs of the PostgreSQL Cartridge when a chemical index of 1 million structures is in use. Memory size values are represented in megabytes. The first value in each cell shows the size of cache memory required by the corresponding backend for every 1 million of stored chemical structures. Values in parentheses represent the same memory need in the case when not only duplicate, full fragment, substructure, similarity search but also superstructure search is to be carried out. In this case an increased amount of cache is necessary.

    Values in square brackets depend on the number of chemical indexes present in database. Number of chemical indexes is denoted by ‘idx’. These values do not scale with number of structures but depend only on the number of chemical indexes.

    For example, in case of 10 million structures when a single sorted chemindex is used with 512 bits fingerprint, fingerprint cache requires 10 * 70 MB + 22 MB = 722 MB.

    Required size of molecule cache depends significantly on the molecule set stored. In general larger structures need more space, and smiles format is more compact than mrv or mol format. Cache sizes in the table are measured using the PubChem database in mol format.

    Chemindex Sorted chemindex
    Fingerprint bits 512 (default) 1024 512 (default) 1024
    Cache size used (MB)
    Fingerprint cache 70 (*2) 132 (*2) 70 + [22 idx] (2) 132 + [88 idx] (2)
    Molecule cache 290 (610) 290 (610) 290 (610) 290 (610)
    Label cache 44 (88) 44 (88) 44 (88) 44 (88)

    Fingerprint bits

    Fingerprint bit length can be set in the relevant type file by applying:

    • fpLength=512

    • fpOnes=2

    or

    • fpLength=1024

    • fpOnes=3

    Setup strategies

    All fits into cache

    When there is enough memory available it is recommended to set up all cache sizes so that every record fits into. This configuration results in the fastest DB operations in general, however its memory need is the highest.

    Label cache is left out

    In this case label cache size is set 0. This is the second fastest configuration. Search operations are speedy but modification operations (insert, update, vacuum) might be slower.

    When setting cache count to zero, it is recommended to switch off caching for the corresponding backend completely. This can be done using the following configuration parameter:

    com.chemaxon.jchem.psql.runtime.label.cachePolicy=DISABLED
    
    From version 4.3 on:
    com.chemaxon.jchem.psql.label.cachePolicy=DISABLED

    Label and molecule cache is left out

    In this case only fingerprint cache is in use. Label and molecule cache sizes are set to 0. Limited search operations remain still fast but search operations without hit limit may be slower. Modification operations (insert, update, vacuum) might be slower.

    When setting cache count to zero, it is recommended to switch off caching for the corresponding backend completely. This can be done using the following configuration parameters:

    com.chemaxon.jchem.psql.runtime.label.cachePolicy=DISABLED
    com.chemaxon.jchem.psql.runtime.molecule.cachePolicy=DISABLED
    
    From version 4.3 on:
    com.chemaxon.jchem.psql.label.cachePolicy=DISABLED
    com.chemaxon.jchem.psql.molecule.cachePolicy=DISABLED

    Cache setup

    Current PostgreSQL Cartridge version uses MapDB as storage backend. This allows to configure cache object counts, not cache sizes. Cache object counts may be configured in file: /etc/chemaxon/jchem-psql.conf

    The jchem-psql service has to be restarted after changes have been made to this file. The following keys have to be used:

    com.chemaxon.jchem.psql.runtime.molecule.cachedObjectCount
    com.chemaxon.jchem.psql.runtime.fingerprint.cachedObjectCount
    com.chemaxon.jchem.psql.runtime.label.cachedObjectCount
    
    From version 4.3 on:
    com.chemaxon.jchem.psql.molecule.cachedObjectCount
    com.chemaxon.jchem.psql.fingerprint.cachedObjectCount
    com.chemaxon.jchem.psql.label.cachedObjectCount

    Object count is the number of structures that have to fit into cache.

    An extra fingerprint object count for every sorted chemindex has to be calculated with. This extra count is 320,000 per index in case of 512 bit fingerprints and 640,000 per index for 1024 bit fingerprints. This overhead affects only the fingerprint cache count. The extra counts must be included only when using sorted chemindex (in contrast with “normal” chemindex). For example, in case of two structure tables, both consisting of 5 million entries, when two sorted chemindexes are used with 1024 bits fingerprints, fingerprint cache count requires: 2 5,000,000 + 2 640,000 = 11,280,000.

    Java VM heap memory setup

    Maximum memory allowed for the Java VM has to be configured, as well. The amount of required memory can be calculated as:

    MaxMem = 2500 MB + 1.33 * Cache size (fingerprint + molecule + label)

    Maximal VM memory can be configured in file /etc/default/jchem-psql

    The setting ‘-Xmx’ in entry ‘JCHEM_PSQL_OPTS’ has to be modified.

    The default Xmx value is determined by Java execution engine. For more information about the default Xmx value see How is the default max Java Heap size determined?.

    PostgreSQL settings

    If the user would like to execute queries

    1. on big tables - containing many entries and big row data. E.g., the molecule source is a verbose one (sdf, mrv, ...)

    2. and PostgreSQL needs to fetch majority of the rows. E.g., sql query without any additional restricting condition or limit parameter

    then it is advisable to increase the size of PostgreSQL shared buffer so that PostgreSQL can fetch all rows rapidly using cache. In order to achieve this, set the shared_buffers parameter in postgresql.conf to be able to store your table.

    Table size can be checked by \dt+ <tablename> command from the psql client.

    Additionally, to enhance PostgreSQL performance, it may be advisable to increase linux's shared memory with the following command:

    
    sysctl -w kernel.shmall = <shared memory size in bytes>/<page size> 

    The default value of page size is usually 4096. It can be checked by getconf PAGESIZE.To store the value of kernel.shmall permanently, add it it to the sysctl.conf file. For more details about required shared memory settings for PostgreSQL server please visit the PostgreSQL documentation.

    To reduce the needed buffer size, you may consider the following possibilities.

    1. Change the input format to a concise one: smiles, smarts, ...

      (e.g., 8M rows of a PubChem dataset need ca. 17 GB when the molecules are in sdf format, but only ca. 1.4 GB when they are in smiles format).

    2. Avoid queries that require the fetching of all table data through adding additional "where" condition or limit parameter.

      For example, instead of

      
      SELECT <id_column_name> FROM <table_name> WHERE 'c1ccccc1' |<| mol;

      which may return several millions of hits, use the following statement applying LIMIT <n> in order to obtain the most relevant n hits:

      
      SELECT <id_column_name> FROM <table_name> WHERE 'c1ccccc1' |<| mol LIMIT 100;
    3. Increase the fillfactor to 90% instead of the above recommended 50% and perform vacuum regularly.

    4. As PostgreSQL is not optimized for the COUNT() method, it may take a long time if it returns large value. For obtaining an estimation of the count of hits we rather suggest using the EXPLAIN command.

    Please also check that the JVM heap size plus the PostgreSQL shared buffer size is not more than 2/3-rd of the total available memory to avoid slow-down of the operation system.

    If the required amount of physical memory is not available, then for optimizing the performance it is more important to setup proper JVM heap size than to increase the PostgreSQL shared buffer size.

    Parallel query execution

    From version 21.5.0, it is possible to run parallel queries in JPC. For its usage, follow PostgreSQL documentation.

    Load cache

    For improving the performance of the first search, you need to load the molecule and the fingerprint caches and PostgreSQL shared buffer.

    Extension pg_prewarm is required.

    
    CREATE EXTENSION pg_prewarm;

    After restarting jchem-psql service, the following SQL statements must be issued:

    
    SELECT load_molecule_cache('indexname');
    SELECT load_fingerprint_cache('indexname');
    SELECT pg_prewarm('tablename', 'buffer');

    where

    i ndexname = the name of the sortedchemindex or chemindex created on the column storing the chemical structures in the given table

    tablename = the name of the given table where the first search is intended to be speeded up

    Migration from JChem Oracle Cartridge

    A Migration guide helps moving from JChem Oracle Cartridge (JOC) to JPC.

    Furthermore, see the Comparison of JOC and JPC.