Migration

This document details the process for migrating to a new ChemLocator installation. If you have not installed ChemLocator yet on the destination, follow the installation guide.

Step 1 - PostgreSQL database migration

The PostgreSQL database can be migrated using the official documented methods found here. Below we provide an example of running a backup, both for the current JPC container and for JPC installed directly on the server.

{info} NOTE: For PostgreSQL clusters, please refer to the official documentation regarding backup/restore of the databases.

For the backup and restore you will need the PostgreSQL client tools provided by the official installers. If you are connecting from a machine other than localhost (for example, when the database runs in a Docker container), make sure that the versions match (if the server is PostgreSQL 12, the client tools must also be version 12) and that access is allowed by adding the client IP to the pg_hba.conf file.
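The version requirement above can be checked before starting. The following is a minimal sketch, not part of the official procedure: the helper names pg_major and versions_match are hypothetical, and it assumes PostgreSQL 10 or later, where the major version is the first number in the version string.

```shell
# Extract the major version from a string such as
# "pg_dump (PostgreSQL) 12.4" or "12.4 (Debian 12.4-1.pgdg100+1)".
# Assumption: PostgreSQL 10+, where the major version is the first number.
pg_major() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)*' | head -n 1 | cut -d. -f1
}

# Return 0 when the client and server major versions are equal.
versions_match() {
  [ "$(pg_major "$1")" = "$(pg_major "$2")" ]
}

# Example (not executed here; connection string as used in this document):
#   client="$(pg_dump --version)"
#   server="$(psql --dbname=postgresql://postgres:postgres@localhost:5432/chemlocator -Atc 'SHOW server_version;')"
#   versions_match "$client" "$server" || echo "client/server version mismatch" >&2
```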

Backup

Navigate to the folder where the backup output should be saved, and make sure that enough disk space is available for the database dump. The size of the database can be checked with:

psql --dbname=postgresql://postgres:postgres@localhost:5432/chemlocator -c "SELECT pg_size_pretty( pg_database_size('chemlocator') );"

In general, the size of the backup should not exceed the size of the database on disk.
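The disk space check described above can be scripted. This is a rough sketch under the assumption that the database size on disk is a safe upper bound for the dump; the helper names has_enough_space and avail_kib_for are hypothetical.

```shell
# Return 0 when the free space (in KiB) covers the database size (in bytes).
has_enough_space() {
  local db_bytes="$1"
  local avail_kib="$2"
  # Round the database size up to whole KiB before comparing.
  local need_kib=$(( (db_bytes + 1023) / 1024 ))
  [ "$avail_kib" -ge "$need_kib" ]
}

# Print the free space in KiB of the file system holding the given directory.
avail_kib_for() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example (not executed here):
#   db_bytes="$(psql --dbname=postgresql://postgres:postgres@localhost:5432/chemlocator -Atc "SELECT pg_database_size('chemlocator');")"
#   has_enough_space "$db_bytes" "$(avail_kib_for .)" || echo "not enough disk space" >&2
```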

To start the database backup we will use the pg_dump command and specify the connection information for the database:

pg_dump -Fc --encoding=UTF-8 --dbname=postgresql://postgres:postgres@localhost:5432/chemlocator > chemlocator_psql.back

When the command finishes, a file called chemlocator_psql.back will have been created containing the database information.
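Before copying the dump to the destination it may be worth confirming that the file is readable. The sketch below relies on pg_restore --list, which prints the table of contents of an archive without restoring anything; the helper name verify_dump is hypothetical.

```shell
# Return 0 when the dump file exists, is non-empty and, if pg_restore is
# installed, has a readable table of contents.
verify_dump() {
  local dump_file="$1"
  if [ ! -s "$dump_file" ]; then
    echo "missing or empty dump: $dump_file" >&2
    return 1
  fi
  if command -v pg_restore >/dev/null 2>&1; then
    # --list only reads the archive; it does not connect to a database.
    pg_restore --list "$dump_file" >/dev/null
  fi
}

# Example (not executed here):
#   verify_dump chemlocator_psql.back && echo "dump looks readable"
```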

Restore

For the purpose of this example we assume that the backup file has been copied to the destination server and that the PostgreSQL client tools are available on that server.

On the destination server, create the database and the required extensions by running the following commands:

psql --dbname=postgresql://postgres:postgres@localhost:5432/postgres -c "CREATE DATABASE chemlocator;"
psql --dbname=postgresql://postgres:postgres@localhost:5432/chemlocator -c "CREATE EXTENSION chemaxon_type;"
psql --dbname=postgresql://postgres:postgres@localhost:5432/chemlocator -c "CREATE EXTENSION hstore;"
psql --dbname=postgresql://postgres:postgres@localhost:5432/chemlocator -c "CREATE EXTENSION chemaxon_framework;"

After the above commands have been run, use the pg_restore command for the restore:

pg_restore -v --dbname=postgresql://postgres:postgres@localhost:5432/chemlocator chemlocator_psql.back

Step 2 - Elasticsearch migration

Create a snapshot repository

If there is no snapshot repository you can create one by using Kibana or by using curl to make calls directly to the Elasticsearch API:

Using Kibana:

[Screenshot: Kibana create snapshot repository]

Using curl:

curl -X PUT "localhost:9200/_snapshot/cl_backup?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/es_backups"
  }
}
'

{info} NOTE: For both scenarios, the location value must lie within a path listed under the path.repo setting in the elasticsearch.yml configuration file.
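The constraint in the note above can be checked with a small path comparison. This is only an illustrative sketch: the helper name is_under_repo_path is hypothetical, and it compares path strings literally without resolving symbolic links.

```shell
# Return 0 when $1 (repository location) equals or lies below
# $2 (a directory configured for snapshot repositories).
is_under_repo_path() {
  local location="${1%/}"
  local repo_root="${2%/}"
  case "$location/" in
    "$repo_root"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (not executed here):
#   is_under_repo_path /es_backups /es_backups || echo "location is outside the configured path" >&2
# Once the repository is registered, Elasticsearch can verify it on all nodes:
#   curl -X POST "localhost:9200/_snapshot/cl_backup/_verify?pretty"
```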

For more information please refer to the official documentation for Elasticsearch 7.10 found here.

Create snapshot

Creating a snapshot can be done by calling the Elasticsearch API endpoints.

curl -X PUT "localhost:9200/_snapshot/cl_backup/snapshot_1?wait_for_completion=true&pretty"
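To confirm that the snapshot completed, its state can be read back from the snapshot API. The sketch below extracts the state field with standard text tools; the helper name snapshot_state is hypothetical, and a JSON-aware tool would be more robust where one is available.

```shell
# Print the value of the first "state" field found in a JSON response
# read from standard input (for example SUCCESS, IN_PROGRESS or FAILED).
snapshot_state() {
  grep -oE '"state"[[:space:]]*:[[:space:]]*"[A-Z_]+"' \
    | head -n 1 \
    | sed -E 's/.*"([A-Z_]+)"$/\1/'
}

# Example (not executed here):
#   curl -s "localhost:9200/_snapshot/cl_backup/snapshot_1" | snapshot_state
```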

Once the snapshot is finished, copy it over to the new Elasticsearch cluster, into the folder configured in the elasticsearch.yml configuration file.
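One possible way to move the repository folder is to pack it into a single archive first. This is a sketch only: the helper names pack_repo and unpack_repo, the host name new-es-host and the /tmp staging path are hypothetical. On the destination, the unpacked files presumably need to be readable and writable by the user running Elasticsearch.

```shell
# Pack the repository directory into a gzip-compressed tar archive.
pack_repo() {
  local repo_dir="$1"
  local archive="$2"
  tar -czf "$archive" -C "$(dirname "$repo_dir")" "$(basename "$repo_dir")"
}

# Unpack the archive so that the repository directory appears under dest_parent.
unpack_repo() {
  local archive="$1"
  local dest_parent="$2"
  mkdir -p "$dest_parent"
  tar -xzf "$archive" -C "$dest_parent"
}

# Example (not executed here; new-es-host is a placeholder):
#   pack_repo /es_backups es_backups.tar.gz
#   scp es_backups.tar.gz user@new-es-host:/tmp/
#   # then, on the destination server:
#   unpack_repo /tmp/es_backups.tar.gz /
```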

Restore snapshot

The easiest way to restore the snapshot is to use Kibana, because you might want to exclude some of the indexes:

Start the restore process by clicking on the marked icon:

[Screenshot: Start]

In the first step of the restore, select the desired indexes:

[Mandatory] chemlocator_freetext
[Optional] chemlocator_indexing_log
[Optional] chemlocator_log
[Optional] chemlocator_executiontime_log

[Screenshot: Step 1]

In the next step, adjust the options as needed.

[Screenshot: Step 2]

In the final step, click Restore snapshot.

[Screenshot: Step 3]

The progress of the restore process can be monitored on the Restore Status tab.

[Screenshot: Step 4]
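Where Kibana is not available, the same selective restore can probably be done through the Elasticsearch restore API. The following sketch builds the request body from a list of index names; the helper name restore_body is hypothetical, and the repository and snapshot names are those used earlier in this document.

```shell
# Build the JSON body for the restore API from a list of index names.
restore_body() {
  local indices
  # Join all arguments with commas.
  indices="$(IFS=,; printf '%s' "$*")"
  printf '{"indices":"%s"}\n' "$indices"
}

# Example (not executed here): restore the mandatory index plus one optional log index.
#   curl -X POST "localhost:9200/_snapshot/cl_backup/snapshot_1/_restore?pretty" \
#     -H 'Content-Type: application/json' \
#     -d "$(restore_body chemlocator_freetext chemlocator_log)"
```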

Step 3 - Updating the content source paths

For the new server to be able to crawl incrementally, the paths for the content sources and documents must be updated. To do this, run the following commands:

Start the docker images (docker-compose up -d)
Run docker ps to view the running containers
Locate the id of the container for image hub.chemaxon.com/cxn-docker-release/chemlocator:[version] (e.g. hub.chemaxon.com/cxn-docker-release/chemlocator:3.3.15)
Run the following command for each content source: docker exec -it [CONTAINER ID] dotnet /app/ChemLocator.dll contentsource -m '{"id":1,"srcBase":"/basepath/data","destBase":"/newPath/data"}'
- to view the content source id you can run the following command: docker exec -it [CONTAINER ID] dotnet /app/ChemLocator.dll contentsource -l
- this will change any occurrence of /basepath/data to /newPath/data in the content source addresses and in the stored document metadata, and will recalculate the document URL hashes. Depending on the number of locations per content source, this may take some time to complete; progress is displayed while the command is running.
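When there are several content sources, the steps above can be wrapped in a loop. This is a minimal sketch: the helper names contentsource_payload and update_content_sources are hypothetical, and it assumes the paths contain no characters that need JSON escaping (quotes or backslashes).

```shell
# Build the JSON argument passed to "contentsource -m".
contentsource_payload() {
  local id="$1"
  local src_base="$2"
  local dest_base="$3"
  printf '{"id":%s,"srcBase":"%s","destBase":"%s"}\n' "$id" "$src_base" "$dest_base"
}

# Run the path update for every content source id given after the two paths.
update_content_sources() {
  local container="$1"
  local src_base="$2"
  local dest_base="$3"
  local id
  shift 3
  for id in "$@"; do
    docker exec -it "$container" dotnet /app/ChemLocator.dll contentsource -m \
      "$(contentsource_payload "$id" "$src_base" "$dest_base")" || return 1
  done
}

# Example (not executed here; the image tag and the ids 1 2 3 are illustrative):
#   container="$(docker ps --filter "ancestor=hub.chemaxon.com/cxn-docker-release/chemlocator:3.3.15" --format '{{.ID}}')"
#   update_content_sources "$container" /basepath/data /newPath/data 1 2 3
```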

Final step

After the data has been imported and migrated, please test it before removing the source ChemLocator environment, the PostgreSQL dump file and the Elasticsearch snapshots.