Automating MarkLogic Docker Installs

by Alan Johnson and Tamas Piros

In the previous blog post, Building a MarkLogic Docker Container, we created a Docker container and installed MarkLogic. We used a Dockerfile to build the MarkLogic container and run the MarkLogic installation, and we used Docker Compose to automate building a 3-node MarkLogic cluster for learning purposes. In each of these examples, we completed the MarkLogic installation manually and, for servers that were not standalone, manually joined them to a MarkLogic cluster.

MarkLogic does not currently officially support Docker containers. Please use discretion when working with these examples. They are for learning purposes only.

Using MarkLogic's REST Management API, we can automate this process further. With only one docker-compose command, we can have a 3-node (or more) MarkLogic cluster created and ready to use with no manual intervention required.

What is MarkLogic Initialization?

After installing and starting MarkLogic on a single host or the first host in a cluster, we normally use a browser to connect to port 8001 on the host. MarkLogic then does the following to initialize the new MarkLogic server:

  1. MarkLogic first creates initial databases and application servers. For example, a Security database must be created to store user, role and other security information. Application servers provide Query Console on port 8000, the Administrative Interface on port 8001, Monitoring on port 8002, and other features.

  2. Next in the initialization process is joining a cluster. Since this is the first host, we skip this step and proceed to creating an administrator account.

  3. When we initialize the first or a standalone MarkLogic server, an administrator account must be created. When additional hosts join a cluster, they use the existing administrator account.

Automating Initialization

We can automate the initialization steps and create an administrator account for a single MarkLogic server by using MarkLogic's REST API. We can also use MarkLogic's REST API to add a single, initialized MarkLogic server to a cluster. Before we discuss automating the full installation process, recall that the previous blog post had an example Dockerfile for single MarkLogic server installations and an example docker-compose.yml file for a 3-node MarkLogic cluster. Both needed additional steps to complete the installations once the MarkLogic containers were created. Let's review.

Creating MarkLogic Containers with Manual Initialization

The Dockerfile installs MarkLogic and exposes ports from the Docker container. Finally, the CMD line starts MarkLogic when the Docker container is created. To reach port 8001 inside the Docker container, we point the host computer's browser at the exposed port and then proceed manually through the initialization steps. Here's part of the Dockerfile from before concerning MarkLogic.

For the complete discussion of the example MarkLogic Dockerfile and example MarkLogic docker-compose.yml file, please see the previous blog post, Building a MarkLogic Docker Container. Also, download the examples from GitHub at https://github.com/alan-johnson/docker-marklogic.

Manually Creating the Cluster

Using docker-compose and a .yml build file (available on GitHub and in the previous blog post), we create 3 MarkLogic servers. Docker networking links these containers to each other enabling them to communicate with each other over HTTP. Once the containers are created and linked, we use a browser on the host computer to connect to port 8001 in each container. We manually initialize the first MarkLogic server (ml1.local) as the first node in the cluster and create an administrator account. Then we initialize the second (ml2.local) and third (ml3.local) MarkLogic server nodes and join the first MarkLogic server, creating the cluster.

Automating the Process

We can automate the initialization and cluster joining process by implementing the following steps.

Note: all files, including the automation scripts, can be downloaded from the GitHub repository.

  1. Create two shell scripts.

    • The first shell script, initialize-ml.sh, uses MarkLogic's REST API to initialize the MarkLogic Server and create an administrator account.

      • Using curl, we call the MarkLogic REST API to initialize MarkLogic once it has started. The call goes to the /admin/v1/init endpoint on the MarkLogic server named by the ML_HOST shell variable, which the script sets to the container's hostname.

        # POST an empty body to /admin/v1/init on ML_HOST and capture the response
        TIMESTAMP=$(curl -X POST -d "" "http://${ML_HOST}:8001/admin/v1/init")

      • We use curl again to call MarkLogic's REST API to create the administrator account, using the /admin/v1/instance-admin endpoint. The variables USER, PASS and SEC_REALM are set from command-line arguments to the script.

      • The script also includes code to iterate through the passed-in arguments and verify that an administrator username and password have been given. If either is missing, the script prints a usage message and returns an error. Additional code initializes variables and uses the MarkLogic REST API to wait for MarkLogic to restart.
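
The pieces described above could fit together roughly as follows. This is a minimal sketch and not the script from the GitHub repository: the function names, the -u/-p/-r option letters, the "public" default realm and the retry settings are all assumptions made for illustration. The form field names (admin-username, admin-password, realm) and the polling of /admin/v1/timestamp reflect our understanding of the Management API, so please check them against the MarkLogic documentation for your version.

```shell
#!/bin/bash
# Sketch only: function names, option letters and retry defaults are assumptions.

N_RETRY=${N_RETRY:-5}                 # how many times to poll for a restart
RETRY_INTERVAL=${RETRY_INTERVAL:-10}  # seconds to wait between polls

usage() {
  echo "usage: initialize-ml.sh -u <admin-user> -p <admin-password> [-r <realm>]" >&2
}

# Read -u, -p and -r into USER, PASS and SEC_REALM.
# Fails with a usage message if the username or password is missing.
parse_args() {
  local opt OPTIND=1
  USER=""; PASS=""; SEC_REALM="public"
  while getopts "u:p:r:" opt; do
    case "$opt" in
      u) USER="$OPTARG" ;;
      p) PASS="$OPTARG" ;;
      r) SEC_REALM="$OPTARG" ;;
      *) usage; return 1 ;;
    esac
  done
  if [ -z "$USER" ] || [ -z "$PASS" ]; then
    usage
    return 1
  fi
}

# Poll /admin/v1/timestamp on host $1 until it reports a value other than $2,
# the startup timestamp seen before the restart was triggered.
wait_for_restart() {
  local host="$1" previous="$2" current i
  for i in $(seq 1 "$N_RETRY"); do
    current=$(curl -s --anyauth --user "${USER}:${PASS}" \
      "http://${host}:8001/admin/v1/timestamp")
    if [ -n "$current" ] && [ "$current" != "$previous" ]; then
      return 0
    fi
    sleep "$RETRY_INTERVAL"
  done
  echo "ERROR: ${host} did not restart in time" >&2
  return 1
}

# Create the administrator account on host $1.
create_admin() {
  curl -s -X POST \
    --data-urlencode "admin-username=${USER}" \
    --data-urlencode "admin-password=${PASS}" \
    --data-urlencode "realm=${SEC_REALM}" \
    "http://${1}:8001/admin/v1/instance-admin"
}
```

A caller would run parse_args "$@" first, post to /admin/v1/init, call wait_for_restart with the startup timestamp it had before the restart, and only then call create_admin. Using --data-urlencode keeps passwords with special characters intact.
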
    • Tip: you can see what was written to standard output or standard error inside a Docker container by using the command:

      docker logs <containername or id>
    • The second shell script, add-to-cluster.sh, uses the MarkLogic REST API's /admin/v1/server-config endpoint to retrieve the cluster configuration from the initial MarkLogic server and merge it with the configuration from the second and third MarkLogic servers.

      • Use curl to get the current server configuration from the MarkLogic server that will be joining the cluster. A loop in the script sets the JOINING_HOST variable to each of the passed-in MarkLogic server names that will join the cluster. The returned configuration is stored in the JOINER_CONFIG variable.

      • Use curl again to send the configuration data stored in JOINER_CONFIG to a MarkLogic server already in the cluster. Since we are creating a new cluster, this would be the first MarkLogic server created and initialized. This REST API call returns cluster configuration information that is saved to a file, cluster-config.zip.

      • Finally, we use curl to send the new cluster configuration to the MarkLogic server joining the cluster. The variable JOINING_HOST is set in a loop in the script for each MarkLogic server name that is to join the cluster.

      This puts all MarkLogic servers in the same cluster as the initial MarkLogic server.
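
The three curl calls described above can be sketched as a single loop. This is an illustration under stated assumptions, not the repository's add-to-cluster.sh: the function name join_cluster, the "Default" group name and the use of the /admin/v1/cluster-config endpoint for the second and third calls come from our reading of the Management API rather than from the script itself. JOINING_HOST, JOINER_CONFIG and cluster-config.zip are the names used in the text.

```shell
#!/bin/bash
# Sketch only: join_cluster and the "Default" group name are assumptions.
# Expects USER and PASS to hold the administrator credentials.

# $1 = host already in the cluster, remaining arguments = hosts that will join it
join_cluster() {
  local bootstrap_host="$1"
  shift
  local JOINING_HOST JOINER_CONFIG
  for JOINING_HOST in "$@"; do
    # 1. Fetch the joining host's server configuration.
    JOINER_CONFIG=$(curl -s --anyauth --user "${USER}:${PASS}" \
      -H "Accept: application/xml" \
      "http://${JOINING_HOST}:8001/admin/v1/server-config") || return 1

    # 2. Send it to the host already in the cluster; the reply is the
    #    cluster configuration, saved to cluster-config.zip.
    curl -s --anyauth --user "${USER}:${PASS}" -X POST \
      -o cluster-config.zip \
      -d "group=Default" \
      --data-urlencode "server-config=${JOINER_CONFIG}" \
      "http://${bootstrap_host}:8001/admin/v1/cluster-config" || return 1

    # 3. Hand the cluster configuration to the joining host.
    curl -s --anyauth --user "${USER}:${PASS}" -X POST \
      -H "Content-type: application/zip" \
      --data-binary @cluster-config.zip \
      "http://${JOINING_HOST}:8001/admin/v1/cluster-config" || return 1
  done
}
```

For the 3-node example, this would be called as join_cluster ml1.local ml2.local ml3.local. A fuller script would likely also wait for each joining host to restart before moving on to the next one.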

  2. Change the Dockerfile to call our initialize-ml.sh script after starting MarkLogic.

    The changes to the Dockerfile include:

    • Copy the shell scripts to the same directory in the Docker container as the MarkLogic installer.
    • Change the CMD line to run the initialize-ml.sh script after starting MarkLogic. This initializes MarkLogic and creates an administrator account.

    Updated Dockerfile

  3. Change the docker-compose.yml file to call the add-to-cluster.sh script for the second and third MarkLogic servers.

    Updated docker-compose.yml file

    The changes to the docker-compose.yml file include adding a command: option to the last MarkLogic node to join the cluster. The command: option overrides the default CMD line in the Dockerfile but only for the node under which it appears in the docker-compose.yml file. By doing so, we replace the command that is executed when the container is created, allowing us to:

    • Start MarkLogic on the last node to join the cluster, as the new Dockerfile also does.
    • Initialize MarkLogic and create an administrator account. Again, same as our new Dockerfile.
    • After successful initialization, call the add-to-cluster.sh script passing in the following arguments to the script.
      Argument    Meaning
      -u admin    The administrator username created in the initial MarkLogic server.
      -p admin    The administrator password.
      ml1.local   The Docker hostname of the first MarkLogic server.
      ml2.local   The Docker hostname of a MarkLogic server to join the cluster of the first MarkLogic server.
      ml3.local   The Docker hostname of a MarkLogic server to join the cluster of the first MarkLogic server.
    • Last, just like our new Dockerfile, call the tail command so that the container doesn't exit after it is created.
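
Put together, the command that runs on the last node amounts to the sequence sketched below. The three default paths, the function names and the choice of tailing /dev/null are assumptions made for illustration; the real values are in the docker-compose.yml file on GitHub.

```shell
#!/bin/bash
# Sketch only: the three paths below are assumptions, not the repository's values.
ML_START=${ML_START:-/etc/init.d/MarkLogic}
INIT_SCRIPT=${INIT_SCRIPT:-/tmp/initialize-ml.sh}
JOIN_SCRIPT=${JOIN_SCRIPT:-/tmp/add-to-cluster.sh}

# Block forever so the container keeps running after setup finishes.
keep_alive() {
  tail -f /dev/null
}

# $1 = admin user, $2 = admin password, $3 = first host, rest = hosts that join it
run_last_node() {
  local user="$1" pass="$2"
  shift 2
  # Each step runs only if the previous one succeeded.
  "$ML_START" start &&
    "$INIT_SCRIPT" -u "$user" -p "$pass" &&
    "$JOIN_SCRIPT" -u "$user" -p "$pass" "$@" &&
    keep_alive
}
```

With the arguments from the table above, the call would be run_last_node admin admin ml1.local ml2.local ml3.local. Chaining the steps with && means a failed initialization stops the sequence before any attempt to join the cluster.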

Wrap up

MarkLogic server initialization is required after MarkLogic has been installed and started. Unless the MarkLogic server is joining a cluster, an administrator account must also be created. These steps are normally completed manually.

We can automate this process by using the MarkLogic REST API. With scripting, servers can be initialized, administrator accounts created and clusters joined. We used curl to do these steps from shell scripts, which are a good fit with Docker.

Enjoy MarkLogic and Docker!
