Building a MarkLogic Docker Container

by Alan Johnson and Tamas Piros

Docker and container technology have been sweeping through the IT world. Containers are an alternative to virtual machines, providing isolation for applications and also a way of delivering microservices. In this article we are going to investigate how to set up MarkLogic using Docker, and how to set up a 3-node MarkLogic cluster using docker-compose.

What is Docker?

Docker is an environment for creating and managing containers. Containers package up applications and their runtime dependencies. All containers on a host share the same Linux kernel and are isolated from each other by Linux namespaces and control groups (cgroups). Containers share their host's memory, CPUs, and storage. This differs from virtual machines, which virtualize all of a computer's hardware. Picture needing a new room for your house. A Docker container adds a new room to your existing house, sharing the house's electricity, plumbing and heating. A virtual machine builds a whole new house every time a new room is required.

Why MarkLogic and Docker

Running MarkLogic in Docker containers keeps MarkLogic versions isolated from each other. You can spin up a MarkLogic container in a matter of seconds, compared with the time a virtual machine needs to boot. Docker containers communicate with each other over a private Docker bridge network, so MarkLogic containers can be clustered easily, with little network overhead.

To set up Docker on your machine, please visit Docker's official website.

Creating a MarkLogic Docker Container

Editor's note: Docker is not currently a supported platform. That said, it can still be a useful development and testing tool.

The first thing needed to create a container is a Dockerfile, a text file containing Docker commands. These commands tell Docker how to build an image from which containers are created.

Since Docker is a Linux technology, we'll use CentOS 7 as the base image to build upon. We'll also need to download MarkLogic; we will use the Red Hat 7 / CentOS 7 installer. For this example, we are using MarkLogic version 8.0-5.5, but you can use any MarkLogic version 8 or later. If you are new to MarkLogic, see our Introduction to MarkLogic On Demand videos (available in multiple languages) and/or get them via the MarkLogic University Mobile App, available on Apple's App Store and on Google Play.

First we are going to see a sample Dockerfile to create a MarkLogic image.

An Example MarkLogic Dockerfile

Here is a walkthrough of this Dockerfile. Docker is a Linux technology, and MarkLogic runs on Linux in production. Docker doesn't use Windows or Mac OS X as a base OS (yet). The Dockerfile performs these steps:

  • First, we tell Docker that we are building this image on top of an existing CentOS 7 image called centos:centos7. If that image isn't available locally, Docker pulls it from Docker Hub.
  • Next, we use Docker's RUN command to get any CentOS 7 updates. The yum clean all command cleans up any yum cache to help keep our image a reasonable size.
  • The base CentOS 7 image doesn't contain the initscripts package. MarkLogic uses an init.d script to start the MarkLogic Server. So again, we RUN yum to install the needed package.
  • The Docker ENV command sets environment variables in the image. We set the search path (PATH) to include MarkLogic.
  • The COPY command copies the MarkLogic Red Hat 7 installer to a temporary directory in the image.
  • Then, of course, MarkLogic needs to be installed. After the installation succeeds, we delete the MarkLogic installer from the temporary directory in the image, since we won't need it again.
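  • Docker has its own networking, and containers can communicate with other containers. So we EXPOSE the ports MarkLogic uses. EXPOSE records which ports the container listens on; publishing them on the host computer is still done with the -p flag of docker run.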

    The MarkLogic knowledge base lists the ports MarkLogic uses for host-to-host and cluster-to-cluster communication.

    We want these ports exposed to Docker networking:

    Port   Purpose
    7997   Default HealthCheck application server port; required to check the health of a MarkLogic instance.
    7998   Default foreign bind port, on which the server listens for inter-host communication between MarkLogic clusters.
    7999   Default bind port, on which the server listens for inter-host communication within the cluster; required for every MarkLogic Server cluster.
    8000   Default App-Services application server port; required by Query Console.
    8001   Default Admin application server port; required by the Admin Interface.
    8002   Default Manage application server port; required by Configuration Manager and Monitoring Dashboard.
  • Finally, the CMD instruction defines the command a container runs when it starts, and the container stops as soon as that command exits. We want to start MarkLogic but keep the container running, so the CMD in the Dockerfile starts MarkLogic and then runs tail -f /dev/null. Because /dev/null never receives any data, tail simply waits forever, which keeps the container alive.
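
The steps above can be sketched as the following Dockerfile, written out with a shell heredoc so it can be pasted straight into a terminal. This is a minimal sketch, not the authors' exact file: the installer name MarkLogic-RHEL7-8.0-5.5.x86_64.rpm, the /opt/MarkLogic/bin PATH entry and the /etc/init.d/MarkLogic start script location are assumptions, so adjust them to match the installer you downloaded.

```shell
#!/bin/sh
# Write a sketch MarkLogic Dockerfile into the current directory.
# Assumption: the installer is named MarkLogic-RHEL7-8.0-5.5.x86_64.rpm
# and sits next to the Dockerfile; set ML_RPM to match your download.
ML_RPM="${ML_RPM:-MarkLogic-RHEL7-8.0-5.5.x86_64.rpm}"

cat > Dockerfile <<EOF
# Build on top of the official CentOS 7 image
FROM centos:centos7

# Apply CentOS updates, then clear the yum cache to keep the image small
RUN yum -y update && yum clean all

# MarkLogic is started by an init.d script, which needs initscripts
RUN yum -y install initscripts && yum clean all

# Put MarkLogic's bin directory on the search path (assumed location)
ENV PATH=\$PATH:/opt/MarkLogic/bin

# Copy the installer into a temporary directory in the image
COPY ${ML_RPM} /tmp/${ML_RPM}

# Install MarkLogic, then delete the installer only if the install succeeded
RUN yum -y install /tmp/${ML_RPM} && rm -f /tmp/${ML_RPM} && yum clean all

# Health check, foreign bind, bind, App-Services, Admin and Manage ports
EXPOSE 7997 7998 7999 8000 8001 8002

# Start MarkLogic, then block forever so the container keeps running
CMD /etc/init.d/MarkLogic start && tail -f /dev/null
EOF
```

Running yum install on the local RPM, rather than rpm -i, lets yum pull in any dependencies the installer declares from the CentOS repositories.
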

Copy the Dockerfile commands into a blank text file and save it as Dockerfile. Copy the MarkLogic RPM file that you downloaded to the same directory, and you are ready to build your MarkLogic Docker image.

Building the MarkLogic Docker Image

Docker uses the commands in the Dockerfile to create a Docker image when using Docker's build command:

docker build .

By default, docker build looks for a file named Dockerfile in the build context, the directory given at the end of the command (here ., the current directory). You can specify a different Dockerfile location with the -f flag:

docker build -f /path/to/Dockerfile .

Let's build a MarkLogic image and give it a name by using the -t flag. Ensure that the MarkLogic .rpm installer is in the same path as your Dockerfile and the name matches with the COPY command within the Dockerfile. Then use the following Docker command:

docker build -t marklogic:8.05-preinitialized .

The above creates an image with the given name of marklogic:8.05-preinitialized. Everything after the ":" in the name is called a tag.

Did you know? Tags in Docker are helpful to differentiate images such as MarkLogic versions.

Docker lets us know when the image is successfully built.

Use the docker images command to list all of your Docker images.
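
Because docker build fails when a file named in a COPY instruction is missing from the build context, a quick pre-flight check can save a failed build. The sketch below is a hypothetical helper (check_copy_sources is not a Docker command) that reads every COPY line in a Dockerfile and confirms that each source file exists. It assumes the build context is the Dockerfile's own directory and that COPY lines use the plain COPY <src> <dest> form, with no flags or wildcards.

```shell
# Hypothetical helper: verify that every source named in a COPY
# instruction exists in the build context (assumed to be the
# Dockerfile's directory). Assumes plain "COPY <src> <dest>" lines.
check_copy_sources() {
  dockerfile="${1:-Dockerfile}"
  context_dir=$(dirname "$dockerfile")
  status=0
  # Print the first argument of each COPY line (keyword matched case-insensitively)
  for src in $(awk 'toupper($1) == "COPY" { print $2 }' "$dockerfile"); do
    if [ ! -e "$context_dir/$src" ]; then
      echo "missing: $context_dir/$src" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, check_copy_sources Dockerfile && docker build -t marklogic:8.05-preinitialized . only starts the build when the installer named in the Dockerfile is actually present.
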

Creating a Container

A Docker container is a running instance of an image. MarkLogic containers can be created, then stopped when no longer needed. A MarkLogic container can then be started again in the same state as when it was stopped. Multiple MarkLogic containers can be created if you want to cluster them.

Use the docker run command to create a MarkLogic container from a Docker image. Flags specify the ports to open from the host computer into the Docker container. For example, we'd want to contact each MarkLogic container's Admin Interface on port 8001 so we can configure and monitor MarkLogic running in that container.

But remember, Docker shares the host computer's resources, including its networking. Once a host port such as 8001 is mapped to a container, that host port is no longer available to anything else on the host computer, including other containers. Fortunately, the docker run command has flags to control which port on the host maps to which port in the Docker container. So we could have one MarkLogic container whose Admin Interface is available on the host's port 8001 and another MarkLogic container whose Admin Interface is available on the host's port 18001. Within each container, MarkLogic still listens on port 8001. Docker networking takes care of whichever host-to-container port mapping we choose.
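
One simple way to keep host ports from colliding is to shift each container's range by a fixed offset, which is the pattern used later in this article (8001, 18001, 28001). Below is a minimal sketch of a hypothetical helper, ml_port_flag, that prints the -p flag for a given node number, assuming node 1 uses the container's own port numbers and each later node adds 10000.

```shell
# Hypothetical helper: print the docker run -p flag that maps a block of
# host ports to MarkLogic's ports 8000-8002 inside the container.
# Node 1 -> 8000-8002, node 2 -> 18000-18002, node 3 -> 28000-28002, ...
ml_port_flag() {
  node="$1"
  offset=$(( (node - 1) * 10000 ))
  first=$(( 8000 + offset ))
  last=$(( 8002 + offset ))
  # printf avoids echo treating the leading "-p" as an option
  printf '%s\n' "-p ${first}-${last}:8000-8002"
}
```

Leave the substitution unquoted so it splits into two arguments, for example docker run -d --name=marklogic2 $(ml_port_flag 2) marklogic:8.05-preinitialized.
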

The image we've created has MarkLogic but has not been through the post-installation steps. So a good place to start is to create a MarkLogic container that exposes port 8001 so we can finish the MarkLogic installation process.

Make sure you don't have MarkLogic or any other service on your host computer using port 8001. Then type in the following on the command prompt:

docker run -d --name=initial-install -p 8001:8001 marklogic:8.05-preinitialized

Docker will respond with a container identifier.

The command tells Docker to run this MarkLogic container in the background by using the -d flag. The name of the container will be initial-install. The container can be stopped, restarted and removed by referring to either its name or its container ID. The -p flag maps a host's port to a container's port. We are mapping port 8001 on the host computer to port 8001 in the MarkLogic container, the port for the Admin Interface in MarkLogic. Finally, we are specifying the container to be created from the image named marklogic:8.05-preinitialized.

After the container is created, simply point your browser to http://localhost:8001 then finish the MarkLogic post-installation steps.

For this container, skip joining a MarkLogic cluster; additional docker run flags would be needed for MarkLogic containers to communicate with each other. For more information on MarkLogic installation procedures, see the MarkLogic documentation.

After completing the post-installation steps, create a new image for our installed version of MarkLogic. Type in the following:

docker commit initial-install marklogic:8.05-installed

This creates a new image of MarkLogic with the post-installation steps completed. With our post-install MarkLogic image created, you might want to still keep the initial-install container running or you might choose to stop and remove it. Later, we'll use the marklogic:8.05-preinitialized image to create a MarkLogic cluster.

List all containers, currently running or stopped, by using the following command:

docker ps -a

We can stop containers then remove them by using the following commands:

  • docker stop <name of container>
  • docker rm <name of container>

Finally, create a new MarkLogic container ready to use and enjoy:

docker run -d --name=mymarklogic -p 8000-8002:8000-8002 marklogic:8.05-installed

Replace the name mymarklogic with your desired container name. The -p flag maps the host's ports 8000 through 8002 to the container's ports 8000 through 8002.

Did you know? You can access the shell prompt in your Docker container! Simply run the following docker command:

docker exec -it <name of your container> sh

The -i flag keeps standard input open (interactive mode) and the -t flag allocates a pseudo-TTY.

If we add new MarkLogic application servers, we need to create a new image that contains our changes. Then we need a new container that also maps those ports. A simple plan is as follows.

  • Create desired MarkLogic App Servers and note the port number(s).
  • Use docker commit to commit the container to a new image name.
  • Use docker stop followed by the container name to stop the container.
  • Then remove the container with docker rm followed by the container name.
  • Finally, create a new container based on the newly committed image by using the docker run command. The -p flag can be used multiple times with the docker run command to expose your host's ports to the new App Server ports in the container.
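
The plan above can be scripted. This is a sketch under a few assumptions: the function name ml_recreate_with_ports is hypothetical, the container and new image names are passed as arguments, and the extra ports are App Server ports you created (8010 and 8011 below are only examples). Setting DOCKER=echo prints the commands instead of running them, which is a handy way to check what the script will do.

```shell
# Hypothetical helper: commit a MarkLogic container to a new image, then
# replace the container with one that also publishes extra ports.
# Usage: ml_recreate_with_ports <container> <new-image> <port>...
# Set DOCKER=echo to print the commands instead of executing them.
ml_recreate_with_ports() {
  container="$1"; new_image="$2"; shift 2
  docker_cmd="${DOCKER:-docker}"

  # Base ports 8000-8002 plus a one-to-one mapping for each extra port
  port_flags="-p 8000-8002:8000-8002"
  for port in "$@"; do
    port_flags="$port_flags -p $port:$port"
  done

  "$docker_cmd" commit "$container" "$new_image" || return 1
  "$docker_cmd" stop "$container" || return 1
  "$docker_cmd" rm "$container" || return 1
  # $port_flags is unquoted on purpose so it splits into separate arguments
  "$docker_cmd" run -d --name="$container" $port_flags "$new_image"
}
```

For example, DOCKER=echo ml_recreate_with_ports mymarklogic marklogic:8.05-appservers 8010 8011 prints the four docker commands; drop DOCKER=echo to run them.
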

New to MarkLogic? Learn more about our NoSQL enterprise database at http://developer.marklogic.com or sign up for free training at http://mlu.marklogic.com. Sign up for our instructor-led or self-paced courses! View short videos at http://mlu.marklogic.com/ondemand or with the MarkLogic University Mobile app, available on Apple's App Store and on Google Play.

So What About a Cluster of MarkLogic Containers?

Good question and it can be done! Creating and using a MarkLogic Cluster with Docker for development and testing taxes your poor systems much less than creating 3 or more virtual systems to do the same task.

But first, a bit about Docker Networking

Earlier in this blog, we mentioned that Docker creates a subnet and manages network traffic between Docker containers and also between Docker containers and the host computer. This means that a subnet of IP addresses and host names are created and assigned to containers by Docker.

Linking Containers

How does Docker know what containers should communicate with each other? You tell Docker! When using the docker run command, you can also pass in a --link flag.

Consider the following examples:

docker run -d --name=marklogic1 --hostname=marklogic1.local -p 8000-8002:8000-8002 marklogic:8.05-preinitialized

docker run -d --name=marklogic2 --hostname=marklogic2.local --link marklogic1:marklogic1 -p 18000-18002:8000-8002 marklogic:8.05-preinitialized

The commands above create two MarkLogic containers. The second one uses the --link flag: Docker sets environment variables and adds an /etc/hosts entry inside the linking container (marklogic2) so that it can reach the linked container (marklogic1) by name over the internal Docker network. The --hostname flag keeps things consistent with MarkLogic, which uses the full domain name when contacting other MarkLogic servers in the cluster, so we simply add the .local domain to the container's name. Finally, note that the -p flag on the second container maps MarkLogic's ports 8000 to 8002 to the host computer's ports 18000 to 18002. Why not use the host computer's ports 8000 to 8002? Because the first container is already using them. Remember, Docker shares networking with the host computer, and a host port can only be mapped once! You can, of course, choose any range of open ports on your host computer to map to the container's MarkLogic ports.
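
To confirm what the link actually set up, you can look inside the linking container. Below is a minimal sketch of a hypothetical helper, ml_check_link, assuming the two containers created above are running; getent is part of glibc, so it should be available in the CentOS 7 base image. Setting DOCKER=echo prints the commands instead of running them.

```shell
# Hypothetical helper: show the /etc/hosts file Docker wrote into a linking
# container, then check that a linked hostname resolves from inside it.
# Usage: ml_check_link <container> <hostname>
# Set DOCKER=echo to print the commands instead of running them.
ml_check_link() {
  container="$1"; target_host="$2"
  docker_cmd="${DOCKER:-docker}"
  "$docker_cmd" exec "$container" cat /etc/hosts || return 1
  # getent goes through the container's resolver, so /etc/hosts entries apply
  "$docker_cmd" exec "$container" getent hosts "$target_host"
}
```

For example, ml_check_link marklogic2 marklogic1.local should print an address for marklogic1.local if the link is working.
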

Now, simply point your browser to port 8001 for the first container (marklogic1) and go through the post-installation steps. Skip joining a cluster. When finished, point your browser to port 18001 for the second container (marklogic2) and go through the post-installation steps. When asked to join a cluster, simply use the host name localhost and leave the port number at 8001. MarkLogic in the second container will contact MarkLogic in the first container, and the configuration will be updated so that marklogic2 joins the cluster with marklogic1. Create and add a third MarkLogic container, linking it to both marklogic1:marklogic1 and marklogic2:marklogic2, and you'll soon have a proper 3-node MarkLogic cluster!
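
Creating the linked containers by hand is repetitive, so here is a sketch that loops over the nodes. It follows the naming and port scheme used above (marklogicN, marklogicN.local, host ports shifted by 10000 per node) and links each new node to every node created before it. The function name ml_start_linked_cluster is hypothetical, and as before DOCKER=echo prints the commands instead of running them.

```shell
# Hypothetical helper: start N linked MarkLogic containers from the
# pre-initialized image, following the naming and port scheme above.
# Set DOCKER=echo to print the docker run commands instead of running them.
ml_start_linked_cluster() {
  nodes="${1:-3}"
  image="${2:-marklogic:8.05-preinitialized}"
  docker_cmd="${DOCKER:-docker}"

  i=1
  while [ "$i" -le "$nodes" ]; do
    offset=$(( (i - 1) * 10000 ))
    # Link to every container created before this one
    links=""
    j=1
    while [ "$j" -lt "$i" ]; do
      links="$links --link marklogic$j:marklogic$j"
      j=$(( j + 1 ))
    done
    # $links is unquoted on purpose so each --link becomes its own argument
    "$docker_cmd" run -d --name="marklogic$i" --hostname="marklogic$i.local" \
      $links -p "$(( 8000 + offset ))-$(( 8002 + offset )):8000-8002" "$image" || return 1
    i=$(( i + 1 ))
  done
}
```

The generated commands for the first two nodes match the two docker run commands shown earlier; the post-installation and join-cluster steps still happen in the browser.
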

But wait, there's more

Docker has created another tool to help manage groups of Docker containers. docker-compose has commands to create multiple containers and network them together; you can then start and stop them using docker-compose commands. Docker uses a file called Dockerfile to build images. docker-compose uses a file called docker-compose.yml to define networks of containers.

docker-compose is available as a separate download.

Let's examine an example docker-compose.yml file used to create a MarkLogic 3-node cluster.

These docker-compose settings are similar in function to the Dockerfile and docker run flags we used to create a container. At the time of writing, the current version of the docker-compose file format is version 2. The .yml file begins with the version. Next, services defines the 3 MarkLogic container nodes for this cluster.

  • ml8node1
    • The build setting defines where the Dockerfile for this container is located. This line in the .yml file states that the Dockerfile is in the current directory.
    • The resulting image will be tagged with the name ml8:build.
    • MarkLogic ports 7997 through 7999 are exposed so the containers can communicate with each other.
    • The host's ports 8000 through 8002 are mapped to MarkLogic ports 8000 through 8002 in the container.
    • The hostname defines the networking hostname in the container so I'm using the full domain name including the .local to be consistent with standards and also MarkLogic.
  • ml8node2
    • For the second MarkLogic Docker container, we use the same Dockerfile as the first MarkLogic node container. We also use the remaining settings from the ml8node1 container definition, with the exception of the following:
      • ports - On the host side, ports 8000 through 8002 are used by ml8node1, so we simply choose other ports to contact MarkLogic's Admin Interface, Query Console, etc. For ml8node2, we use ports 18000 through 18002 to map to the MarkLogic ports 8000 through 8002 in the container.
      • links - Same as the --link flag in Docker. We link the ml8node2 container to the ml8node1 container.
  • ml8node3
    • Same options as ml8node2 above, with the following differences:
      • ports - Again, different ports on the host are mapped to the MarkLogic ports in the container. We use ports 28000 through 28002 to map to the MarkLogic ports 8000 through 8002 in the container.
      • links - We link to both ml8node1 and ml8node2.
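
The settings above can be sketched as the following docker-compose.yml, again written with a shell heredoc. This is a minimal sketch, not the authors' file: every service lists both build and image so all three use the same Dockerfile and the ml8:build tag, and the link aliases (ml1.local, ml2.local) are an assumption chosen so the hostnames MarkLogic advertises resolve between containers.

```shell
# Write a sketch docker-compose.yml (file format version 2) for a
# 3-node MarkLogic cluster, using the Dockerfile in the current directory.
# The quoted 'EOF' keeps the shell from expanding anything in the YAML.
cat > docker-compose.yml <<'EOF'
version: '2'
services:
  ml8node1:
    build: .
    image: ml8:build
    expose:
      - "7997"
      - "7998"
      - "7999"
    ports:
      - "8000-8002:8000-8002"
    hostname: ml1.local
  ml8node2:
    build: .
    image: ml8:build
    expose:
      - "7997"
      - "7998"
      - "7999"
    ports:
      - "18000-18002:8000-8002"
    hostname: ml2.local
    links:
      - "ml8node1:ml1.local"
  ml8node3:
    build: .
    image: ml8:build
    expose:
      - "7997"
      - "7998"
      - "7999"
    ports:
      - "28000-28002:8000-8002"
    hostname: ml3.local
    links:
      - "ml8node1:ml1.local"
      - "ml8node2:ml2.local"
EOF
```
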

The last step is, of course, to create the MarkLogic cluster, which we can easily do with docker-compose commands. The most useful commands are listed below:

  • docker-compose up -d: Create the MarkLogic cluster defined in the docker-compose.yml file. The -d flag runs the MarkLogic servers in the background.
  • docker-compose stop: Stop all the MarkLogic servers in the cluster but don't delete the Docker containers.
  • docker-compose start: Start all the MarkLogic servers in the cluster again.
  • docker-compose down --rmi="all": Stop and delete all the MarkLogic server Docker containers and also remove any created images.

After the containers have been created, started and linked together, simply point your browser to the Admin Interface port of http://localhost:8001 for hostname ml1.local, http://localhost:18001 for hostname ml2.local and http://localhost:28001 for hostname ml3.local, respectively.

Proceed through each node's MarkLogic post-installation steps. The MarkLogic Installation Guide has more information on installation and adding a host to a cluster.

  • On ml1.local, skip joining a cluster.
  • On ml2.local and ml3.local, set the Host Name on the Join a Cluster page to localhost and leave the Admin Port at 8001.
  • Accept the default settings on the next page.
  • The next page confirms you are about to join a cluster.
  • Configuration with the bootstrap host (first host in the cluster) is synchronized and the node becomes part of the cluster.

Now that you have a cluster of MarkLogic servers, discover more administration topics in MarkLogic Administrator's Guide. Learn about creating high availability for your data in your databases by viewing the MarkLogic University On Demand video Setting Up Local Disk Replication. Get more knowledge on MarkLogic Security with the On Demand Security series.

Wrap up

Docker is a great technology for developers, testers, technical learners... anyone who might be interested in learning more about MarkLogic and its Enterprise features. It's also useful in day-to-day development and testing activities. MarkLogic clusters can quickly be created, stopped and restarted with lower impact on your computer's resources than traditional virtual machines.

Enjoy MarkLogic and Docker!

Comments

  • I modify the "docker run" command to make my home directory available inside the container: docker run -d --name=ml8 -p 8000-8099:8000-8099 -v /Users/dcassel:/home/dcassel marklogic:8.0-7
  • I did all these steps on my Ubuntu machine. I built the Docker images successfully and also created the containers without any errors or warnings. When I type http://localhost:8001 in the browser, it doesn't work. Let me know what the possible issues could be.
  • Anyway, I am able to create cluster using docker-compose. Thanks a lot.
  • https://uploads.disquscdn.com/images/3820785d9f7e45d8ea492b9590c742018b4917ea49804a062324084c063dda8a.png
  • Hi, I am trying to create a cluster of MarkLogic containers. I created the second container with the link flag and it is able to communicate with the other container. But when I try to join the cluster from port 18001 (second container), I can see marklogic2.local added to the cluster, but it is disconnected. Also, after creating the cluster, I am not able to access MarkLogic installed in the second container at port 18001. Attaching screenshot for details.
    • It appears that configuration information was sent from the first MarkLogic server to the second so that it was configured to be in the cluster. However, the two MarkLogic servers couldn't connect on the port MarkLogic uses for intra-cluster communication, port 7999. Make sure this port gets exposed, either in the Dockerfile or in the docker run command. A good test is to ping one MarkLogic server from the other by going to the shell in the containers. You can do this with the docker command docker exec -it <container name> sh. Once in the shell, simply ping the other MarkLogic server by its hostname, then repeat in the other container. This verifies that the hostnames are resolved to TCP/IP addresses correctly in each container's /etc/hosts file. If the hostnames are pingable between containers, then the issue is MarkLogic ports not being exposed. You can take a look at my Dockerfile in my GitHub repository, https://github.com/alan-johnson/docker-marklogic, in the Docker-ML8 folder.
      • Thanks Alan. I already exposed ports 7997-8003 in my base image, so I don't think that is the problem in my case. I am using the same base image for docker-compose and docker run. With docker-compose, containers are able to communicate via hostnames and the cluster is created successfully. But when I use docker run to create multiple containers and try to create a cluster, the containers are not able to ping each other via hostnames. Also, docker-compose creates a default network for the containers, adds all of them to it, and they can communicate via hostnames by default. That means even if you don't specify 'links:' in the docker-compose file, containers can ping each other via hostnames. See my compose file below; the ml9_nightly base image already exposes all required ports, so I am not exposing any ports in the compose file.

        Compose file:

          version: '2'
          services:
            ml9node1:
              image: ml9_nightly:latest
              ports:
                - "18000:8000"
                - "18001:8001"
                - "18002:8002"
              hostname: "ml1.local"
              container_name: "ml1.local"
            ml9node2:
              image: ml9_nightly:latest
              ports:
                - "28000:8000"
                - "28001:8001"
                - "28002:8002"
              hostname: "ml2.local"
              container_name: "ml2.local"
            ml9node3:
              image: ml9_nightly:latest
              ports:
                - "38000:8000"
                - "38001:8001"
                - "38002:8002"
              hostname: "ml3.local"
              container_name: "ml3.local"

        With docker run, a new container uses the default 'bridge' network provided by Docker. On this default bridge network, containers are not able to ping each other via hostnames. So I created a custom bridge network (test) and attached all containers to it (just as docker-compose creates a new bridge network), and now the containers can ping each other via hostnames and the cluster forms successfully. I also had to give each container exactly the same name and hostname, like 'marklogic1.local' for both. I followed these steps to make the cluster work with docker run:

          1. docker network create -d bridge test
          2. docker run -d --name=marklogic1.local --hostname=marklogic1.local --network=test -p 18000-18002:8000-8002 ml9_nightly:latest
          3. docker run -d --name=marklogic2.local --hostname=marklogic2.local --network=test -p 28000-28002:8000-8002 ml9_nightly:latest
          4. docker run -d --name=marklogic3.local --hostname=marklogic3.local --network=test -p 38000-38002:8000-8002 ml9_nightly:latest
        • There is a difference between Docker default networking and Docker Compose default networking. With plain Docker, the "link" option specifies the containers to link together for networking and adds their names to /etc/hosts in the containers. By default, Docker Compose uses a different kind of networking (a DNS-based lookup). See https://docs.docker.com/engine/userguide/networking/default_network/dockerlinks/ for how linking containers in Docker works, particularly the section about /etc/hosts.