MarkLogic now offers a supported Kafka connector. The instructions below are outdated; please visit the connector documentation for complete, current information on using the connector.
The Kafka-MarkLogic-Connector can help you stream data from Kafka into a MarkLogic database. You can learn about the advantages and use cases of the tool in Phil Barber’s blog, Streaming Data into MarkLogic with the Kafka-MarkLogic Connector.
What we want to do now is build a working example using the tool. This tutorial will walk you through a basic version of this system setup: an AWS instance for MarkLogic, an AWS instance for Kafka, and the configuration of both MarkLogic and Kafka.
This can serve as a starting point for creating an operational system that is scalable and has built-in redundancy. Note that you need an AWS account, an existing VPC, and Gradle installed to proceed with the tutorial.
In order to access the AWS instances that we will be creating in this tutorial, we first need to create a key pair to log in to the instances and encrypt the communication between your local environment and AWS.
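If you prefer the command line to the EC2 console, the key pair can also be created with the AWS CLI. This is a sketch, assuming you name the key kafka to match the ssh commands used later in this tutorial:
aws ec2 create-key-pair --key-name kafka --query 'KeyMaterial' --output text > kafka.pem
chmod 400 kafka.pem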
To access this instance, you need the public DNS. This is available on the EC2 Dashboard from Instances in the left nav bar. Clicking on the instance name will display the instance description, which includes the public DNS. The public DNS is ONLY available when the instance is running and may change when the instance is restarted.
While on the EC2 Dashboard, it is also useful to give the instance a name such as “MarkLogic.”
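If you do tag the instance, the public DNS can also be retrieved with the AWS CLI rather than the console; this sketch assumes the Name tag is “MarkLogic” as suggested above:
aws ec2 describe-instances --filters "Name=tag:Name,Values=MarkLogic" "Name=instance-state-name,Values=running" --query 'Reservations[].Instances[].PublicDnsName' --output text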
ssh -i kafka.pem ec2-user@<ML Server Public DNS>
sudo vi /etc/marklogic.conf
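What goes in this file depends on your environment; as a minimal sketch for a standalone (non-clustered) instance launched from a MarkLogic AMI, the single entry below disables managed-cluster EC2 detection. Treat this as an assumption and adjust for your setup:
MARKLOGIC_EC2_DETECTION=0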
sudo /sbin/service MarkLogic start
The latest Bitnami Kafka AMI has some significant changes; because of them, we need to turn off SASL for this quickstart. After making the change, the Kafka service needs to be restarted.
ssh -i kafka.pem bitnami@<Kafka Server Public DNS>
sudo vi /opt/bitnami/kafka/config/server.properties
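The exact lines vary between Bitnami image versions, but disabling SASL generally means switching the listener protocol from SASL_PLAINTEXT to PLAINTEXT and removing or commenting out the SASL settings; a sketch of the change:
# before (SASL enabled)
listeners=SASL_PLAINTEXT://:9092
# after (SASL disabled)
listeners=PLAINTEXT://:9092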
sudo /opt/bitnami/ctlscript.sh restart kafka
/opt/bitnami/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic marklogic
Note that the hyphens before the options in the command above must be double-hyphens. To ensure accuracy, you may want to copy and paste it.
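To confirm the topic was created, you can list the broker’s topics using the same ZooKeeper-based syntax as the create command:
/opt/bitnami/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181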
Now that we have our environment set up, let’s get started with the Kafka-MarkLogic-Connector.
bootstrap.servers=localhost:9092
topics=marklogic
ml.connection.host=<ML Server Public DNS>
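These settings are split across the two files referenced later: bootstrap.servers belongs in marklogic-connect-standalone.properties, while topics and the ml.* settings belong in marklogic-sink.properties. Depending on your MarkLogic security setup, you may also need to review the connection credentials in marklogic-sink.properties; the property names below are taken from the project’s sample config as an assumption, so verify them against your copy:
ml.connection.port=8000
ml.connection.username=admin
ml.connection.password=admin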
./gradlew jar
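If the build succeeds, the connector jar lands in build/libs; a quick check (the filename matches the version copied in the next step):
ls build/libs/kafka-connect-marklogic-0.9.0.jar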
In this example, the Kafka-MarkLogic-Connector files will go on the Kafka AWS instance; the connector simply needs network access to both servers. From your local environment, within the project folder:
scp -i kafka.pem config/marklogic-* bitnami@<Kafka Server Public DNS>:/tmp
scp -i kafka.pem build/libs/kafka-connect-marklogic-0.9.0.jar bitnami@<Kafka Server Public DNS>:/tmp
ssh -i kafka.pem bitnami@<Kafka Server Public DNS>
sudo mv /tmp/marklogic-* /opt/bitnami/kafka/config
sudo chmod 644 /opt/bitnami/kafka/config/marklogic-*
sudo chown root:root /opt/bitnami/kafka/config/marklogic-*
sudo mv /tmp/kafka-connect-marklogic-0.9.0.jar /opt/bitnami/kafka/libs
sudo chmod 644 /opt/bitnami/kafka/libs/kafka-connect-marklogic-0.9.0.jar
sudo chown root:root /opt/bitnami/kafka/libs/kafka-connect-marklogic-0.9.0.jar
sudo /opt/bitnami/kafka/bin/connect-standalone.sh /opt/bitnami/kafka/config/marklogic-connect-standalone.properties /opt/bitnami/kafka/config/marklogic-sink.properties
The figure below shows the end of the output of the consumer after initializing, but before consuming any messages:
ssh -i kafka.pem bitnami@<Kafka Server Public DNS>
sudo vi /opt/bitnami/kafka/config/producer.properties
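This edit mirrors the broker-side change: with SASL disabled, any SASL client settings in producer.properties need to be removed or commented out. The exact entries vary by image version; as a sketch:
#security.protocol=SASL_PLAINTEXT
#sasl.mechanism=PLAIN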
/opt/bitnami/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --producer.config /opt/bitnami/kafka/config/producer.properties --topic marklogic
Note that the kafka-console-producer.sh command above is a single command, even if it wraps across lines. Also note that the dashes are double-dashes.
In the producer console (a “>” prompt will be displayed), enter a JSON message, for example: { "Foo": "bar" }. The console with the Kafka-MarkLogic-Connector running will display related log messages.
Below is the output of the message producer after starting and after the user has entered a message:
And here is the end of the output of the consumer after initializing and consuming a single message:
You can use QConsole (http://<ML Server Public DNS>:8000) to verify the message was ingested into the Documents database in MarkLogic. Assuming you did not change “ml.document.uriPrefix” in marklogic-sink.properties, the URI will be of the following form: /kafka-data/{UUID}.json
The following figure is what you’ll see on the MarkLogic QConsole after clicking “Explore,” showing a single document in the database:
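If you prefer a query to browsing, you can run a quick check in QConsole against the Documents database. This is a minimal sketch using a JavaScript query, assuming the default /kafka-data/ URI prefix and the URI lexicon enabled (the default); it lists the URIs of the ingested documents:
cts.uris(null, null, cts.directoryQuery("/kafka-data/", "infinity"))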
Now we’ve created a single message to be consumed by MarkLogic via the connector. If you want to test the connector with higher load, Phil Barber has also created a simple message producer project that you can use to generate messages at a higher volume. Be aware that the AWS instances we created here are small, so don’t go overboard!
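If you want something in between a single hand-typed message and a full producer project, a shell one-liner on the Kafka instance can push a modest batch of JSON through the console producer; this sketch sends 100 messages (the messageNumber field is just illustrative):
seq 1 100 | sed 's/.*/{"messageNumber": &}/' | /opt/bitnami/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --producer.config /opt/bitnami/kafka/config/producer.properties --topic marklogic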