The Kafka-MarkLogic-Connector, written in Java, is a supported tool that uses the standard Kafka APIs and libraries to subscribe to Kafka topics and consume messages. The connector then uses the MarkLogic Data Movement SDK (DMSDK) to efficiently store those messages in a MarkLogic database. As messages stream onto a Kafka topic, the DMSDK's worker threads aggregate them and push them into the database whenever a configured batch size or timeout threshold is reached.
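As a sketch of how that batching behavior is configured, a connector properties file might look like the following. The property names and values here are illustrative assumptions based on typical Kafka Connect sink configuration; check the repository's documentation for the exact names and defaults in your connector version.

```properties
# Standard Kafka Connect sink settings (class name assumed; verify against the repo)
name=marklogic-sink
connector.class=com.marklogic.kafka.connect.sink.MarkLogicSinkConnector
topics=marklogic-topic

# MarkLogic connection details (illustrative values)
ml.connection.host=localhost
ml.connection.port=8000

# DMSDK batching: flush a batch of up to 100 messages,
# writing with 8 threads in parallel
ml.dmsdk.batchSize=100
ml.dmsdk.threadCount=8
```

Tuning `batchSize` and `threadCount` trades latency against throughput: larger batches amortize transaction overhead, while the timeout ensures a partially filled batch is still written promptly.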

All three components of the system (Kafka, MarkLogic, and the Kafka-MarkLogic-Connector) are designed to easily permit new servers to be added. New Kafka nodes can provide redundancy to prevent data loss; combined with MarkLogic’s ACID transactions, this makes the system highly reliable. New server nodes can also quickly and dynamically increase available bandwidth, so as resources are maxed out, each of the three components can be scaled independently to meet data flow requirements.

VISIT THE REPOSITORY

Learn More

Streaming Data into MarkLogic with the Kafka-MarkLogic Connector

Read about how Philip Barber’s tool can help you stream data from Kafka into MarkLogic easily and reliably.

Quickstart with the Kafka-MarkLogic-Connector in AWS

Walk through this tutorial to build a basic working version of this system in AWS using the kafka-marklogic-connector.

ZooKeeper and Kafka

Need more information about Apache Kafka and ZooKeeper? The Apache Kafka website has instructions for starting up ZooKeeper and Kafka. The tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data.
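For orientation, the fresh-start sequence from the Apache Kafka quickstart looks roughly like this (the version number in the archive name is illustrative; substitute the release you downloaded):

```shell
# Extract the Kafka distribution (version shown is an example)
tar -xzf kafka_2.13-3.7.0.tgz
cd kafka_2.13-3.7.0

# Start ZooKeeper first; Kafka uses it for cluster coordination
bin/zookeeper-server-start.sh config/zookeeper.properties

# In a second terminal, start the Kafka broker
bin/kafka-server-start.sh config/server.properties

# Create a topic for the connector to consume from (topic name is an example)
bin/kafka-topics.sh --create --topic marklogic-topic \
  --bootstrap-server localhost:9092
```

With the broker running and the topic created, the connector can subscribe to it and begin writing messages into MarkLogic.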
