The Kafka-MarkLogic-Connector is a community tool written in Java. It uses the standard Kafka APIs and libraries to subscribe to Kafka topics and consume messages, then uses the MarkLogic Data Movement SDK (DMSDK) to store those messages efficiently in a MarkLogic database. As messages stream onto the Kafka topic, the DMSDK threads aggregate them and write them to the database in batches, governed by a configured batch size and time-out threshold.
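The interplay of batch size and time-out can be illustrated with a minimal sketch in plain Java. This is not the DMSDK API or the connector's actual code: the class `BatchBuffer`, its methods, and the `sink` callback are hypothetical names chosen for illustration, and the sketch is single-threaded with the clock passed in explicitly, whereas the DMSDK uses multiple threads. It only shows the general idea that a batch is written either when it is full or when a partial batch has waited too long.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration of size-or-timeout batching; not the DMSDK API. */
class BatchBuffer {
    private final int batchSize;              // flush when this many messages are pending
    private final long timeoutMillis;         // flush when the oldest pending message is this old
    private final Consumer<List<String>> sink; // stand-in for "write this batch to the database"
    private final List<String> pending = new ArrayList<>();
    private long oldestMillis;                // arrival time of the first pending message

    BatchBuffer(int batchSize, long timeoutMillis, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.timeoutMillis = timeoutMillis;
        this.sink = sink;
    }

    /** Buffer one message and flush immediately if the batch is now full. */
    void add(String message, long nowMillis) {
        if (pending.isEmpty()) {
            oldestMillis = nowMillis;
        }
        pending.add(message);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Meant to be called periodically: flushes a partial batch that has waited too long. */
    void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - oldestMillis >= timeoutMillis) {
            flush();
        }
    }

    /** Hand a copy of the pending messages to the sink and start a new batch. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

With a batch size of 3 and a time-out of 1000 ms, three messages arriving in quick succession are written as one batch, while a single straggler is written on its own once a later `tick` finds it has waited for the full time-out. A larger batch size favors throughput; a shorter time-out bounds how long a message can sit in memory before it reaches the database.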
All three components of the system (Kafka, MarkLogic, and the Kafka-MarkLogic-Connector) are designed so that new servers can be added easily. Additional Kafka nodes provide redundancy that guards against data loss; combined with MarkLogic’s ACID transactions, this makes the system highly reliable. Additional server nodes can also increase available bandwidth quickly and dynamically. When resources are maxed out, each of the three components can be expanded independently to meet data flow requirements.