Hadoop is an open-source framework for distributed processing of large data sets across clusters of computers using simple programming models. When used with MarkLogic, Hadoop provides cost-effective batch computation and distributed storage.


Major Features

  • Stage raw data in HDFS and prepare, reformat, extract, join, or filter it for use in interactive applications in MarkLogic
  • Enrich or transform data in situ in MarkLogic using Java and MapReduce, taking advantage of MarkLogic’s fast indexes and security model
  • Age data out of a MarkLogic database into archival storage on HDFS or transfer it in parallel to another system
  • Leverage existing MapReduce and Java libraries to process MarkLogic data
  • Operate on data as documents, nodes, or values
  • Access MarkLogic text, geospatial, value, and document structure indexes to send only the most relevant data to Hadoop for processing
  • Send Hadoop reduce results to multiple MarkLogic forests in parallel
  • Rely on the connector to optimize data access (for both locality and streaming IO) across MarkLogic forests
  • Work with secure HDFS

Getting Started


The Connector for Hadoop is supported against the Hortonworks Data Platform (HDP) version 2.6, the Cloudera Distribution of Hadoop (CDH) version 5.8, and MapR 5.1. The source is licensed under the commercial-friendly Apache 2.0 license and is freely available for inspection or modification.



HDFS Client Bundles

You can download pre-packaged Hadoop HDFS client bundles and install them on your MarkLogic hosts. A bundle is available for each supported Hadoop distribution. You must use one of these bundles if you use HDFS for forest storage.

HDFS Download Options

Downloads for MarkLogic 10.0-4:

Downloads for MarkLogic 9.0-12:

Downloads for MarkLogic 8.0-9:

Downloads for MarkLogic 9:

Downloads for MarkLogic 8:


MarkLogic Connector for Hadoop Developers Guide

Get started with the MarkLogic Connector for Hadoop by learning how to deploy the Connector with a MarkLogic Server cluster and how to make a secure connection to MarkLogic Server with SSL.

Getting Started with the Connector for Hadoop

Review the procedures for installing and configuring Apache Hadoop MapReduce and the MarkLogic Connector for Hadoop.

Hadoop Functions

This module provides helper functions for creating advanced-mode input queries and split queries for the MarkLogic Connector for Hadoop.
