The MarkLogic Connector for Spark 2 makes it fast and easy to implement Spark jobs that ingest data into and export data from a MarkLogic Data Hub.

Apache Spark is an in-memory, distributed data processing engine for analytical applications, including machine learning, SQL, streaming, and graph processing. As a unified analytical tool, it is primarily used by data engineers and data scientists to build scalable data pipelines that span diverse data sources such as object stores, relational databases, HDFS, and NoSQL databases.



Major Benefits

  • Scalable Ingestion: Build data pipelines to load data as-is from any data source while tracking provenance and lineage metadata.
  • Secure Sharing: Use MarkLogic’s multi-model querying capabilities to securely share fit-for-purpose data with Spark libraries for complex analytical use cases, including machine learning and AI.

Getting Started

Step through the written tutorial for the Spark connector to get started with ingesting and exporting data.

Related Resources

Get Started

In this tutorial, you will learn how to ingest data into a MarkLogic Data Hub Service instance running on AWS using the MarkLogic Connector for Apache Spark.
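To give a feel for what an ingestion job looks like, here is a minimal sketch of writing a Spark DataFrame to MarkLogic through a connector. The format name (`com.marklogic.spark`) and option keys (`mlHost`, `mlPort`, `mlUsername`) are illustrative placeholders, not the connector's documented API; consult the MarkLogic Connector for Apache Spark documentation for the exact values required by your connector version.

```python
# Hypothetical sketch of a Spark-to-MarkLogic write. Option keys and the
# connector format name below are ASSUMED for illustration; check the
# connector documentation for the real ones.

def marklogic_write_options(host, port, user):
    """Collect connection options for the connector (keys are assumptions)."""
    return {
        "mlHost": host,        # assumed option key for the MarkLogic host
        "mlPort": str(port),   # assumed option key; Spark options are strings
        "mlUsername": user,    # assumed option key for the connecting user
    }

def write_to_marklogic(df, options):
    """Write a DataFrame to MarkLogic; needs a live SparkSession and cluster."""
    (df.write
       .format("com.marklogic.spark")  # assumed connector format name
       .options(**options)
       .mode("append")
       .save())

if __name__ == "__main__":
    # Building the options requires no Spark runtime, so it can be done
    # (and inspected) before submitting the job.
    opts = marklogic_write_options("localhost", 8010, "admin")
    print(opts)
```

In a real job you would obtain `df` from any Spark data source (CSV, JDBC, an object store, and so on) and submit the script with `spark-submit`, passing the connector JAR on the classpath.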

Why a MarkLogic Connector for Apache Spark?

Ankur Jain discusses what Apache Spark is and why you should use it with MarkLogic in this blog post.


Learn more about configuring the MarkLogic Connector for Apache Spark in the documentation, which also covers the AWS Glue connector.
