The MarkLogic connector for Apache Spark is an Apache Spark 3 connector that supports reading data from and writing data to MarkLogic. Within any Spark 3 environment, the connector lets users query data in MarkLogic, manipulate it with familiar Spark operations, and then write the results back to MarkLogic or send them to another system. Data can also be imported into MarkLogic by first reading it from any data source that Spark supports and then writing it to MarkLogic.


Major Features

Reading Data:

  • Schema inference based on an Optic DSL query using fromView() (see the read sketch after this list)
  • Batch reads and micro-batch streaming
  • Performance tuning via the number of partitions and the batch size
  • Reading rows from MarkLogic via custom code
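
As a rough sketch of a batch read in PySpark (assuming the connector jar is on the Spark classpath; the format name and the spark.marklogic.read.* option names follow the connector's documented conventions, but treat the exact values here as illustrative and verify them against the documentation):

    from pyspark.sql import SparkSession

    # Assumes the MarkLogic connector jar is already on the Spark classpath.
    spark = SparkSession.builder.appName("marklogic-read-sketch").getOrCreate()

    # Read rows from a TDE view via an Optic DSL query; the connector infers
    # the Spark schema from the view. Connection details are placeholders.
    df = (spark.read.format("marklogic")
          .option("spark.marklogic.client.uri", "user:password@localhost:8000")
          .option("spark.marklogic.read.opticQuery",
                  "op.fromView('Example', 'Employees')")
          # Partition count and batch size are the main performance knobs.
          .option("spark.marklogic.read.numPartitions", "4")
          .option("spark.marklogic.read.batchSize", "10000")
          .load())
    df.show()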

Writing Data:

  • Writing rows as documents via the Data Movement SDK (DMSDK) (see the write sketch after this list)
  • Configurable document URIs, collections, and permissions
  • Streaming support
  • Performance tuning via the thread count and the batch size
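
A minimal write sketch in the same vein (option names follow the connector's spark.marklogic.write.* convention; the exact names and the permissions string format are assumptions to confirm against the documentation):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("marklogic-write-sketch").getOrCreate()

    # A tiny DataFrame to write; each row becomes a document in MarkLogic.
    df = spark.createDataFrame([(1, "Jane"), (2, "John")], ["id", "name"])

    (df.write.format("marklogic")
       .option("spark.marklogic.client.uri", "user:password@localhost:8000")
       # URI prefix, collections, and permissions for the written documents.
       .option("spark.marklogic.write.uriPrefix", "/employee/")
       .option("spark.marklogic.write.collections", "employee")
       .option("spark.marklogic.write.permissions",
               "rest-reader,read,rest-writer,update")
       # Thread count and batch size are the main performance knobs.
       .option("spark.marklogic.write.threadCount", "16")
       .option("spark.marklogic.write.batchSize", "100")
       .mode("append")
       .save())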

Reprocessing Data:

  • Processing rows via custom code in MarkLogic (see the sketch after this list)
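
A hedged sketch of reprocessing: custom JavaScript selects the items to process during the read, and custom JavaScript applied during the write processes each one inside MarkLogic. The option names and the URI external variable are assumptions based on the connector's custom-code pattern; confirm them in the documentation.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("marklogic-reprocess-sketch").getOrCreate()

    # Read: custom code returns one row per document URI to reprocess.
    df = (spark.read.format("marklogic")
          .option("spark.marklogic.client.uri", "user:password@localhost:8000")
          .option("spark.marklogic.read.javascript",
                  "cts.uris(null, null, cts.collectionQuery('employee'))")
          .load())

    # Write: custom code runs in MarkLogic once per row; 'URI' below is an
    # assumed external variable holding the row's value.
    (df.write.format("marklogic")
       .option("spark.marklogic.client.uri", "user:password@localhost:8000")
       .option("spark.marklogic.write.javascript",
               "declareUpdate(); xdmp.documentAddCollections(URI, 'reprocessed');")
       .mode("append")
       .save())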

Requirements

  • Apache Spark 3.3.0 or higher. The connector has been tested with the latest versions of Spark 3.3.x and 3.4.x.
  • For writing data, MarkLogic 9.0-9 or higher.
  • For reading data, MarkLogic 10.0-9 or higher.

Get Started

To learn more about the project and get started, visit the MarkLogic Spark documentation.

Related Resources

Get Started

In this tutorial, you will learn how to ingest data into a MarkLogic Data Hub Service instance running on AWS using the MarkLogic Connector for Apache Spark.

Why a MarkLogic Connector for Apache Spark?

In this blog post, Ankur Jain discusses what Apache Spark is and why you should use it with MarkLogic.

Documentation

Learn how to configure the MarkLogic Connector for Apache Spark in the documentation, which also covers the AWS Glue connector.
