Learning the MarkLogic Java API

Evan Lenz
Last updated 2012-09-14

MarkLogic is an enterprise-class NoSQL database built on search engine technology. You can use it to store, search, and query massive amounts of data, represented as documents in a variety of formats. MarkLogic exposes its core functionality through a Java API, so you can write applications in pure Java. Under the hood, the Java API communicates with MarkLogic Server through a powerful REST API. This tutorial walks you through a series of HOWTOs for working with MarkLogic exclusively through its Java API, with sample apps that illustrate each use case.

MarkLogic basics

The basic unit of organization in MarkLogic is the document. Documents can occur in one of four formats:

  • XML
  • JSON
  • text
  • binary

Each document is identified by a URI, such as "/example/foo.json", which is unique within the database.

As with files on a filesystem, documents can be grouped into directories. This is done implicitly via the URI. For example, the document with the URI "/docs/plays/hamlet.xml" resides in the "/docs/plays/" directory.
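Because the directory is implied by the URI, it can be derived with simple string handling. The following is a minimal plain-Java sketch. It assumes that a document's directory is everything up to and including the last "/" in its URI. `UriDirectory` and `directoryOf` are hypothetical names used only for illustration and are not part of the MarkLogic Java API.

```java
// Hypothetical helper, NOT part of the MarkLogic Java API: derives the
// directory a document URI implicitly belongs to, assuming the directory is
// everything up to and including the last "/".
public class UriDirectory {

    static String directoryOf(String uri) {
        int lastSlash = uri.lastIndexOf('/');
        if (lastSlash < 0) {
            // In this simplified model, a URI without any "/" has no directory.
            return "";
        }
        return uri.substring(0, lastSlash + 1);
    }

    public static void main(String[] args) {
        System.out.println(directoryOf("/docs/plays/hamlet.xml")); // prints /docs/plays/
        System.out.println(directoryOf("/example/foo.json"));      // prints /example/
    }
}
```

Nothing has to be declared for a directory to exist. It follows from the URIs you choose.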

Documents can also be grouped (independently of their URI) into collections. A collection is essentially a tag (string) associated with the document. A document can have any number of collection tags associated with it.
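To see how collections differ from directories, consider a toy in-memory model in plain Java. Each URI maps to a set of collection strings, and a lookup by collection can return documents from several directories at once. `CollectionTags`, its methods, and the collection names are hypothetical illustrations, not MarkLogic's API. In MarkLogic itself, collections are stored as document metadata on the server.

```java
import java.util.*;

// Hypothetical in-memory model, NOT the MarkLogic API: each document URI maps
// to any number of collection tags, independent of the URI's directory.
public class CollectionTags {

    private final Map<String, Set<String>> collectionsByUri = new HashMap<>();

    // Adds one or more collection tags to the document with this URI.
    void tag(String uri, String... collections) {
        collectionsByUri.computeIfAbsent(uri, k -> new TreeSet<>())
                        .addAll(Arrays.asList(collections));
    }

    // Returns the URIs of all documents tagged with the given collection, sorted.
    SortedSet<String> urisInCollection(String collection) {
        SortedSet<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : collectionsByUri.entrySet()) {
            if (e.getValue().contains(collection)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CollectionTags db = new CollectionTags();
        db.tag("/docs/plays/hamlet.xml", "shakespeare", "tragedy");
        db.tag("/docs/plays/macbeth.xml", "shakespeare", "tragedy");
        db.tag("/docs/poems/sonnet18.xml", "shakespeare", "poetry");
        // All three URIs, spanning two different directories:
        System.out.println(db.urisInCollection("shakespeare"));
        // Only the two plays:
        System.out.println(db.urisInCollection("tragedy"));
    }
}
```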

MarkLogic is agnostic about the document structures you use; for example, you don't need to provide a document schema of any sort. The one general guideline to keep in mind is that, compared with an RDBMS, documents are like rows. Because the document is the basic unit of retrieval, it's better, given the choice, to have a large number of small documents than a small number of large ones.
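The "documents are like rows" guideline can be illustrated with a small plain-Java sketch. Instead of keeping every order in one large document, it produces one small JSON document per order, each with its own URI. The `Order` record, its fields, and the "/orders/{id}.json" URI scheme are all hypothetical assumptions for illustration. Actually writing the documents to MarkLogic is left out.

```java
import java.util.*;

// Hypothetical sketch: split a list of orders into one small document per
// order (like one row per record in an RDBMS table) instead of one big document.
public class SplitIntoDocuments {

    // Minimal order record; the fields are illustrative assumptions.
    static final class Order {
        final long id;
        final long totalCents;

        Order(long id, long totalCents) {
            this.id = id;
            this.totalCents = totalCents;
        }
    }

    // Builds a URI -> JSON content map with one entry per order, in input order.
    // Two orders with the same id would map to the same URI, so the later one
    // replaces the earlier, mirroring the fact that a URI is unique.
    static Map<String, String> toDocuments(List<Order> orders) {
        Map<String, String> docs = new LinkedHashMap<>();
        for (Order o : orders) {
            String uri = "/orders/" + o.id + ".json";
            String json = "{\"id\": " + o.id + ", \"totalCents\": " + o.totalCents + "}";
            docs.put(uri, json);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(new Order(1001, 2599), new Order(1002, 4200));
        for (Map.Entry<String, String> e : toDocuments(orders).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        // prints:
        // /orders/1001.json -> {"id": 1001, "totalCents": 2599}
        // /orders/1002.json -> {"id": 1002, "totalCents": 4200}
    }
}
```

With this layout, retrieving or updating a single order touches only that order's small document rather than the whole set.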

The Java API provides CRUD capabilities (Create, Read, Update, Delete) on documents. It also lets you perform tasks related to search, query, and analytics. Search and query are about finding documents. Analytics is about retrieving values from across many documents and optionally performing aggregate calculations on those values. MarkLogic really shines when search and analytics are combined, for example to provide faceted navigation across your data.
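To make the distinction between CRUD and analytics concrete, here is a toy in-memory model in plain Java. It is not the MarkLogic Java API, which sends such operations to MarkLogic Server over REST. The class `MiniDocStore`, its methods, and the flat field maps standing in for XML or JSON documents are all hypothetical simplifications. The `facet` method counts documents per distinct field value, which is the kind of summary a faceted-navigation sidebar displays.

```java
import java.util.*;

// Hypothetical in-memory sketch, NOT the MarkLogic Java API: a document store
// keyed by URI that supports CRUD plus a simple facet count across documents.
// Documents are simplified to flat field -> value maps.
public class MiniDocStore {

    // URI -> document fields. A Map means each URI identifies at most one document.
    private final Map<String, Map<String, String>> docs = new TreeMap<>();

    // Create or update: writing to an existing URI replaces that document.
    void write(String uri, Map<String, String> fields) {
        docs.put(uri, new HashMap<>(fields));
    }

    // Read: returns null if no document has this URI.
    Map<String, String> read(String uri) {
        return docs.get(uri);
    }

    // Delete: removes the document (a no-op if the URI is absent).
    void delete(String uri) {
        docs.remove(uri);
    }

    // Analytics: counts how many documents carry each distinct value of a field.
    SortedMap<String, Integer> facet(String field) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs.values()) {
            String value = doc.get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        MiniDocStore store = new MiniDocStore();
        store.write("/docs/plays/hamlet.xml", Collections.singletonMap("genre", "tragedy"));
        store.write("/docs/plays/macbeth.xml", Collections.singletonMap("genre", "tragedy"));
        store.write("/docs/plays/twelfth-night.xml", Collections.singletonMap("genre", "comedy"));
        System.out.println(store.facet("genre")); // prints {comedy=1, tragedy=2}

        store.delete("/docs/plays/macbeth.xml");
        System.out.println(store.facet("genre")); // prints {comedy=1, tragedy=1}
        System.out.println(store.read("/docs/plays/hamlet.xml")); // prints {genre=tragedy}
    }
}
```

In MarkLogic, such aggregates are computed by the server across the whole database rather than by iterating over documents in the client, as this sketch does.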

We'll look at examples of each of these. But first, let's get everything set up. While you're certainly free to peruse this tutorial without running the examples, I highly recommend taking the time to install MarkLogic, download the tutorial project, and interact directly with the sample programs. Instructions for doing all of that are in the next section.

Setup
