MarkLogic is an enterprise-class NoSQL database built on search engine technology. You can use it to store, search, and query massive amounts of data, represented as documents having various formats. MarkLogic exposes its core functionality through a Java API, allowing you to write applications in pure Java.

The Java API makes use of a powerful underlying REST API for communicating with MarkLogic Server. This tutorial will walk you through a set of HOWTOs for working with MarkLogic exclusively through its Java API, using sample apps that illustrate each use case.

MarkLogic Basics

The basic unit of organization in MarkLogic is the document. Documents can occur in one of four formats:

  1. XML
  2. JSON
  3. text
  4. binary

Each document is identified by a URI, such as “/example/foo.json”, which is unique within the database.

As with files on a filesystem, documents can be grouped into directories. This is done implicitly via the URI. For example, the document with the URI “/docs/plays/hamlet.xml” resides in the “/docs/plays/” directory.
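Because directory membership is implied entirely by the URI, you can work out a document’s directory with plain string handling. Here’s a minimal standalone sketch (the class and method names are hypothetical, not part of the MarkLogic API) that keeps everything up to and including the last slash:

```java
public class UriDirectory {

    // Returns the directory portion of a document URI: everything up to and
    // including the last "/". A URI with no slash has no directory, so "" is returned.
    static String directoryOf(String uri) {
        int lastSlash = uri.lastIndexOf('/');
        return lastSlash < 0 ? "" : uri.substring(0, lastSlash + 1);
    }

    public static void main(String[] args) {
        System.out.println(directoryOf("/docs/plays/hamlet.xml")); // prints /docs/plays/
        System.out.println(directoryOf("/example/foo.json"));      // prints /example/
    }
}
```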

Documents can also be grouped (independently of their URI) into collections. A collection is essentially a tag (string) associated with the document. A document can have any number of collection tags associated with it.

MarkLogic is agnostic with regard to what document structures you use. For example, it is not necessary to provide a document schema of any sort. The one general guideline to keep in mind is that, in comparison to an RDBMS, documents are like rows. In other words, since documents are the basic unit of retrieval, given the choice, it’s better to have a large number of small documents than it is to have a small number of large documents.

The Java API provides CRUD capabilities (Create, Read, Update, Delete) on documents. It also lets you perform tasks relating to search, query, and analytics. Search and query are about finding documents. Analytics is about retrieving values from across many documents and optionally performing aggregate calculations on those values. Where MarkLogic really shines is in the combination of search and analytics, which enables features such as faceted navigation across your data.

We’ll look at examples of each of these. But first, let’s get everything set up. While you’re certainly free to peruse this tutorial without running the examples, I highly recommend taking the time to install MarkLogic, download the tutorial project, and directly interact with the sample programs.

Setup

Install MarkLogic

Download and install the latest version of MarkLogic. Once you’ve installed and started up MarkLogic, go to the browser-based administrative interface (at http://localhost:8001/), where you’ll be walked through the process of getting a Developer License, as well as setting up an admin user. (This tutorial assumes you’ll be running MarkLogic on your local machine; if that’s not the case, just substitute your server name whenever you see “localhost” in this tutorial.)

If you need more detailed instructions on installing and running MarkLogic, see Installing MarkLogic Server.

Set up the tutorial project

Next, download the tutorial project: java-api-tutorial.zip. Unzip the file into a directory of your choice on your machine.

Although you’re free to use whatever IDE you prefer, the tutorial files have been packaged as a Maven project and can be opened in Eclipse using the m2eclipse plugin. If you’d rather not worry about CLASSPATHs and dependencies while working through this tutorial, I encourage you to follow the additional steps below. (If you prefer to wire everything up yourself, you can download the Java API distribution directly and skip the rest of this section.)

  1. Download and install the latest stable release of Eclipse. (I used the “Indigo” and “Juno” versions while writing this tutorial.)
  2. Start up Eclipse and select the “Help”->”Install New Software…” menu.
  3. In the “Work with:” field, paste the following URL: http://download.eclipse.org/technology/m2e/releases
  4. Click the “Add…” button.
  5. In the next dialog, give the new repository a name, e.g., “m2e”, and hit OK.
  6. Once it appears, check the checkbox next to “Maven Integration for Eclipse”.
  7. Click the “Next” button and “Next” again to confirm installation.
  8. Review and accept the license in order to begin the installation. Once the installation is complete, you’ll be prompted to restart Eclipse.
  9. After Eclipse has restarted, select File->Import…
  10. In the Import dialog, select “Existing Maven Projects” under the “Maven” folder, and click the Next button.
  11. On the next screen, click “Browse…” and browse to the location where you unzipped the tutorial project, selecting “java-api-tutorial” as the root directory.
  12. Ensure that the checkbox for the project is checked and click the “Finish” button.

Create a database

We’ll use the MarkLogic REST API to create the database. Save this JSON content in a file called tutorial.json:

{
  "rest-api": {
    "name": "TutorialServer",
    "database": "TutorialDB",
    "modules-database": "Tutorial-modules",
    "port": "8011"
  }
}

Now tell MarkLogic to apply this configuration, which will create an application server, content database, and modules database. (Adjust the username and password as needed.)

curl --anyauth --user user:password -X POST -d@'./tutorial.json' -i \
  -H "Content-type: application/json" \
  http://localhost:8002/v1/rest-apis

By creating a REST API instance in this way, MarkLogic has created and configured the underlying components for you (specifically, an HTTP app server, a content database, and a modules database). To prove that the REST API instance is running, navigate in your browser to http://localhost:8011/ and confirm that a page loads.

Create REST users

MarkLogic has a powerful and flexible security system. Before you can run the Java examples, you’ll first need to create a user with the appropriate execute privileges. You could, of course, use the “admin” user (which has no security restrictions), but as a best practice, we’re going to create two users:

  • one with the “rest-writer” role, and
  • one with the “rest-admin” role.

(There is also a “rest-reader” role available, which provides read-only access to the REST API, but we won’t be using that.)

Before we create the users, let’s go back into Eclipse, open the Config.properties file, and take a look at its contents:

# properties to configure the examples
example.writer_user=rest-writer
example.writer_password=x
example.admin_user=rest-admin
example.admin_password=x
example.host=localhost
example.port=8011
example.authentication_type=digest

This is the default configuration file that comes with the tutorial project. You can modify it as necessary (for example, if MarkLogic is running on a different machine), but the rest of this tutorial will assume the REST API instance is located at http://localhost:8011/. Now we just need to create the “rest-writer” and “rest-admin” users referenced in the above properties file.

MarkLogic’s Management API lets you manage roles and users. You can see your current list of users by pointing a browser to http://localhost:8002/manage/v2/users?format=html.


My instance of MarkLogic has four users defined: infostudio-admin, nobody, healthcheck, and admin. These users are all created when MarkLogic does its initial configuration.

To create the users we need, we’ll use the /manage/v2/users endpoint. (Naturally, you would typically use a better password; this is just for tutorial purposes.) Save this information as rest-writer.json:

{
  "user-name": "rest-writer",
  "password": "x",
  "description": "REST writer for the Java tutorial",
  "role": [ "rest-writer" ]
}

Send this configuration to MarkLogic with a curl command:

curl -X POST  --anyauth -u user:password --header "Content-Type:application/json" \
  -d @./rest-writer.json \
  http://localhost:8002/manage/v2/users

Repeat the same process for the “rest-admin” user (also with password “x”), this time specifying the “rest-admin” role instead. Save the following as rest-admin.json:

{
  "user-name": "rest-admin",
  "password": "x",
  "description": "REST admin for the Java tutorial",
  "role": [ "rest-admin" ]
}

curl -X POST  --anyauth -u user:password --header "Content-Type:application/json" \
  -d @./rest-admin.json \
  http://localhost:8002/manage/v2/users

If you refresh the http://localhost:8002/manage/v2/users?format=html browser page, you should now see both users. Now we’ve got everything set up on the server side, so let’s start interacting with MarkLogic via Java.

Java API Basics

The first step to interacting with a MarkLogic database is to create an instance of the DatabaseClient class. Each of the example programs in this tutorial starts with a step that looks something like this:

// create the client
DatabaseClient client = DatabaseClientFactory.newClient(host, port, user, password, authType);

Notice the arguments passed to the factory method. As seen in the online javadoc for the newClient() method, these correspond to:

  • host — the host with the REST server
  • port — the port for the REST server
  • user — the user with read, write, or administrative privileges
  • password — the password for the user
  • type — the type of authentication applied to the request

For all of the sample apps in the project you downloaded, these parameters are configured using the Config.properties file:

# properties to configure the examples
example.writer_user=rest-writer
example.writer_password=x
example.admin_user=rest-admin
example.admin_password=x
example.host=localhost
example.port=8011
example.authentication_type=digest

If you want to see how the tutorial project extracts these properties, see its Config.java file.
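If you’re wiring everything up yourself rather than using the tutorial project, a file like this can be read with java.util.Properties. The following is only a sketch of what such a loader might look like (the class name TutorialConfig is hypothetical, and the tutorial’s actual Config.java may well differ); it reads the example.* keys shown above and falls back to the tutorial’s localhost/8011/digest settings when a key is missing.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class TutorialConfig {

    final String host;
    final int port;
    final String writerUser;
    final String writerPassword;
    final String adminUser;
    final String adminPassword;
    final String authenticationType;

    // Reads Config.properties-style key/value pairs; missing connection settings
    // fall back to the tutorial's defaults (localhost, 8011, digest).
    TutorialConfig(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        host = props.getProperty("example.host", "localhost");
        port = Integer.parseInt(props.getProperty("example.port", "8011"));
        writerUser = props.getProperty("example.writer_user");
        writerPassword = props.getProperty("example.writer_password");
        adminUser = props.getProperty("example.admin_user");
        adminPassword = props.getProperty("example.admin_password");
        authenticationType = props.getProperty("example.authentication_type", "digest");
    }

    public static void main(String[] args) {
        String sample = "# properties to configure the examples\n"
            + "example.writer_user=rest-writer\n"
            + "example.writer_password=x\n"
            + "example.host=localhost\n"
            + "example.port=8011\n"
            + "example.authentication_type=digest\n";
        TutorialConfig config = new TutorialConfig(new StringReader(sample));
        System.out.println(config.writerUser + "@" + config.host + ":" + config.port); // prints rest-writer@localhost:8011
    }
}
```

In a real program you would pass a Reader over Config.properties itself rather than an inline string.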

Once you’re done interacting with MarkLogic, you should always release the DatabaseClient:

// release the client
client.release();

Basic Search

Now that we’ve populated our database, let’s start taking advantage of MarkLogic’s real power: search/query. The first step to creating and executing queries is to get your hands on a QueryManager instance, which our DatabaseClient instance is happy to provide:
// create a manager for searching
QueryManager queryMgr = client.newQueryManager();

All of the sample programs referred to in this section begin with the exact line of code above.

What’s the difference between search and query? For MarkLogic, there’s no difference except in how we use the terms. A query is a search specification, and a search is the execution of a query.

This usage is reflected in how you perform a search:

// run the search
queryMgr.search(query, resultsHandle);

The search() method executes the query that you give it, sending the results to the resultsHandle you provide.

So how do we create a query? First we have to decide which of the three kinds of queries we want:

  • string query: finds documents using a search string
  • query by example (QBE): finds ‘documents that look like this’ using criteria that resemble the structure of documents in your database
  • structured query: finds documents according to an explicit hierarchy of conditions

Each of these is modeled by the QueryDefinition interface and its sub-interfaces:

QueryDefinition:

  • StringQueryDefinition
  • RawQueryByExampleDefinition
  • StructuredQueryDefinition

To get one of these, you start by asking your query manager for a query instance. Let’s start with a query-by-example search.

Find XML documents using a query by example

Open up Example_15_SearchQBE.java. Here we’re going to look for documents with a person in them who is described as a brother. We can draw up a simple XML example of what this would look like, noting that inside the PERSONA element, we look for the word brother.

// Create a search definition       
StringHandle handle = new StringHandle(
  "<q:qbe xmlns:q=\"http://marklogic.com/appservices/querybyexample\">\n" + 
  "  <q:query>\n" + 
  "    <PLAY>\n" +   
  "      <PERSONAE>\n" + 
  "        <PERSONA><q:word>brother</q:word></PERSONA>\n" + 
  "      </PERSONAE>\n" + 
  "    </PLAY>\n" +  
  "  </q:query>\n" + 
  "</q:qbe>"    
);      
RawQueryByExampleDefinition query = queryMgr.newRawQueryByExampleDefinition(handle);

Query by example is a powerful syntax that is easy to learn, and it can express a wide variety of searches. This example is in XML, but there is also a JSON syntax, as you will see next.

Find JSON documents using a query by example

Open up Example_15_SearchQBE_JSON.java. Here we are looking for plenary talks.

// create a search definition   
StringHandle handle = new StringHandle(
  "{ \"$query\": { \"plenary\": true } }"
).withFormat(Format.JSON);

RawQueryByExampleDefinition query = queryMgr.newRawQueryByExampleDefinition(handle);  

// create a handle for the search results
StringHandle resultsHandle = new StringHandle().withFormat(Format.JSON);

// run the search
queryMgr.search(query, resultsHandle);

Find documents using a search string

Open up Example_16_SearchString.java. This time we’re using a StringQueryDefinition:

// create a search definition
StringQueryDefinition query = queryMgr.newStringDefinition();
query.setCriteria("index OR Cassel NEAR Hare");

After getting an initial string query instance from our query manager, we specify the search text using its setCriteria() method. In a real-world search application, you’d often insert user-supplied text here (what the user types in the search box). In this case, our string query is “index OR Cassel NEAR Hare”. This will find documents (regardless of format) that either contain the word “index” or have the word “Cassel” appearing near the word “Hare”. This illustrates that even a “simple search” can be quite powerful using MarkLogic’s default search configuration (known as search options). Later on, we’ll see a couple of examples of how to customize search options.

Run the program to see the first 10 search results, each of which includes snippets of text that matched the query.

Get another page of search results

Open Example_17_SearchWithPageSize.java. This program is identical to the previous one, except that this time we want to return a different subset of the results. All the previous examples returned the 10 most relevant results. Here we’re asking for results 11 through 15: we’re using a smaller page size (5 results per page) and asking for the third page of results. First we store our desired page size in a variable:

public static int PAGE_SIZE = 5;

Then we set the page size on our query manager:

// Set page size to 5 results
queryMgr.setPageLength(PAGE_SIZE);

To specify which page of results we want, i.e. where we want the search results to begin, we use the search() method’s third argument (start):

// get the 3rd page of search results
int pageNum = 3;
int start = PAGE_SIZE * (pageNum - 1) + 1;
 
// get search results starting with the nth result
queryMgr.search(query, resultsHandle, start);

Run the program to see the search results starting at the 11th result.
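Because result positions are 1-based, this arithmetic is easy to get off by one. Here’s a small standalone sketch (the helper names are hypothetical) of the conversion in both directions; startOf() computes the same value as the start variable above.

```java
public class Paging {

    // First result position (1-based) of a given 1-based page.
    static long startOf(int pageNum, int pageSize) {
        if (pageNum < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNum and pageSize must be >= 1");
        }
        return (long) pageSize * (pageNum - 1) + 1;
    }

    // Page (1-based) on which a given 1-based result position falls.
    static int pageOf(long position, int pageSize) {
        return (int) ((position - 1) / pageSize) + 1;
    }

    public static void main(String[] args) {
        System.out.println(startOf(3, 5)); // prints 11
        System.out.println(pageOf(11, 5)); // prints 3
        System.out.println(pageOf(15, 5)); // prints 3
    }
}
```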

Find documents based on their properties

Open Example_18_SearchProperties.java. Here we see our first example of a StructuredQueryDefinition. Most structured queries are only useful in conjunction with modified search options (see “Custom search” below). But using one is also necessary for a basic search against document properties:

// get a structured query builder
StructuredQueryBuilder qb = queryMgr.newStructuredQueryBuilder();
 
// Find all documents that have a property value containing the word "fish"
StructuredQueryDefinition query = qb.properties(qb.term("fish"));

Run the program to get a list of all the matching documents (photos of fish).

Search within a directory

Regardless of what kind of query it is, every query implements the following three methods specified by QueryDefinition (as well as their get* counterparts):

  1. setDirectory()
  2. setCollections()
  3. setOptionsName()

The first two, setDirectory() and setCollections(), allow you to restrict a query to a particular directory or set of collections. The last one, setOptionsName(), lets you associate a query with a named set of custom search options stored on the server. (See “Custom search” below.)

Example_19_SearchDirectory.java shows an example of the first method:

// Restrict the search to a specific directory
query.setDirectory("/images/2012/02/14/");
 
// empty search defaults to returning all results
query.setCriteria("");

When you run the program, it will search only those documents in the “/images/2012/02/14/” directory.
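Conceptually, a directory restriction is a test on the URI prefix. Here’s an in-memory analogy (the class, methods, and sample URIs are hypothetical; MarkLogic resolves directory restrictions from its indexes rather than by scanning URIs). MarkLogic’s cts:directory-query function distinguishes a depth of “1” (documents directly in the directory) from “infinity” (all descendants), so the sketch models both; it makes no claim about which depth the example above uses.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DirectoryFilter {

    // True if uri lies in dir (dir must end with "/"). With recursive == false only
    // documents directly in dir match (like depth "1"); otherwise any descendant matches.
    static boolean inDirectory(String uri, String dir, boolean recursive) {
        if (!uri.startsWith(dir)) {
            return false;
        }
        String rest = uri.substring(dir.length());
        return recursive || !rest.contains("/");
    }

    static List<String> filter(List<String> uris, String dir, boolean recursive) {
        return uris.stream()
            .filter(uri -> inDirectory(uri, dir, recursive))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> uris = List.of(
            "/images/2012/02/14/fish.jpg",
            "/images/2012/02/14/thumbs/fish.jpg",
            "/images/2012/02/15/bird.jpg");
        // prints [/images/2012/02/14/fish.jpg]
        System.out.println(filter(uris, "/images/2012/02/14/", false));
        // prints [/images/2012/02/14/fish.jpg, /images/2012/02/14/thumbs/fish.jpg]
        System.out.println(filter(uris, "/images/2012/02/14/", true));
    }
}
```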

Search within a collection

Similarly, the query in Example_20_SearchCollection.java restricts a search to a collection, thanks to the setCollections() method:

// Restrict the search to the "shakespeare" collection
query.setCollections("shakespeare");
 
// Search for the term "flower"
query.setCriteria("flower");

When you run the program, the query will return only the matches that it finds in the “shakespeare” collection.
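A collection restriction, by contrast, ignores the URI and looks only at the tags attached to each document. The in-memory sketch below (hypothetical names and sample documents) assumes that when several collections are given, a document in any one of them matches, which is how cts:collection-query treats multiple collection URIs.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CollectionFilter {

    // Returns the URIs of documents tagged with at least one of the wanted collections.
    static List<String> inAnyCollection(Map<String, Set<String>> collectionsByUri, String... wanted) {
        Set<String> wantedSet = Set.of(wanted);
        return collectionsByUri.entrySet().stream()
            .filter(e -> e.getValue().stream().anyMatch(wantedSet::contains))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical documents and their collection tags (TreeMap keeps output ordered by URI).
        Map<String, Set<String>> docs = new TreeMap<>();
        docs.put("/plays/hamlet.xml", Set.of("shakespeare"));
        docs.put("/plays/macbeth.xml", Set.of("shakespeare", "tragedy"));
        docs.put("/talks/keynote.json", Set.of("mlw12"));
        System.out.println(inAnyCollection(docs, "shakespeare"));      // prints [/plays/hamlet.xml, /plays/macbeth.xml]
        System.out.println(inAnyCollection(docs, "tragedy", "mlw12")); // prints [/plays/macbeth.xml, /talks/keynote.json]
    }
}
```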

Processing Search Results

In all of the search examples so far, we haven’t looked too closely at how the search results are extracted (and printed to the console). In each case, we’ve been using the tailor-made SearchHandle, which encapsulates search results as a POJO. Before we look more closely at that object structure, let’s take a peek at the raw data it encapsulates. We already saw that using DocumentMetadataHandle is optional; the same is true of SearchHandle.

Get search results as raw XML

Open Example_21_SearchResultsAsXML.java. This example performs the same search as the previous example, except that instead of using a SearchHandle, here we’re using a StringHandle to receive the raw XML search results (from the server) as a string:

// create a handle for the search results to be received as raw XML
StringHandle resultsHandle = new StringHandle();
 
// run the search
queryMgr.search(query, resultsHandle);
 
// dump the XML results to the console
System.out.println(resultsHandle);

Run the program and examine the console to see how MarkLogic represents its search results in XML. This should give you an idea of the complexity of information we’re dealing with here. Also, depending on your search options, the structure of these results can vary widely.

Get search results as raw JSON

Open Example_22_SearchResultsAsJSON.java. This example is identical to the previous one, except now we configure our StringHandle to receive JSON (instead of XML, the default):

// create a handle for the search results to be received as raw JSON
StringHandle resultsHandle = new StringHandle().withFormat(Format.JSON);

Run the program to see the raw JSON search results that were fetched from the server.

Get search results as a POJO

While you are certainly free to process search results as raw JSON or XML, the preferred way in Java is to use a SearchHandle instance, which models the results using a containment hierarchy that mirrors that of the raw data we saw:

SearchHandle:
  • MatchDocumentSummary[]
      • MatchLocation[]
          • MatchSnippet[]

Open TutorialUtil.java in the tutorial project. This class contains a few different approaches to printing search results that have been used by the previous search examples. Let’s focus on the last one, displayResults(). The first step to extracting search results from a SearchHandle is to call its getMatchResults() method:

// Get the list of matching documents in this page of results
MatchDocumentSummary[] results = resultsHandle.getMatchResults();

This yields an array of MatchDocumentSummary objects. To illustrate what this object represents, picture a typical search results page, such as those returned by a web search engine.

Each matching document in the list would be represented by a MatchDocumentSummary instance. This suggests that SearchHandle could then be used, for example, as the model (or to drive the model) in an MVC-based web application. Our utility code is only concerned with printing text to the console, but the basic task is the same: iterate through each level of this hierarchy and do something useful at each level.

Next, we drill down into each search result and call getMatchLocations():

// Iterate over the results
for (MatchDocumentSummary result : results) {

    // get the list of match locations for this result
    MatchLocation[] locations = result.getMatchLocations();

    // ... process each match location here
}

A MatchLocation object represents a range of text in the document that includes a search “hit”.

In addition to getMatchResults(), the SearchHandle class provides other useful methods for building a search application, such as getFacets(), getMetrics(), and getTotalResults().
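To see the full traversal in one place without a running server, here’s a self-contained sketch that mirrors the SearchHandle containment hierarchy with plain Java records (Java 16 or later). The records are hypothetical stand-ins for MatchDocumentSummary, MatchLocation, and MatchSnippet, and their accessor names are not the real API’s; the point is the three nested loops, which wrap highlighted snippet text in brackets much as a console printer might.

```java
import java.util.List;

public class ResultPrinter {

    // Hypothetical stand-ins for MatchDocumentSummary, MatchLocation and MatchSnippet.
    record Snippet(String text, boolean highlighted) {}
    record Location(List<Snippet> snippets) {}
    record Summary(String uri, List<Location> locations) {}

    // Walks every level of the hierarchy: one line per document URI, then one
    // indented line per match location, with highlighted hits in [brackets].
    static String render(List<Summary> results) {
        StringBuilder out = new StringBuilder();
        for (Summary result : results) {
            out.append(result.uri()).append('\n');
            for (Location location : result.locations()) {
                out.append("  ");
                for (Snippet snippet : location.snippets()) {
                    out.append(snippet.highlighted() ? "[" + snippet.text() + "]" : snippet.text());
                }
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Summary> results = List.of(
            new Summary("/example/sample.xml", List.of(
                new Location(List.of(
                    new Snippet("a ", false),
                    new Snippet("flower", true),
                    new Snippet(" in bloom", false))))));
        System.out.print(render(results));
    }
}
```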

Custom Search

All of the search examples thus far in this tutorial have used MarkLogic’s default query options (interchangeably called “search options”). This may suffice for some basic applications, but most of the time you will end up wanting to provide custom options. Custom options let you do things like:

  • define named constraints, which can be used in string queries, such as “tag” in “flower tag:shakespeare”
  • enable analytics and faceting by identifying lexicons and range indexes from which to retrieve values
  • extend or alter the default search grammar
  • customize the structure of the search results, including snippeting and default pagination
  • control search options such as case sensitivity and ordering

Options are grouped into named option sets on your REST API server. You can customize these either by updating the default option set, or by creating a new named option set.

Get a list of the server’s option sets

Before we start manipulating option sets, let’s query the list of current option sets. Open Example_23_ListOptionSets.java. We can read the list as a POJO by using a QueryOptionsListHandle:

// handle for list of named option sets
QueryOptionsListHandle listHandle = new QueryOptionsListHandle();

We then call our query manager’s optionsList() method to retrieve the list, storing it in our handle:

// get the list of named option sets
queryMgr.optionsList(listHandle);

And then iterate over the Map returned by the now-populated handle’s getValuesMap() method:

// iterate over the option sets; print the name and URI of each
for (Map.Entry<String,String> optionsSet : listHandle.getValuesMap().entrySet()) {
    System.out.println(optionsSet.getKey() + ": " + optionsSet.getValue());                        
}

This gives you, the developer, a list of the available option set names that you can associate with a query (for example, via setOptionsName()). If you don’t specify a name explicitly (as in our examples so far), the option set named default is used.

Since we haven’t added any custom options yet, when you run this program, you should see just the “default” option set and its URI, /v1/config/query/default. That URI reveals that you can also view the raw options in your browser, at http://localhost:8011/v1/config/query/default.

Now let’s create a new set of options.

Upload custom search options

Only users with the “rest-admin” role can update option sets. Until now, all the examples in this tutorial have used the “rest-writer” user to connect to MarkLogic. Whenever we need to update options, we’ll connect as our “rest-admin” user instead, as Example_24_LoadOptions.java does:

// create the client, connecting as the rest-admin user
DatabaseClient client = DatabaseClientFactory.newClient(
    Config.host,
    Config.port,
    Config.admin_user,
    Config.admin_password,
    Config.authType);

To manipulate query options, we need a manager object (QueryOptionsManager), just as we need managers for other kinds of server interactions (DocumentManager for CRUD, and QueryManager for search). However, getting this manager (and other admin-related managers) takes an extra, intermediate call to newServerConfigManager():

// get an options manager
QueryOptionsManager optionsMgr = client.newServerConfigManager().newQueryOptionsManager();

Next, we get a QueryOptionsBuilder, which we’ll use to construct individual query options:

// Create a builder for constructing query configurations.
QueryOptionsBuilder qob = new QueryOptionsBuilder();

As with other kinds of payloads we transmit, we need a handle to contain the query options (QueryOptionsHandle). We then use our options builder to initialize the options, passing them to the handle’s withConstraints() method, which is part of a fluent interface for populating the handle with query options as soon as it is created:

// create the query options, defining a collection constraint
QueryOptionsHandle optsHandle = new QueryOptionsHandle().withConstraints(
    qob.constraint("tag",
        qob.collection("")));

In this case, we’re building a constraint option. Constraint means something very specific in MarkLogic. Whenever a user types a phrase of the form name:text in their search string, they’re using a constraint (assuming one has been defined for them). For example, they might type “author:melville” to constrain their search to documents authored by Herman Melville. But for this to have the intended behavior, a constraint named “author” must first be defined in the server’s query options. In our case, we want to enable users to type things like “tag:shakespeare” and “tag:mlw12”.

So we must name our constraint “tag”, which is the first argument passed to the constraint() option constructor:

qob.constraint("tag",
    qob.collection(""))

The second argument is the constraint source. In this case, we want the constraint to be backed by our collection tags, so we call our builder’s collection() method. Its argument is an optional collection tag prefix, which would be handy if we wanted to power multiple constraints via collection tags such as “author/shakespeare” and “state/california” using the prefixes “author/” and “state/”, respectively. We’re not doing this, so we pass an empty prefix (“”).
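To make the name:text form concrete, here’s a deliberately naive tokenizer that separates free-text terms from name:value pairs, including quoted values such as person:"KING OF FRANCE". It is a hypothetical illustration only: MarkLogic parses search strings on the server using its own search grammar (which also handles operators such as OR and NEAR), and it only applies a constraint whose name is defined in the query options.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConstraintTokenizer {

    // Alternatives, tried in order: name:"quoted value" | name:value | bare term
    private static final Pattern TOKEN =
        Pattern.compile("(\\w+):\"([^\"]*)\"|(\\w+):(\\S+)|(\\S+)");

    final List<String> terms = new ArrayList<>();
    final Map<String, List<String>> constraints = new LinkedHashMap<>();

    ConstraintTokenizer(String searchText) {
        Matcher m = TOKEN.matcher(searchText);
        while (m.find()) {
            if (m.group(1) != null) {
                add(m.group(1), m.group(2));   // quoted constraint value
            } else if (m.group(3) != null) {
                add(m.group(3), m.group(4));   // unquoted constraint value
            } else {
                terms.add(m.group(5));         // plain search term
            }
        }
    }

    private void add(String name, String value) {
        constraints.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        ConstraintTokenizer t = new ConstraintTokenizer("flower tag:shakespeare");
        System.out.println(t.terms + " " + t.constraints); // prints [flower] {tag=[shakespeare]}
        ConstraintTokenizer q = new ConstraintTokenizer("person:\"KING OF FRANCE\"");
        System.out.println(q.terms + " " + q.constraints); // prints [] {person=[KING OF FRANCE]}
    }
}
```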

Now that the options are configured, all we need to do is write them to the server, using a name of our choosing (tutorial):

// write the query options to the database
optionsMgr.writeOptions("tutorial", optsHandle);

Run the program. Then go back and re-run the previous example (Example_23_ListOptionSets.java). You should now see that two option sets are available: default and tutorial.

Search using a collection constraint

Let’s make use of our new configuration and run a search using our “tag” constraint. Open Example_25_ConstraintOnCollection.java. To make the new option available, we need to associate our string query with the tutorial options on the server:

// use the server's "tutorial" options set
query.setOptionsName("tutorial");
query.setCriteria("flower tag:shakespeare");

This time (and from now on), we’ll slim down our code by creating the handle and populating it on the same line, taking advantage of the fact that search() returns the handle:

// run the search
SearchHandle resultsHandle = queryMgr.search(query, new SearchHandle());

Run the program. It should yield the same results as Example_20_SearchCollection.java. The only difference is that now, the “shakespeare” collection criterion is user-supplied as part of their search string in the form of the “tag” constraint.

Search using a JSON key value constraint

Normally, the code that you use to define and upload query options should reside in a different place than the code you use to run searches. For one thing, the two tasks require different levels of access. For another, your application code is designed to be run over and over again, whereas server configuration is more of a one-time thing. However, for purposes of this tutorial, we’re going to mix the two in the remaining examples. The rest of the examples will thus include two steps:

  1. Update the server configuration
  2. Run a query making use of the updated configuration

We’re going to keep using the “tutorial” options set, but rather than replacing it anew each time, we’re going to add to it. That means we’ll need to fetch it, modify it, and send the updated configuration back to the server. That’s exactly what Example_26_ConstraintOnJSONValue.java starts off by doing. First, we fetch the existing “tutorial” options by calling the readOptions() method:

// get the existing tutorial options
QueryOptionsHandle tutorialOpts =
    optionsMgr.readOptions("tutorial", new QueryOptionsHandle());

Next, we use one of QueryOptionsHandle’s add methods to augment the options. In this case, we’ll create another constraint using addConstraint():

// add a JSON value constraint
tutorialOpts.addConstraint(
    qob.constraint("company",
        qob.value(
            qob.jsonTermIndex("affiliation"))));

If the “company” constraint isn’t already configured, we add it, backed by the JSON key named “affiliation”. In this case, we’re using a value constraint, which means a searched-for value must match the affiliation exactly.

Next, we write the updated options back to the server:

// write the query options back to the server
optionsMgr.writeOptions("tutorial", tutorialOpts);

Now we’re ready to test it out.

Instead of calling the query definition’s setOptionsName() method, we can also set the options when constructing the query (which we’ll do from now on):

// create a search definition using the "tutorial" options
StringQueryDefinition query = queryMgr.newStringDefinition("tutorial");

Now let’s find all the MarkLogic engineers who spoke at the conference:

// find talks with MarkLogic engineers
query.setCriteria("engineer company:marklogic");

Run the program to see the search results. You can also see the updated query options at http://localhost:8011/v1/config/query/tutorial.

Search using an element value constraint

Open Example_27_ConstraintOnElementValue.java. Here we’re defining another value constraint, this time against an element instead of a JSON key:

tutorialOpts.addConstraint(
    qob.constraint("person",
        qob.value(
            qob.elementTermIndex(new QName("PERSONA")))));

Now we can search for the King of France directly in our query text:

// find plays featuring the King of France
query.setCriteria("person:\"KING OF FRANCE\"");

Run the program to see the results.

Search using a JSON key word constraint

Open Example_28_ConstraintOnJSONWords.java. Here, instead of a value constraint, we’re using a word constraint scoped within all JSON “bio” keys:

tutorialOpts.addConstraint(
    qob.constraint("bio",
        qob.word(
            qob.jsonTermIndex("bio"))));

Unlike a value constraint (which tests the entire value of the key or element), a word constraint uses normal search-engine semantics: the search succeeds if the word is found anywhere in the given context. It also uses stemming, which means that matching words include equivalent forms: “strategies” and “strategy”, “run” and “ran”, etc.

Now let’s use the “bio” constraint in some search text:

// search for speakers whose bio mentions "strategy"
query.setCriteria("bio:strategy");

Run the program to see the results.
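The difference between value and word constraints can be sketched in memory as follows. This is a simplified analogy, not MarkLogic’s matching logic: the helper names and sample bio are hypothetical, the sketch folds case (an assumption; MarkLogic’s case-sensitivity rules are more nuanced), splits on non-word characters, and does no stemming, so unlike the server it won’t match “strategies” to “strategy”.

```java
import java.util.Arrays;
import java.util.Locale;

public class MatchSemantics {

    // Value match: the entire (trimmed, case-folded) value must equal the query text.
    static boolean valueMatches(String fieldValue, String query) {
        return fieldValue.trim().toLowerCase(Locale.ROOT)
            .equals(query.trim().toLowerCase(Locale.ROOT));
    }

    // Word match: the query word may appear anywhere in the value as a whole word.
    static boolean wordMatches(String fieldValue, String word) {
        String target = word.toLowerCase(Locale.ROOT);
        return Arrays.stream(fieldValue.toLowerCase(Locale.ROOT).split("\\W+"))
            .anyMatch(target::equals);
    }

    public static void main(String[] args) {
        String bio = "Leads product strategy for the platform team.";
        System.out.println(valueMatches("MarkLogic", "marklogic")); // true
        System.out.println(valueMatches(bio, "strategy"));          // false: not the whole value
        System.out.println(wordMatches(bio, "strategy"));           // true
        System.out.println(wordMatches(bio, "strategies"));         // false: no stemming here
    }
}
```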

Search using an element word constraint

Open Example_29_ConstraintOnElementWords.java. This time our word constraint is against the <STAGEDIR> element:

tutorialOpts.addConstraint(
    qob.constraint("stagedir",
        qob.word(
            qob.elementTermIndex(new QName("STAGEDIR")))));

Now we can find all the Shakespeare plays where, for example, swords are involved on stage:

// search for stage directions involving swords
query.setCriteria("stagedir:sword");

Search using an element constraint

Open Example_30_ConstraintOnElement.java. Here we’re defining an element constraint:

tutorialOpts.addConstraint(
    qob.constraint("spoken",
        qob.elementQuery(new QName("SPEECH"))));

An element constraint is similar to a word constraint, except that it will match words in the element and any of its descendants. In the above case, it will match text in <LINE> element children of <SPEECH>. This is useful for searching documents that contain “mixed content” (i.e. text mixed with markup, such as <em> and <strong>).

Using this constraint will restrict the search to the spoken lines of text (excluding, for example, stage directions):

// search for mentions of swords in the script itself
query.setCriteria("spoken:sword");

Run the program to see the result.

Search using a properties constraint

We can also create a constraint for searching properties. Example_31_ConstraintOnProperties.java shows how to do this to enable searching an image’s metadata:

tutorialOpts.addConstraint(
    qob.constraint("image",
        qob.properties()));

Now it’s easy for a user to search for photos of fish (or anything else):

// find photos of fish
query.setCriteria("image:fish");

Run the program to see the list of matching image docs.

Search using a structured query

Recall that the Java API supports three kinds of queries that can be passed to search():

  • key/value queries
  • string queries
  • structured queries

We briefly touched on a structured query in Example_18_SearchProperties.java. Now we’ll take a look at a richer use of it, utilizing the constraints we’ve defined so far. Open up Example_32_StructuredQuery.java. We’ll start by creating a StructuredQueryBuilder, associating it with our “tutorial” options:

// create a query builder using the "tutorial" options
StructuredQueryBuilder qb = new StructuredQueryBuilder("tutorial");

The query builder is analogous to the options builder in that it gives us a way of building up complex object structures using nested method calls. Only this time, rather than building up options to store on the server, we’re building up an actual query:

// build a search definition
StructuredQueryDefinition query =
    qb.or(
        // find MarkLogic speakers whose bio mentions "product"
        qb.and(
            qb.wordConstraint("bio","product"),
            qb.valueConstraint("company","MarkLogic")),
        // find plays matching all three of these constraints
        qb.and(
            qb.elementConstraint("spoken", qb.term("fie")),
            qb.wordConstraint("stagedir", "fall"),
            qb.valueConstraint("person", "GRUMIO")),
        // find photos of fish taken on February 27th
        qb.and(
            qb.properties(qb.term("fish")),
            qb.directory(true, "/images/2012/02/27/")),
        // find conference docs mentioning "fun"
        qb.and(
            qb.collection("mlw2012"),
            qb.term("fun")));

The builder’s or() method constructs a query that will find documents matching any of its argument queries (union). In contrast, an and() query restricts its results to those documents matching all of its child queries (intersection). Take a look at the StructuredQueryBuilder javadocs to see what methods you can use to construct queries. Many of these (particularly the ones with “Constraint” in their names) require you to have defined options for them to be of any use.
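One way to picture the or()/and() semantics is as set operations over the documents each child query matches. The sketch below is a hypothetical, in-memory illustration (not the StructuredQueryBuilder API), with made-up document URIs standing in for each constraint's matches.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class BooleanQuerySets {
    // or(): a document matches if it matches any child query (set union).
    @SafeVarargs
    static Set<String> or(Set<String>... children) {
        Set<String> result = new LinkedHashSet<>();
        for (Set<String> c : children) result.addAll(c);
        return result;
    }

    // and(): a document matches only if it matches every child query (set intersection).
    @SafeVarargs
    static Set<String> and(Set<String>... children) {
        if (children.length == 0) throw new IllegalArgumentException("and() needs at least one child in this sketch");
        Set<String> result = new LinkedHashSet<>(children[0]);
        for (int i = 1; i < children.length; i++) result.retainAll(children[i]);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical per-constraint matches (document URIs)
        Set<String> bioProduct = Set.of("/speakers/a.json", "/speakers/b.json");
        Set<String> companyML = Set.of("/speakers/b.json", "/speakers/c.json");
        Set<String> fish = Set.of("/images/fish.jpg");
        System.out.println(or(and(bioProduct, companyML), fish));
        // prints [/speakers/b.json, /images/fish.jpg]
    }
}
```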

To run the query, we pass it to our query manager’s search() method, just as we do with string and key/value queries:

// run the search
queryMgr.search(query, resultsHandle);

Run the program to see the results. Note that the search will only give you the expected results if you’ve previously defined the “bio”, “company”, “spoken”, “stagedir”, and “person” constraints (see previous examples in this section).

For more details on the kinds of constraints you can define, see “Constraint Options” in the Search Developer’s Guide.

Analytics

“Analytics” is used to describe a class of functionality in MarkLogic that relates to retrieving values and frequencies of values across a large number of documents. With search/query, we’re interested in finding documents themselves. With analytics, we’re interested in extracting all the unique values that appear within a particular context (such as an XML element or JSON key), as well as the number of times each value occurs. An example of analytics in a MarkLogic application is the message traffic chart on MarkMail.org:

The above chart portrays ranges of email message dates bucketed by month, as well as the number of messages that appear each month. Since MarkMail hosts over 50 million messages, it of course does not go read all those messages when you load the page. Instead, whenever a new document (email message) is loaded into the database, its date is added to a sorted, in-memory list of message dates (values), each associated with a count (frequency). This is achieved through an administrator-defined index (called a range index).
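As a rough mental model (not MarkLogic's actual implementation), the sketch below keeps a sorted map from value to frequency that is updated once per loaded document, then buckets it by month. The class and method names are hypothetical. The point is that drawing the chart only touches the small sorted map, never the messages themselves.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.NavigableMap;
import java.util.TreeMap;

final class MonthHistogram {
    // Sorted value -> frequency map, updated once per loaded document (a toy stand-in for a range index).
    private final TreeMap<LocalDate, Integer> index = new TreeMap<>();

    void onDocumentLoaded(LocalDate messageDate) {
        index.merge(messageDate, 1, Integer::sum);
    }

    // Bucket the sorted values by month without touching the documents themselves.
    NavigableMap<YearMonth, Integer> byMonth() {
        TreeMap<YearMonth, Integer> buckets = new TreeMap<>();
        index.forEach((date, count) -> buckets.merge(YearMonth.from(date), count, Integer::sum));
        return buckets;
    }

    public static void main(String[] args) {
        MonthHistogram h = new MonthHistogram();
        h.onDocumentLoaded(LocalDate.of(2012, 2, 27));
        h.onDocumentLoaded(LocalDate.of(2012, 2, 28));
        h.onDocumentLoaded(LocalDate.of(2012, 3, 1));
        System.out.println(h.byMonth()); // {2012-02=2, 2012-03=1}
    }
}
```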

A range index is one kind of lexicon. Whenever you want to perform analytics, you need to have a lexicon configured. In addition to range indexes, other lexicons include the URI lexicon and the collection lexicon. Each of these must be enabled in the database configuration before you can use it.

Retrieve all collection tags

For this example, you need to have the collection lexicon enabled. Fortunately, we already took care of that at the beginning when we set up the database. Open Example_33_ValuesOfCollectionTags.java. As when defining constraints, we need a QueryOptionsBuilder for making values available, this time with the withValues() method:

// create a builder for constructing query options
QueryOptionsBuilder qob = new QueryOptionsBuilder();
 
// expose the collection lexicon as "tag" values
QueryOptionsHandle options = new QueryOptionsHandle().withValues(
    qob.values("tag",
        qob.collection("")));

The first argument to values() is the name we’ll be using when we fetch the values (“tag”); the second defines the source of those values. The collection() constructor indicates the collection lexicon as the source. Next we upload the options to the server for our subsequent use, just as we did in the constraint examples:

// write the query options to the database
optionsMgr.writeOptions(optionsName, options);

Whereas with a search we need to construct a QueryDefinition, with a values retrieval we need to construct a ValuesDefinition, passing it both the name we defined (“tag”) and the name of the options we just configured on the server:

// create a values definition
ValuesDefinition valuesDef = queryMgr.newValuesDefinition("tag", optionsName);

Similarly, whereas with search we use a SearchHandle to receive results, with values we use a ValuesHandle to receive the results:

// retrieve the values
ValuesHandle valuesHandle = queryMgr.values(valuesDef, new ValuesHandle());

The above line defines the handle and fetches the results in one step. This time, instead of calling search(), we call our query manager’s values() method. Now we’ll print out the results using the handle’s getValues() accessor:

// print out the values and their frequencies
for (CountedDistinctValue value : valuesHandle.getValues()) {
    System.out.println(
        value.get("xs:string",String.class) + ": " + value.getCount());
}

Run the program. The output shows all the collection tags and their frequency of usage (in other words, how many documents are in each collection). You can also view the values directly in your browser at: http://localhost:8011/v1/values/tag?options=Example_33_ValuesOfCollectionTags.
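The browser URL above follows the pattern /v1/values/{name}?options={options-name}; the REST API also accepts q and format parameters, as used later in this section. The hypothetical helper below builds such URLs with proper encoding. It assumes the tutorial's REST instance on port 8011 and is not part of the Java API, which issues these REST requests for you.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ValuesUrl {
    // Hypothetical helper: builds a /v1/values URL like the ones shown in this tutorial.
    // A null or empty query string or format is simply left out.
    static String build(String base, String valuesName, String options, String q, String format) {
        StringBuilder url = new StringBuilder(base)
                .append("/v1/values/").append(enc(valuesName))
                .append("?options=").append(enc(options));
        if (q != null && !q.isEmpty()) url.append("&q=").append(enc(q));
        if (format != null && !format.isEmpty()) url.append("&format=").append(enc(format));
        return url.toString();
    }

    // Percent-encode a single query-string component as UTF-8.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8011";
        System.out.println(build(base, "tag", "Example_33_ValuesOfCollectionTags", null, null));
        System.out.println(build(base, "rating", "Example_38_ValuesWithQuery", "company:marklogic", "json"));
    }
}
```

Note that URLEncoder percent-encodes the colon in company:marklogic as %3A, which the server decodes back to the same query text.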

Retrieve all document URIs

This example requires the URI lexicon to be enabled. Starting in MarkLogic 6, it’s enabled by default, so here too we’re ready to go. Open Example_34_ValuesOfURI.java. This example is almost identical to the previous one except that we’re choosing a different values name (“uri”) and a different values source (the URI lexicon):

// expose the URI lexicon as "uri" values
QueryOptionsHandle options = new QueryOptionsHandle().withValues(
    qob.values("uri",
        qob.uri()));

The uri() constructor indicates the URI lexicon as the source. Run the program to see all the document URIs in the database, as well as how many documents they’re each associated with (the frequency). For all the JSON and XML document URIs, the answer of course is just one per document. But you might be surprised to see that each image document URI yields a count of 2. That’s because each image document has an associated properties document which shares the same URI.

Set up some range indexes

Before we can run the remaining examples in this section, we need to enable some range indexes in our database. Since we have a small number of documents, it won’t take long for MarkLogic to re-index everything. At a much larger scale, you’d want to be careful about what indexes you enable and when you enable them. That’s why such changes require database administrator access.

We’re going to set up the following range indexes:

scalar type    namespace uri                           localname
string         (empty)                                 SPEAKER
string         http://marklogic.com/xdmp/json/basic    affiliation
int            http://marklogic.com/xdmp/json/basic    contentRating
unsignedLong   http://marklogic.com/filter             size
string         http://marklogic.com/filter             Exposure_Time

Navigate to your database’s configuration page for element range indexes (at http://localhost:8001/).

At the top of the page, click the “Add” tab.

Here you will enter the appropriate values for one range index. We’ll be concerned with just three form fields (leaving the rest at their defaults):

  • scalar type
  • namespace uri
  • localname

For example, to configure the first range index, you’d choose “string” for scalar type, leave the namespace uri field blank, type “SPEAKER” for localname, and click “OK”.

This will cause the database to build a range index on all <SPEAKER> elements. Using the same process described above, add each of the remaining range indexes to your database:

scalar type    namespace uri                           localname
string         http://marklogic.com/xdmp/json/basic    affiliation
int            http://marklogic.com/xdmp/json/basic    contentRating
unsignedLong   http://marklogic.com/filter             size
string         http://marklogic.com/filter             Exposure_Time

Now that we have the indexes configured, let’s dive back into Eclipse.

Retrieve values of a JSON key

Open up Example_35_ValuesOfJSONKey.java. First we initialize our query options with a values spec:

// expose the "affiliation" JSON key range index as "company" values
QueryOptionsHandle options = new QueryOptionsHandle().withValues(
    qob.values("company",
        qob.range(
            qob.jsonRangeIndex("affiliation",
                qob.stringRangeType(QueryOptions.DEFAULT_COLLATION))),
        "frequency-order"));
 
// write the query options to the database
optionsMgr.writeOptions(optionsName, options);

As with collection and URI values, we start by choosing a name (“company”). This time, instead of uri() or collection(), we use range() to indicate that a range index is the source of the values. Here we must make sure that the arguments we pass exactly line up with the range index that’s configured in the database. Otherwise, you’ll get an “index not found” error when you try to retrieve the values.

You may recall that we used a jsonTermIndex() to indicate the source of a key constraint. A “term index” is always enabled as part of MarkLogic’s Universal Index and lets you look up documents based on some criterion. In this case, we want to retrieve all the values of a given JSON key (rather than find documents by the value of a key). For that, we need a range index, so we call jsonRangeIndex(), passing it the name of the key and the type of the indexed values (string, using the default collation).
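The two kinds of index answer different questions. The toy model below (plain Java, hypothetical names, not MarkLogic's internal storage) keeps one structure shaped like a term index (value to matching documents) and one shaped like a range index (sorted values with their frequencies).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class TermVsRangeIndex {
    // Term-index style: value -> documents containing it ("find documents where affiliation = X").
    final Map<String, Set<String>> termIndex = new TreeMap<>();
    // Range-index style: sorted value -> frequency ("list every affiliation and how often it occurs").
    final TreeMap<String, Integer> rangeIndex = new TreeMap<>();

    void add(String uri, String affiliation) {
        termIndex.computeIfAbsent(affiliation, k -> new TreeSet<>()).add(uri);
        rangeIndex.merge(affiliation, 1, Integer::sum);
    }

    public static void main(String[] args) {
        TermVsRangeIndex idx = new TermVsRangeIndex();
        idx.add("/speakers/1.json", "MarkLogic");
        idx.add("/speakers/2.json", "Acme");
        idx.add("/speakers/3.json", "MarkLogic");
        System.out.println(idx.termIndex.get("MarkLogic")); // [/speakers/1.json, /speakers/3.json]
        System.out.println(idx.rangeIndex);                  // {Acme=1, MarkLogic=2}
    }
}
```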

The last thing to point out above is that, rather than return the values in alphabetical (collation) order, we want to get them in “frequency order.” In other words, return the most commonly mentioned companies first. That’s what the “frequency-order” option (passed to values()) lets you do.
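Here is a minimal sketch of the two orderings over the same value/frequency pairs, using Java's natural String order as a stand-in for the collation. Breaking frequency ties alphabetically is a choice made for this sketch, not documented server behavior, and the company names other than MarkLogic are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ValueOrdering {
    // Default-style ordering: values sorted by their own order (natural String order here).
    static List<String> itemOrder(Map<String, Integer> freqs) {
        return new ArrayList<>(new TreeMap<>(freqs).keySet());
    }

    // "frequency-order"-style: most frequent first; ties broken alphabetically in this sketch.
    static List<String> frequencyOrder(Map<String, Integer> freqs) {
        List<String> keys = new ArrayList<>(freqs.keySet());
        keys.sort((a, b) -> {
            int byCount = Integer.compare(freqs.get(b), freqs.get(a)); // higher count first
            return byCount != 0 ? byCount : a.compareTo(b);            // then alphabetical
        });
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = Map.of("Acme", 1, "MarkLogic", 5, "Globex", 2);
        System.out.println(itemOrder(freqs));      // [Acme, Globex, MarkLogic]
        System.out.println(frequencyOrder(freqs)); // [MarkLogic, Globex, Acme]
    }
}
```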

Just as with the two previous examples, we create a values definition (using the name “company”) and pass it to our query manager’s values() method to retrieve the results:

// create a values definition
ValuesDefinition valuesDef = queryMgr.newValuesDefinition("company", optionsName);
 
// retrieve the values
ValuesHandle valuesHandle = queryMgr.values(valuesDef, new ValuesHandle());

Run the program to see the results. Unsurprisingly, you’ll see that MarkLogic was the most common company affiliation at the MarkLogic World conference.

Retrieve values of an element

Open up Example_36_ValuesOfElement.java. Here, rather than using a jsonRangeIndex(), we’re using an elementRangeIndex() to indicate the source of our “speaker” values:

// expose the "SPEAKER" element range index as "speaker" values
QueryOptionsHandle options = new QueryOptionsHandle().withValues(
    qob.values("speaker",
        qob.range(
            qob.elementRangeIndex(new QName("SPEAKER"),
                qob.stringRangeType(QueryOptions.DEFAULT_COLLATION))),
        "frequency-order"));

Run the program to see all the unique speakers in the Shakespeare plays, starting with the most garrulous.

Compute aggregates on values

Not only can we retrieve values and their frequencies; we can also perform aggregate math on the server. MarkLogic provides a series of built-in aggregate functions such as avg, max, count, and covariance, as well as the ability to construct user-defined functions (UDFs) in C++ for close-to-the-database computations.

Open up Example_37_ValuesOfJSONKeyNumeric.java. In this example, we’re going to access an integer index on the “contentRating” JSON key:

// expose the "contentRating" JSON key range index as "rating" values
QueryOptionsHandle options = new QueryOptionsHandle().withValues(
    qob.values("rating",
        qob.range(
            qob.jsonRangeIndex("contentRating",
                qob.rangeType("xs:int")))));

This time, in addition to setting up the values definition, we’ll configure it to compute both the mean and median averages:

// create a values definition
ValuesDefinition valuesDef = queryMgr.newValuesDefinition("rating", optionsName);
 
// also retrieve the averages of all ratings
valuesDef.setAggregate("avg","median");

Before fetching the values, we’ll opt to get them in descending order (highest ratings first):

// retrieve values in descending order
valuesDef.setDirection(Direction.DESCENDING);

Run the program to see how many conference talks scored 5 stars, how many scored 4 stars, etc.—as well as the mean and median rating for all conference talks.
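To see what the server computes, here is a plain-Java sketch of the mean, the median, and descending order over value/frequency pairs. It assumes each value counts once per occurrence (that is, weighted by its frequency); check the Search Developer’s Guide for exactly how the avg and median aggregates treat frequencies. The ratings in main() are made up, and the methods assume at least one value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RatingAggregates {
    // Mean over all occurrences: each value is weighted by its frequency.
    static double mean(Map<Integer, Integer> freqs) {
        long sum = 0, n = 0;
        for (Map.Entry<Integer, Integer> e : freqs.entrySet()) {
            sum += (long) e.getKey() * e.getValue();
            n += e.getValue();
        }
        return (double) sum / n;
    }

    // Median over all occurrences: expand the frequencies, sort, take the middle (or mean of the two middles).
    static double median(Map<Integer, Integer> freqs) {
        List<Integer> all = new ArrayList<>();
        freqs.forEach((value, count) -> all.addAll(Collections.nCopies(count, value)));
        Collections.sort(all);
        int n = all.size();
        return n % 2 == 1 ? all.get(n / 2) : (all.get(n / 2 - 1) + all.get(n / 2)) / 2.0;
    }

    // Highest values first, analogous to setDirection(Direction.DESCENDING).
    static TreeMap<Integer, Integer> descending(Map<Integer, Integer> freqs) {
        TreeMap<Integer, Integer> sorted = new TreeMap<>(Collections.reverseOrder());
        sorted.putAll(freqs);
        return sorted;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> ratings = Map.of(5, 3, 4, 2, 3, 1); // rating -> number of talks
        System.out.println(descending(ratings)); // {5=3, 4=2, 3=1}
        System.out.println(mean(ratings));
        System.out.println(median(ratings));     // 4.5
    }
}
```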

Constrain the values returned using a query

This example starts to hint at the real power of MarkLogic: combining analytics with search. Open up Example_38_ValuesWithQuery.java. Rather than retrieve all the values of a given key, we’re going to retrieve only the values from documents meeting a certain criterion. In this case, we’ll get all the ratings for conference talks given by employees of a certain organization. To configure this, we supply both a values option and a constraint option:

QueryOptionsHandle options = new QueryOptionsHandle()
    // expose the "contentRating" JSON key range index as "rating" values
    .withValues(
        qob.values("rating",
            qob.range(
                qob.jsonRangeIndex("contentRating",
                    qob.rangeType("xs:int")))))
    // optionally constrain results by affiliation
    .withConstraints(
        qob.constraint("company",
            qob.value(
                qob.jsonTermIndex("affiliation"))));

In a nutshell, the above configures two things: a “rating” lexicon and a “company” constraint. To retrieve values, we define the values definition as usual, but this time we also associate it with a query, using the setQueryDefinition() method:

// create a values definition
ValuesDefinition valuesDef = queryMgr.newValuesDefinition("rating", optionsName);
 
// create a search definition
StringQueryDefinition companyQuery = queryMgr.newStringDefinition("tutorial");
companyQuery.setCriteria("company:marklogic");
 
// return only those values from fragments (documents) matching this query
valuesDef.setQueryDefinition(companyQuery);

Run the program to see the ratings of all talks given by MarkLogic employees (documents matching the “company:marklogic” string query). You can also see these results in the browser using this URL:

http://localhost:8011/v1/values/rating?options=Example_38_ValuesWithQuery&q=company:marklogic&format=json
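Conceptually, setQueryDefinition() narrows the set of documents whose values get tallied. The sketch below models that with an in-memory list and a Predicate standing in for the “company:marklogic” query. The Talk class and its data are hypothetical, and the real server resolves the query through its indexes rather than by scanning documents.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

final class ValuesWithQuery {
    // Hypothetical talk document: the speaker's affiliation and the talk's rating.
    static final class Talk {
        final String affiliation;
        final int rating;
        Talk(String affiliation, int rating) { this.affiliation = affiliation; this.rating = rating; }
    }

    // Tally rating frequencies, but only over documents matching the query predicate.
    static Map<Integer, Integer> ratingValues(List<Talk> docs, Predicate<Talk> query) {
        Map<Integer, Integer> freqs = new TreeMap<>();
        for (Talk t : docs) {
            if (query.test(t)) freqs.merge(t.rating, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        List<Talk> talks = List.of(new Talk("MarkLogic", 5), new Talk("MarkLogic", 4),
                                   new Talk("Acme", 2), new Talk("MarkLogic", 5));
        // Rough stand-in for the string query "company:marklogic"
        Predicate<Talk> company = t -> t.affiliation.equalsIgnoreCase("marklogic");
        System.out.println(ratingValues(talks, company)); // {4=1, 5=2}
    }
}
```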

Retrieving tuples of values (co-occurrences)

In addition to retrieving values from a single source, you can also retrieve co-occurrences of values from n sources. In other words, you can perform analytics on multi-dimensional data sets. Open up Example_39_Tuples.java. In this case, we’re getting all the unique pairings of photo size and exposure time, via the withTuples() method:

// expose unique combinations (co-occurrences) of size and exposure
QueryOptionsHandle options = new QueryOptionsHandle().withTuples(
    qob.tuples("size-exposure",
        qob.tupleSources(
            qob.range(
                qob.elementRangeIndex(new QName("http://marklogic.com/filter","size"),
                    qob.rangeType("xs:unsignedLong"))),
            qob.range(
                qob.elementRangeIndex(new QName("http://marklogic.com/filter","Exposure_Time"),
                    qob.stringRangeType(QueryOptions.DEFAULT_COLLATION))))));

The tupleSources() constructor takes the sources whose values should be paired; in this case, two range indexes. As with a values retrieval, we start with a ValuesDefinition, giving it the name we configured (“size-exposure”), but then we call tuples() instead of values() to fetch the tuples:

// create a values definition
ValuesDefinition valuesDef = queryMgr.newValuesDefinition("size-exposure", optionsName);
 
// retrieve the tuples
TuplesHandle tuplesHandle = queryMgr.tuples(valuesDef, new TuplesHandle());

Also, instead of a ValuesHandle, we use a TuplesHandle, which encapsulates the data in a POJO through which we can access each tuple using getTuples():

// print out each size/exposure co-occurrence
for (Tuple tuple : tuplesHandle.getTuples()) {
    System.out.println("Size: "     + tuple.getValues()[0].get(Long.class)
                   + "\nExposure: " + tuple.getValues()[1].get(String.class));
    System.out.println();
}
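A co-occurrence is a combination of values that appear together in the same document, counted across documents. The in-memory sketch below (hypothetical Photo class and data, not the TuplesHandle API) counts distinct (size, exposure) pairs; with n sources, the key would become an n-element tuple.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CoOccurrences {
    // Hypothetical photo metadata: one size and one exposure time per image.
    static final class Photo {
        final long size;
        final String exposure;
        Photo(long size, String exposure) { this.size = size; this.exposure = exposure; }
    }

    // Count each distinct (size, exposure) pair that appears together in the same document.
    static Map<Map.Entry<Long, String>, Integer> tuples(List<Photo> photos) {
        Map<Map.Entry<Long, String>, Integer> counts = new LinkedHashMap<>();
        for (Photo p : photos) {
            counts.merge(Map.entry(p.size, p.exposure), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Photo> photos = List.of(new Photo(1024L, "1/60"), new Photo(2048L, "1/125"),
                                     new Photo(1024L, "1/60"));
        // prints "Size: 1024, Exposure: 1/60 (2)" then "Size: 2048, Exposure: 1/125 (1)"
        tuples(photos).forEach((pair, n) ->
                System.out.println("Size: " + pair.getKey() + ", Exposure: " + pair.getValue() + " (" + n + ")"));
    }
}
```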

Searching with facets

As mentioned earlier, MarkLogic’s real power lies in the combination of search and analytics. A couple of examples ago we saw how a query could be used to constrain a values retrieval. What we haven’t seen yet is how the query manager’s search() method can also return lists of values (called “facet values”) along with its search results. These facets can then be used to interactively explore your data. In this case, we’re not calling values() at all, just search().

But before we can run a faceted search, we need to define one or more constraints that are backed by a lexicon or range index. Here is how Example_40_SearchWithFacets.java does it:

QueryOptionsHandle options = new QueryOptionsHandle().withConstraints(
    // expose the "contentRating" JSON key range index as "rating" values
    qob.constraint("rating",
        qob.range(
            qob.jsonRangeIndex("contentRating",
                qob.rangeType("xs:int")),
            Facets.FACETED,
            FragmentScope.DOCUMENTS,
            qob.buckets(),
            "descending")), // highest ratings first
 
    // expose the "affiliation" JSON key range index as "company" values
    qob.constraint("company",
        qob.range(
            qob.jsonRangeIndex("affiliation",
                qob.stringRangeType(QueryOptions.DEFAULT_COLLATION)),
            Facets.FACETED,
            FragmentScope.DOCUMENTS,
            qob.buckets(),
            "frequency-order"))); // most common values first

The above configuration makes the “rating” and “company” constraints available for users to type in their query search string. You may be thinking “Isn’t that only going to be useful for power users? Most users aren’t going to bother learning a search grammar.” That’s true, but with a UI that supports faceted navigation, they won’t need to. All they’ll have to do is click a link to get the results constrained by a particular value. For example, the screenshot below from MarkMail shows four facets: month, list, sender, and attachment type:

Each of these is a facet, whose values are retrieved from a range index. Moreover, users can drill down and pick various combinations of facets simply by clicking a link, or in the case of the histogram, swiping their mouse pointer.

MarkLogic’s Java API gives you everything you need to construct a model for faceted navigation. Our sample program doesn’t include a UI, but it will run a series of searches that a user might have chosen:

String[] searches = {"", // empty search; return all results
                     "company:MarkLogic",
                     "company:MarkLogic rating:5",
                     "java rating GE 4"};

For each of the above search strings, we run the search and print out all the facets and their values:

// run the search
queryMgr.search(query, resultsHandle);
 
// Show the resulting facets & their values
for (FacetResult facet : resultsHandle.getFacetResults()) {
    System.out.println(facet.getName() + ":");
    for (FacetValue value : facet.getFacetValues()) {
        System.out.println("  " + value.getCount() + " occurrences of " + value.getName());
    }
}

Run the program to see the results.

Just as the API provides a model for a list of search results (an array of MatchedDocumentSummary instances), it also provides a model for facet results (an array of FacetResult instances). The above code gets the facets using the search handle’s getFacetResults() method, iterates through each facet, and for each of its values, prints the value and its count (frequency).
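The shape of that model (facet name, then values, then counts) can be sketched in plain Java. In this hypothetical version each matched document is a map from constraint name to value, and counts are tallied by scanning the matches. MarkLogic instead computes facet values from the range indexes backing the constraints, so treat this as an illustration of the result structure only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FacetSketch {
    // Build facet name -> (value -> count) over the documents that matched the search.
    static Map<String, Map<String, Integer>> facets(List<Map<String, String>> matches, List<String> facetNames) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (String facet : facetNames) {
            Map<String, Integer> values = new TreeMap<>();
            for (Map<String, String> doc : matches) {
                String v = doc.get(facet);
                if (v != null) values.merge(v, 1, Integer::sum);
            }
            result.put(facet, values);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> matches = List.of(
                Map.of("company", "MarkLogic", "rating", "5"),
                Map.of("company", "MarkLogic", "rating", "4"),
                Map.of("company", "Acme", "rating", "5"));
        // Print in the same style as the tutorial's facet loop
        facets(matches, List.of("company", "rating")).forEach((name, values) -> {
            System.out.println(name + ":");
            values.forEach((v, n) -> System.out.println("  " + n + " occurrences of " + v));
        });
    }
}
```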

We saw earlier how the API models the search results on this site. Now we can see how it models the facet results. One facet (“Category”) is represented by a FacetResult object:

And its values are modeled by FacetValue objects:

When a user clicks on one of these values, it takes them to a new automatically constrained search results page. For example, if they click “Blog posts,” it will re-run their search with the additional constraint “category:blog”.
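Drilling down amounts to appending one more constraint to the current search string, since terms separated by spaces are combined with AND in the default search grammar (as in “company:MarkLogic rating:5” above). The helper below is a hypothetical sketch: it quotes values containing spaces, as in person:"KING OF FRANCE", but does not escape embedded quotes.

```java
final class DrillDown {
    // Hypothetical helper: append a "name:value" constraint to an existing search string.
    // Values containing whitespace are quoted so they are treated as one phrase.
    static String addConstraint(String search, String name, String value) {
        String term = value.matches(".*\\s.*")
                ? name + ":\"" + value + "\""
                : name + ":" + value;
        String base = search == null ? "" : search.trim();
        return base.isEmpty() ? term : base + " " + term;
    }

    public static void main(String[] args) {
        System.out.println(addConstraint("java", "category", "blog"));      // java category:blog
        System.out.println(addConstraint("", "person", "KING OF FRANCE")); // person:"KING OF FRANCE"
    }
}
```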
