With traditional relational databases, persisting your in-memory data structures requires complex ORM (Object-Relational Mapping) tools to handle the well-known impedance mismatch. Next-generation NoSQL databases that support variety in stored information can provide a simpler solution. In this tutorial, find out how to store and search POJOs in a MarkLogic database without giving up consistency, reliability, or scale.
MarkLogic Server is an Enterprise NoSQL database, supporting a schema-optional document data model, ACID transactions, security, and real-time search indexing. Supported document formats include XML, JSON, text, and even binary (such as video or PDF).
The latest version adds the MarkLogic Java API to make it easy to take advantage of the server in your Java applications. For this tutorial, you’ll download the free version of MarkLogic Server. We’ll work through some typical data discovery scenarios with a music dataset, executing queries both to answer specific questions and to get a better overall understanding of the dataset. To make things simple, we’ll work with data in a POJO representation. The setup steps consist of installing MarkLogic Server, downloading the tutorial, and running a bootstrapping utility that defines a couple of users and creates the database and REST server.
Download and install the latest version of MarkLogic from https://developer.marklogic.com/products. Once you’ve installed and started MarkLogic, go to the browser-based administrative interface (at https://localhost:8001/), which will walk you through getting a Developer license and creating an admin user. (This tutorial assumes you’ll be running MarkLogic on your local machine; if that’s not the case, just substitute your server name whenever you see “localhost” in this tutorial.)
For more detailed instructions on installing and running MarkLogic, see Installing MarkLogic Server.
After starting the server, download the tutorial source code from https://developer.marklogic.com/media/pojo-tutorial-01.zip. Unzip the distribution. You’ll find a standard Maven source structure that you can use, for instance, in m2e. You can, of course, work with the sources and classes without Maven if you prefer by looking for the sources under the src/main/java directory and for runtime environment under the target/classes directory.
In the following sections, we’ll only show the highlights from the source code and output. To get the most out of this tutorial, you should view the complete examples in your IDE or editor and run the examples to see the complete output.
To run the tutorial examples, you’ll need to set up a Java 6 runtime environment (preferably the latest stable distribution). You configure your CLASSPATH in the usual way.
This tutorial focuses on application programming rather than MarkLogic server administration. Therefore, this tutorial provides a utility to set up the server environment in one step. Before you start, find and check the values in tutorial.properties. The default values should be correct for your setup; simply ensure that the values for tutorial.bootstrap_user and tutorial.bootstrap_password match the administrative credentials for the MarkLogic server. Be wary of modifying the other values shipped with tutorial.properties. To bootstrap the REST server’s environment, run the following command at the command line:
Bootstrapping the Tutorial’s server-side environment
java -cp CLASSPATH com.marklogic.client.tutorial.util.CreateDatabaseServer
Alternatively, use an IDE to execute this class’s main method. When it’s done, this command will have defined a couple of users and created the database and REST server.
Later, when you want to set up your own database, REST server, and indexes, go to https://localhost:8000/appservices/, click the New Database button, select the database, and click the Configure button. Now we’re ready for a quick look at the dataset.
The dataset for this tutorial consists of top songs extracted from Wikipedia (https://en.wikipedia.org/wiki/Category:Lists_of_number-one_songs_in_the_United_States). Each song is described by a standalone tree structure modelled with nested POJOs (similar to JSON but with strong typing). To enable processing by JAXB, the POJO classes have two JAXB annotations: one on the root class for the tree structure and one on the descr property.
@XmlRootElement
public class TopSong {
    ...
    public Artist getArtist() { ... }
    @XmlAnyElement
    public Element getDescr() { ... }
}
The descr property contains marked-up text as a target for full-text search. Other key properties include exactly one artist as well as zero or many writers, producers, genres, and weeks.
The tutorial source provides the serialized POJOs in XML files. Aside from the descr property, the POJOs are vanilla Java beans and could be loaded from a Java object input stream or any other source.
The POJOWriter example creates a database client and iterates over the serialized POJO files, using JAXB to write the POJOs to the database as separate documents. Each document has a unique URI and contains a root object and its subordinate objects. Here’s the source code condensed to focus on the important parts (which will also be true of subsequent examples).
DatabaseClient dbClient = DatabaseClientFactory.newClient(
    "localhost", 8005, "rest-admin", "x", Authentication.DIGEST);
XMLDocumentManager docMgr = dbClient.newXMLDocumentManager();
JAXBContext context = JAXBContext.newInstance(TopSong.class);
JAXBHandle writeHandle = new JAXBHandle(context);
for (File songfile: inputDir.listFiles()) {
    TopSong song = ... read the serialized POJO from the file ... ;
    writeHandle.set(song);
    docMgr.write("/topsongs/"+songfile.getName(), writeHandle);
}
dbClient.release();
Every application using the API creates a DatabaseClient before interacting with the database and releases the client afterward. Subsequent examples will omit these statements to focus on new ideas.
The example above calls the XMLDocumentManager.write() method to persist each POJO as a document in the database. The JAXBHandle class adapts JAXB for integration into the API. The API uses adapters like JAXBHandle to integrate standard content representations as diverse as binary InputStream, character String, and StAX XMLStreamReader.
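The create-then-release lifecycle is easiest to get right with try/finally, so that the client is released even when a write fails. The following is a minimal sketch in plain Java: FakeClient is a hypothetical stand-in, not part of the MarkLogic API, and only its release() method mirrors a call shown in the tutorial.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClientLifecycleSketch {
    /** Hypothetical stand-in for a database client; it only records calls. */
    static class FakeClient {
        final List<String> log = new ArrayList<String>();

        void write(String uri) {
            if (uri == null) {
                throw new IllegalArgumentException("uri must not be null");
            }
            log.add("write " + uri);
        }

        void release() {
            log.add("release");
        }
    }

    /** Writes every URI and releases the client even if a write throws. */
    static void writeAll(FakeClient client, List<String> uris) {
        try {
            for (String uri : uris) {
                client.write(uri);
            }
        } finally {
            client.release();
        }
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient();
        writeAll(client, Arrays.asList("/topsongs/a.xml", "/topsongs/b.xml"));
        // prints: [write /topsongs/a.xml, write /topsongs/b.xml, release]
        System.out.println(client.log);
    }
}
```

The same shape would apply to a real client: do the work inside try and put the release call in finally.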
The POJOReader example confirms the previous load by calling the XMLDocumentManager.read() method to get a POJO from the database, again using JAXB.
XMLDocumentManager docMgr = dbClient.newXMLDocumentManager();
JAXBContext context = JAXBContext.newInstance(TopSong.class);
JAXBHandle readHandle = new JAXBHandle(context);
docMgr.read("/topsongs/Aretha-Franklin+Respect.xml", readHandle);
TopSong song = (TopSong) readHandle.get();
... print the properties of the POJO ...
The example prints out the POJO properties, producing the following output:
document: /topsongs/Aretha-Franklin+Respect.xml
title | Respect
artist | Aretha Franklin
writers | Otis Redding
producers | Steve Cropper
genres | Soul
weeks | 1967-06-03 | 1967-06-10
Subsequent examples will search these properties and the text of the descr property.
Now we’re ready to investigate the top songs dataset. Looking at the output for Respect, we might wonder whether Otis Redding wrote any other hit songs.
The KeyValueSearcher example finds all documents where the writer element contains the exact value Otis Redding. Such searches resemble equals predicates in the WHERE clause of an SQL query but can operate on varied document structures instead of rigid relational tables.
QueryManager queryMgr = dbClient.newQueryManager();
KeyValueQueryDefinition keyValueQry = queryMgr.newKeyValueDefinition();
keyValueQry.put(
    queryMgr.newElementLocator(new QName("writer")),
    "Otis Redding");
SearchHandle searchHandle = queryMgr.search(keyValueQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
    System.out.println("document: "+docSum.getUri());
    for (MatchLocation docLoc: docSum.getMatchLocations()) {
        System.out.println(" location: "+docLoc.getPath());
        System.out.println(" matched: "+docLoc.getAllSnippetText());
    }
}
All queries use a QueryManager. (Subsequent examples skip its construction.) The KeyValueQueryDefinition class specifies the query criteria. The call to QueryManager.search() searches the database. SearchHandle parses the results into a Java structure reflecting documents matched by the query and locations matched within each document. You can also get search results in JSON or XML if you prefer.
The example iterates over the matched documents and locations to generate the following output, which answers the question. Otis Redding wrote two top songs.
document: /topsongs/Aretha-Franklin+Respect.xml
location: /topSong/writers
matched: Otis Redding
document: /topsongs/Otis-Redding+Sittin-On-The-Dock-of-the-Bay.xml
location: /topSong/writers
matched: Otis Redding
For JSON documents, you can search on the value of a key in much the same way.
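To make the exact-value semantics concrete, here is a plain-Java analogy that filters an in-memory map of writers by document URI. It is only a sketch and uses none of the MarkLogic API: the second document and its value are hypothetical, and the comparison is strict Java string equality, which may differ from how the server compares values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExactValueMatchSketch {
    /** Returns the URIs whose value list contains exactly the given value. */
    static List<String> urisWithValue(Map<String, List<String>> valuesByUri, String value) {
        List<String> matches = new ArrayList<String>();
        for (Map.Entry<String, List<String>> entry : valuesByUri.entrySet()) {
            if (entry.getValue().contains(value)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, List<String>> writersByUri = new LinkedHashMap<String, List<String>>();
        writersByUri.put("/topsongs/Aretha-Franklin+Respect.xml",
            Arrays.asList("Otis Redding"));
        // Hypothetical document: the value only contains the name as a substring.
        writersByUri.put("/topsongs/hypothetical.xml",
            Arrays.asList("Otis Redding (cover)"));

        // prints: [/topsongs/Aretha-Franklin+Respect.xml]
        System.out.println(urisWithValue(writersByUri, "Otis Redding"));
    }
}
```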
When investigating a dataset, one question often leads to another. We might wonder whether Aretha Franklin and Otis Redding collaborated on other top songs. We can start with a simple string search.
A string search expresses query criteria including phrases and Booleans similar to the Google search box. You can prompt a user for the criteria, but it’s also convenient for specifying static criteria in an application. Like a search engine, the StringSearcher example matches documents that contain both of the phrases Aretha Franklin and Otis Redding in any location.
StringQueryDefinition stringQry = queryMgr.newStringDefinition();
stringQry.setCriteria("\"Aretha Franklin\" AND \"Otis Redding\"");
SearchHandle searchHandle = queryMgr.search(stringQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
    ...
}
The example differs from the previous example only in the use of StringQueryDefinition to specify the criteria.
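When the phrases come from user input rather than a literal, the escaped quotes in the criteria string are easy to get wrong. A small helper can assemble the string instead. This is a sketch under assumptions: quote() and allOf() are hypothetical helpers, not part of the API, and the sketch simply rejects phrases that contain a double quote rather than guessing at an escaping rule.

```java
import java.util.Arrays;
import java.util.List;

class PhraseCriteriaSketch {
    /** Wraps a phrase in double quotes so that it is treated as one unit. */
    static String quote(String phrase) {
        if (phrase.indexOf('"') >= 0) {
            throw new IllegalArgumentException(
                "phrase must not contain a double quote: " + phrase);
        }
        return "\"" + phrase + "\"";
    }

    /** Joins the quoted phrases with AND. */
    static String allOf(List<String> phrases) {
        StringBuilder criteria = new StringBuilder();
        for (String phrase : phrases) {
            if (criteria.length() > 0) {
                criteria.append(" AND ");
            }
            criteria.append(quote(phrase));
        }
        return criteria.toString();
    }

    public static void main(String[] args) {
        // prints: "Aretha Franklin" AND "Otis Redding"
        System.out.println(allOf(Arrays.asList("Aretha Franklin", "Otis Redding")));
    }
}
```

The resulting string is the same value that the StringSearcher example passes to setCriteria().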
In some cases, a quick phrase search is enough to get the answer. In this case, however, the output shows that the search was too general.
document: /topsongs/Aretha-Franklin+Respect.xml
location: /topSong/artist/artistId
matched: https://en.wikipedia.org/wiki/Aretha_Franklin
location: /topSong/artist
matched: Aretha Franklin
location: /topSong/descr/p[1]
matched: …Stax recording artist Otis Redding in 1965. “Respect” became…
document: /topsongs/Jailhouse-Rock-Elvis-Presley+You-Send-Me-Summertime-…
location: /topSong/descr/p[4]
matched: …Aretha Franklin, The Supremes, Otis Redding
The search matched phrases mentioning Aretha Franklin and Otis Redding in the description, which doesn’t indicate whether they collaborated on the song.
To get a definitive answer for our question, we need to constrain our phrase search to the artist and writer properties. We define constraints with query options. Query options specify the static parts of a query including not only constraints but the result page length and so on. You write query options to the database before executing a search that supplies the dynamic parts of the query including the criteria, the result page number, and so on.
The ConstrainedSearcher example builds the query options as a data structure in Java:
QueryOptionsManager optMgr =
    dbClient.newServerConfigManager().newQueryOptionsManager();
QueryOptionsBuilder optBldr = new QueryOptionsBuilder();
QueryOptionsHandle optHandle = new QueryOptionsHandle();
optHandle.withConstraints(
    optBldr.constraint("artist",
        optBldr.elementQuery(new QName("artistName"))),
    optBldr.constraint("writer",
        optBldr.elementQuery(new QName("writer"))));
optMgr.writeOptions("constraints", optHandle);
As you might expect, the API provides a QueryOptionsManager to write, read, and delete query options. To build options as a Java structure, you use QueryOptionsBuilder and QueryOptionsHandle. In particular, the call to QueryOptionsHandle.withConstraints() specifies constraints on the artist and writer properties. That makes it possible to restrict search phrases to these properties (similar to the key-value search shown earlier). The QueryOptionsManager.writeOptions() call saves the query options under the name constraints.
By the way, because query options are typically set up by an experienced developer and used by other developers in applications, writing them requires a higher level of permissions. While we’ll show how to build query options in Java, you can also write query options as JSON or XML documents if you prefer.
Now we can use the query options to constrain the POJO properties where the search matches the phrases. The ConstrainedSearcher example specifies the constraints query options when constructing the StringQueryDefinition object and then prefixes the Aretha Franklin phrase with the artist constraint and the Otis Redding phrase with the writer constraint.
StringQueryDefinition stringQry = queryMgr.newStringDefinition("constraints");
stringQry.setCriteria(
    "artist:\"Aretha Franklin\" AND writer:\"Otis Redding\"");
SearchHandle searchHandle = queryMgr.search(stringQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
    ...
}
Apart from adding the query options and constraint prefixes, this example is unchanged from the previous version. The result output, however, is much more precise:
document: /topsongs/Aretha-Franklin+Respect.xml
location: /topSong/artist
matched: Aretha Franklin
location: /topSong/writers
matched: Otis Redding
Only one song had this combination of artist and writer, yielding our definitive answer.
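The difference between matching a phrase anywhere and matching it within a named property can be illustrated in memory. The sketch below is plain Java, not the MarkLogic API: each document is modelled as a hypothetical map from property name to values, and the second document (including its artist value) is invented to mimic a description that merely mentions both names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ConstrainedMatchSketch {
    /** True if any value of any property contains the phrase. */
    static boolean matchesAnywhere(Map<String, List<String>> doc, String phrase) {
        for (List<String> values : doc.values()) {
            for (String value : values) {
                if (value.contains(phrase)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** True only if a value of the named property contains the phrase. */
    static boolean matchesIn(Map<String, List<String>> doc, String property, String phrase) {
        List<String> values = doc.get(property);
        if (values == null) {
            return false;
        }
        for (String value : values) {
            if (value.contains(phrase)) {
                return true;
            }
        }
        return false;
    }

    /** Evaluates the unconstrained and the constrained form of the same criteria. */
    static String describe(Map<String, List<String>> doc) {
        boolean anywhere = matchesAnywhere(doc, "Aretha Franklin")
            && matchesAnywhere(doc, "Otis Redding");
        boolean constrained = matchesIn(doc, "artist", "Aretha Franklin")
            && matchesIn(doc, "writer", "Otis Redding");
        return "anywhere=" + anywhere + " constrained=" + constrained;
    }

    public static void main(String[] args) {
        Map<String, List<String>> respect = new LinkedHashMap<String, List<String>>();
        respect.put("artist", Arrays.asList("Aretha Franklin"));
        respect.put("writer", Arrays.asList("Otis Redding"));

        // Hypothetical document whose description only mentions both names.
        Map<String, List<String>> mention = new LinkedHashMap<String, List<String>>();
        mention.put("artist", Arrays.asList("Hypothetical Artist"));
        mention.put("descr", Arrays.asList("...Aretha Franklin, The Supremes, Otis Redding"));

        System.out.println(describe(respect)); // anywhere=true constrained=true
        System.out.println(describe(mention)); // anywhere=true constrained=false
    }
}
```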
From time to time, you might need to modify or inspect criteria programmatically. Examples include providing a GUI editor for search criteria, adding hidden criteria, checking for invalid or unauthorized criteria, or generating criteria to reflect the current state of an external resource.
As with query options, you use a builder to create a Java structure. The StructuredSearcher example builds a structured search for the same constrained criteria that the previous example expressed as a string.
StructuredQueryBuilder structureBldr =
    queryMgr.newStructuredQueryBuilder("constraints");
StructuredQueryDefinition structuredQry = structureBldr.and(
    structureBldr.elementConstraint("artist",
        structureBldr.term("Aretha Franklin")),
    structureBldr.elementConstraint("writer",
        structureBldr.term("Otis Redding")));
SearchHandle searchHandle = queryMgr.search(structuredQry, new SearchHandle());
for (MatchDocumentSummary docSum: searchHandle.getMatchResults()) {
    ...
}
The example uses StructuredQueryBuilder to create a StructuredQueryDefinition specifying the criteria for the artist and writer constraints defined by the constraints query options. Aside from using StructuredQueryDefinition instead of StringQueryDefinition, this example is the same as the previous example, qualifies the same documents, and produces the same output. A Java program, however, could easily change one of the terms or add new complex Boolean conditions without string parsing.
If you prefer, you can also write a structured query as a JSON or XML document. While the rest of the tutorial will stick with string queries for consistency, in each case, the search criteria could have been specified with a structured query.
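To see why a structure is easier to change than a string, consider a toy criteria tree in plain Java. This sketch is not the MarkLogic StructuredQueryBuilder: Node, Constraint, and And are hypothetical classes, and render() only produces the equivalent string criteria for display.

```java
import java.util.ArrayList;
import java.util.List;

class StructuredCriteriaSketch {
    /** A node in a hypothetical criteria tree. */
    interface Node {
        String render();
    }

    /** A phrase restricted to a named constraint. */
    static class Constraint implements Node {
        final String name;
        final String phrase;

        Constraint(String name, String phrase) {
            this.name = name;
            this.phrase = phrase;
        }

        public String render() {
            return name + ":\"" + phrase + "\"";
        }
    }

    /** A conjunction of child nodes. */
    static class And implements Node {
        final List<Node> children = new ArrayList<Node>();

        And add(Node child) {
            children.add(child);
            return this;
        }

        public String render() {
            StringBuilder out = new StringBuilder();
            for (Node child : children) {
                if (out.length() > 0) {
                    out.append(" AND ");
                }
                out.append(child.render());
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        And query = new And()
            .add(new Constraint("artist", "Aretha Franklin"))
            .add(new Constraint("writer", "Otis Redding"));
        // prints: artist:"Aretha Franklin" AND writer:"Otis Redding"
        System.out.println(query.render());

        // Adding a condition is a method call, with no string parsing.
        query.add(new Constraint("producer", "Steve Cropper"));
        // prints: artist:"Aretha Franklin" AND writer:"Otis Redding" AND producer:"Steve Cropper"
        System.out.println(query.render());
    }
}
```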
So far, the examples have answered specific questions. To help frame questions, it’s also useful to get a broad overview of the dataset. Facet analysis meets that requirement by performing counts or other aggregates on the entire dataset or a subset of interest. The next example supports facet analysis by genre or over time.
When you ran the bootstrapping utility at the start of this tutorial, it configured the top songs database. The configuration created range indexes on the genre and week elements. A range index provides a basis for calculating facets. Now, we’re ready to take advantage of those genre and week range indexes.
As with the artist and writer indexes in a previous example, the FacettedSearcher example creates constraints for the genre and week indexes in query options. The constraints identify the range indexes and their datatypes. The example sorts the genres in descending order by number of songs in the genre.
optHandle.withConstraints(
    optBldr.constraint("genre",
        optBldr.range(
            optBldr.elementRangeIndex(
                new QName("genre"),
                optBldr.stringRangeType("https://marklogic.com/collation/")),
            Facets.FACETED,
            FragmentScope.DOCUMENTS,
            null,
            "frequency-order", "descending")),
    optBldr.constraint("week",
        optBldr.range(
            optBldr.elementRangeIndex(
                new QName("week"),
                optBldr.rangeType("xs:date")))));
optHandle.setReturnResults(false);
optMgr.writeOptions("facetsongs", optHandle);
The source code fragment skips over the construction of the QueryOptionsBuilder and QueryOptionsHandle objects, which remains the same as in the earlier example. The call to QueryOptionsHandle.setReturnResults() modifies searches to return just the facet analysis and not a page of search results.
The facetsongs query options have done the heavy lifting of defining the facets. The FacettedSearcher example specifies the facetsongs query options when constructing the string definition. The example performs the facet analysis on the subset of the songs that contain the Grammy term anywhere in the document. A search could use complex Booleans for a smaller subset or no criteria for the entire dataset.
StringQueryDefinition stringQry = queryMgr.newStringDefinition("facetsongs");
stringQry.setCriteria("Grammy");
SearchHandle searchHandle = queryMgr.search(stringQry, new SearchHandle());
for (FacetResult facet: searchHandle.getFacetResults()) {
    System.out.println("facet: "+facet.getName());
    for (FacetValue value: facet.getFacetValues()) {
        System.out.println(" "+value.getLabel()+" = "+value.getCount());
    }
}
As with search results, SearchHandle parses the list of facets into a Java structure with the values and their aggregate counts. You can also read facets as JSON or XML.
The example output analyzes the genres and weeks for all songs having the Grammy term.
facet: genre
Pop = 79
R&B = 71
…
Rhythm And Blues = 2
…
facet: week
1940-07-27 = 1
1940-08-03 = 1
1940-08-10 = 1
…
The output shows that consolidating genre values like R&B and Rhythm And Blues would improve the quality of the dataset. That’s fine and to be expected from real-world Big Data. Cleaning up those blemishes won’t change the big picture, so we can get value from our dataset immediately. If later applications could benefit from fixing these flaws, the facet analysis has shown us what to fix. We can refine the dataset in place without getting in the way of existing applications. Such flexible, progressive refinement differs from traditional databases, where changes to data structures and associations have a disruptive impact on applications.
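One way to picture both the facet counts and the consolidation is an in-memory count that maps known aliases to a canonical value before counting. This is a plain-Java sketch rather than anything the server does: countByFrequency() is a hypothetical helper, the sample genre list is made up, and the only alias is the R&B pair named above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GenreFacetSketch {
    /**
     * Counts genre values after mapping aliases to their canonical value.
     * The result is ordered by descending count, then by value.
     */
    static Map<String, Integer> countByFrequency(List<String> genres,
                                                 Map<String, String> aliases) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String genre : genres) {
            String canonical = aliases.containsKey(genre) ? aliases.get(genre) : genre;
            Integer old = counts.get(canonical);
            counts.put(canonical, old == null ? 1 : old + 1);
        }

        List<Map.Entry<String, Integer>> entries =
            new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        Map<String, Integer> ordered = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, Integer> entry : entries) {
            ordered.put(entry.getKey(), entry.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> genres = Arrays.asList("Pop", "R&B", "Rhythm And Blues", "Pop", "Soul");
        Map<String, String> aliases = Collections.singletonMap("Rhythm And Blues", "R&B");
        // prints: {Pop=2, R&B=2, Soul=1}
        System.out.println(countByFrequency(genres, aliases));
    }
}
```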
For some purposes, facet analysis provides too much detail. To get a fast summary of a dataset, you might want to aggregate ranges of values and eliminate outliers.
Query options can limit the number of facet values. When facet values are ordered by descending frequency, the effect is to return the top values. Query options can also define buckets for grouping facet values. The BuckettedSearcher example refines the previous query options to add a limit and buckets:
optHandle.withConstraints(
    optBldr.constraint("genre",
        optBldr.range(
            optBldr.elementRangeIndex(
                new QName("genre"),
                optBldr.stringRangeType("https://marklogic.com/collation/")),
            Facets.FACETED,
            FragmentScope.DOCUMENTS,
            null,
            "frequency-order", "descending", "limit=10")),
    optBldr.constraint("week",
        optBldr.range(
            optBldr.elementRangeIndex(
                new QName("week"),
                optBldr.rangeType("xs:date")),
            Facets.FACETED,
            FragmentScope.DOCUMENTS,
            optBldr.buckets(
                optBldr.bucket("1940s", "40s", "1940-01-01", "1950-01-01"),
                optBldr.bucket("1950s", "50s", "1950-01-01", "1960-01-01"),
                ...,
                optBldr.bucket("2000s", "00s", "2000-01-01", "2010-01-01")
            ))));
Other than referring to the revised query options, the BuckettedSearcher example has exactly the same search code as the previous example. Because of the query options changes, however, the example produces only the top genres and groups songs by decade instead of by week.
facet: genre
Pop = 79
R&B = 71
…
Country = 8
facet: week
40s = 4
50s = 11
…
00s = 67
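The grouping that buckets perform can be sketched in memory with java.time. This is an illustration only, not how the server computes facets: Bucket and countByBucket() are hypothetical, the sample dates are chosen to exercise the boundaries, and the sketch assumes each bucket includes its lower bound and excludes its upper bound.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DecadeBucketSketch {
    /** A labelled date range; lower bound inclusive, upper bound exclusive (assumed). */
    static class Bucket {
        final String label;
        final LocalDate from;
        final LocalDate to;

        Bucket(String label, String fromInclusive, String toExclusive) {
            this.label = label;
            this.from = LocalDate.parse(fromInclusive);
            this.to = LocalDate.parse(toExclusive);
        }

        boolean contains(LocalDate date) {
            return !date.isBefore(from) && date.isBefore(to);
        }
    }

    /** Counts how many dates fall into each bucket; dates outside every bucket are ignored. */
    static Map<String, Integer> countByBucket(List<Bucket> buckets, List<LocalDate> dates) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (Bucket bucket : buckets) {
            counts.put(bucket.label, 0);
        }
        for (LocalDate date : dates) {
            for (Bucket bucket : buckets) {
                if (bucket.contains(date)) {
                    counts.put(bucket.label, counts.get(bucket.label) + 1);
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Bucket> buckets = Arrays.asList(
            new Bucket("40s", "1940-01-01", "1950-01-01"),
            new Bucket("50s", "1950-01-01", "1960-01-01"));
        List<LocalDate> weeks = Arrays.asList(
            LocalDate.parse("1940-07-27"),
            LocalDate.parse("1949-12-31"),
            LocalDate.parse("1950-01-01"),
            LocalDate.parse("1967-06-03")); // outside both buckets
        // prints: {40s=2, 50s=1}
        System.out.println(countByBucket(buckets, weeks));
    }
}
```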
The broad understanding of the dimensions of the dataset gained through facet analysis can frame the investigation of specific questions. Knowing the genres for the song dataset suggests that, if we want to investigate the breadth of Quincy Jones’s career, we could look at the genres for the songs he has produced. Such questions can be answered quickly based on a range index.
First, the ValuesLister example defines a producer constraint (much like the artist and writer constraints in a previous example). The query options also identify the range index supplying the list of values (in this case, the genre values).
optHandle.withConstraints(
    optBldr.constraint("producer",
        optBldr.elementQuery(new QName("producer"))));
optHandle.withValues(
    optBldr.values("genre",
        optBldr.range(
            optBldr.elementRangeIndex(
                new QName("genre"),
                optBldr.stringRangeType("https://marklogic.com/collation/")))));
optMgr.writeOptions("valuesongs", optHandle);
To query for the values, the ValuesLister example constructs a ValuesDefinition with the name of the values list (genre) specified in the query options (valuesongs). The example also constructs a StringQueryDefinition, prefixes Quincy Jones with the producer constraint (as with Aretha Franklin and the artist constraint previously), and initializes the ValuesDefinition with the StringQueryDefinition to constrain the values list to the songs produced by Quincy Jones.
ValuesDefinition valdef = queryMgr.newValuesDefinition("genre", "valuesongs");
StringQueryDefinition stringQry = queryMgr.newStringDefinition();
stringQry.setCriteria("producer:\"Quincy Jones\"");
valdef.setQueryDefinition(stringQry);
ValuesHandle genreHandle = queryMgr.values(valdef, new ValuesHandle());
for (CountedDistinctValue value: genreHandle.getValues()) {
    System.out.println(
        " "+value.getCount()+" "+value.get("xs:string", String.class));
}
The call to QueryManager.values() reads the index and ValuesHandle parses the list into a Java structure reflecting the values for the constrained subset. That’s similar to the search() method with a SearchHandle in previous examples, but in this case, reading directly from the index. As elsewhere, you can also get the values list as JSON or XML. The example iterates over the list to get each count and value.
The output shows that Quincy Jones has produced a surprising diversity of hit songs:
Hit songs per genre for producer Quincy Jones:
1 Country Soul
1 Dance
…
1 Glam Metal
2 Hard Rock
1 Jazz
…
1 West Coast Hip Hop
A top song is a hit in one or more weeks and can be classified in one or more genres; thus, each top song associates weeks with genres. These associations of weeks and genres (called co-occurrence or, when read from the database, tuples) can demonstrate trends over time for genres. For instance, we can investigate the trend for songs produced by Quincy Jones.
In the query options, the producer constraint remains the same as the previous example (and so isn’t included in the fragment below). The TuplesLister example builds the week-genre tuple list over the week and genre range indexes (instead of a values list for one range index).
optHandle.withTuples(
    optBldr.tuples("week-genre",
        optBldr.tupleSources(
            optBldr.range(
                optBldr.elementRangeIndex(
                    new QName("week"),
                    optBldr.rangeType("xs:date"))),
            optBldr.range(
                optBldr.elementRangeIndex(
                    new QName("genre"),
                    optBldr.stringRangeType("https://marklogic.com/collation/"))))));
optMgr.writeOptions("tuplesongs", optHandle);
To query for the tuples, the TuplesLister example constructs a ValuesDefinition with the name of the tuples list (week-genre) specified in the query options (tuplesongs). The example constrains the query to songs produced by Quincy Jones with the same StringQueryDefinition as the previous example (and so doesn’t include those statements in the fragment below).
ValuesDefinition valdef = queryMgr.newValuesDefinition("week-genre", "tuplesongs");
...
valdef.setQueryDefinition(stringQry);
DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
TuplesHandle tuplesHandle = queryMgr.tuples(valdef, new TuplesHandle());
for (Tuple tuple: tuplesHandle.getTuples()) {
    System.out.print(" "+tuple.getCount()+" ");
    for (TypedDistinctValue value: tuple.getValues()) {
        String type = value.getType();
        if ("xs:date".equals(type)) {
            System.out.print(dateFormat.format(
                value.get(Calendar.class).getTime()));
        } else if ("xs:string".equals(type)) {
            System.out.print(value.get(String.class));
        }
    }
    System.out.println();
}
The call to QueryManager.tuples() reads the indexes and TuplesHandle parses the tuples into a Java structure reflecting the values for the constrained subset. The example iterates over the tuples to get each value, formatting the date values for weeks using a Java DateFormat.
The output satisfies the goal of the investigation by showing that Quincy Jones started by producing Country Soul / R&B songs and transitioned through other genres to Hip Hop.
Hit song genres by week for producer Quincy Jones:
1 1962-06-02 Country Soul
1 1962-06-02 R&B
…
1 1996-07-13 West Coast Hip Hop
1 1996-07-20 West Coast Hip Hop
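Co-occurrence itself is simple to picture in memory: every song contributes one tuple for each combination of its weeks and genres. The sketch below is plain Java, not the server's index-based implementation. Song and weekGenreCounts() are hypothetical, and grouping the sample weeks and genres into two songs is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CoOccurrenceSketch {
    /** Hypothetical minimal song: weeks as ISO dates plus genres. */
    static class Song {
        final List<String> weeks;
        final List<String> genres;

        Song(List<String> weeks, List<String> genres) {
            this.weeks = weeks;
            this.genres = genres;
        }
    }

    /**
     * Counts each (week, genre) pair across all songs.
     * Keys are "week genre"; ISO dates sort chronologically as strings.
     */
    static Map<String, Integer> weekGenreCounts(List<Song> songs) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Song song : songs) {
            for (String week : song.weeks) {
                for (String genre : song.genres) {
                    counts.merge(week + " " + genre, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Song> songs = Arrays.asList(
            new Song(Arrays.asList("1962-06-02"),
                     Arrays.asList("Country Soul", "R&B")),
            new Song(Arrays.asList("1996-07-13", "1996-07-20"),
                     Arrays.asList("West Coast Hip Hop")));
        // Prints one line per tuple, for example: 1 1962-06-02 Country Soul
        for (Map.Entry<String, Integer> entry : weekGenreCounts(songs).entrySet()) {
            System.out.println(entry.getValue() + " " + entry.getKey());
        }
    }
}
```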