Day One Concepts and Terms

Documents and Databases

The basic unit of organization in MarkLogic is a Document. Here are examples in JSON and XML:

{
    "name": "Asahi Draft Beer",
    "brewer": {
        "name": "Asahi",
        "country": "Japan"
    },
    "calories": 41,
    "alcohol": "5.21"
}

<beer>
    <name>Asahi Draft Beer</name>
    <brewer>
        <name>Asahi</name>
        <country>Japan</country>
    </brewer>
    <calories>41</calories>
    <alcohol>5.21</alcohol>
</beer>

The set of JSON keys, objects, and arrays, or XML elements and attributes you use in your documents is up to you. MarkLogic does not require adherence to any schemas.

MarkLogic also supports documents stored as binary or plain text. We refer to this encoding (JSON, XML, text, or binary) as the document’s Format.

URIs

A document’s Uniform Resource Identifier (URI) is a unique key that you choose when you insert the document into the database. No two documents in a database share a URI, and you use the URI to retrieve or refer to the document later. Typically, document URIs begin with a slash, like /beer.

Beyond the URI, MarkLogic maintains some additional metadata associated with each document including properties, permissions, and quality.

(NB: You will also see use of the term Fragment. Unless you’ve specifically enabled a feature called “fragmenting”, a fragment is the same thing as a document, the basic unit of storage in MarkLogic. See this blog post for some additional explanation.)

Organization

How does MarkLogic organize documents in the database? Logically, MarkLogic provides two concepts: Collections and Directories. You can think of collections as unordered sets. If you are familiar with tags, a collection works in much the same way: a collection can hold multiple documents, and a document can belong to multiple collections.
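To make the many-to-many relationship concrete, here is a minimal sketch in plain JavaScript. It is a toy in-memory model, not a MarkLogic API: the helper names addToCollection and collectionsOf, the collection names, and the document URIs are all hypothetical and chosen only for illustration.

```javascript
// Toy in-memory model of collection membership (not a MarkLogic API).
// Keys are collection names; values are sets of document URIs.
const collections = new Map();

// Add a document URI to a named collection, creating the set on first use.
function addToCollection(collection, uri) {
  if (!collections.has(collection)) {
    collections.set(collection, new Set());
  }
  collections.get(collection).add(uri);
}

// List every collection that contains the given URI, sorted by name.
function collectionsOf(uri) {
  return [...collections.entries()]
    .filter(([, uris]) => uris.has(uri))
    .map(([name]) => name)
    .sort();
}

// Hypothetical URIs and collection names.
addToCollection("beers", "/beer/asahi.json");
addToCollection("japan", "/beer/asahi.json");
addToCollection("beers", "/beer/guinness.json");

console.log(collectionsOf("/beer/asahi.json")); // [ 'beers', 'japan' ]
console.log(collections.get("beers").size);     // 2
```

One document appears in two collections, and one collection holds two documents, which is the tag-like behavior described above.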

Directories are similar in concept to the notion of directories or folders in file systems. They are hierarchical and membership is implicit based on the path syntax of URIs.
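As a rough illustration of implicit membership, the sketch below derives the directories a document would fall under purely from its URI. It is plain JavaScript rather than a MarkLogic call; the function name directoriesOf and the sample URI are hypothetical, and it assumes URIs that begin with a slash and directory paths that end with one.

```javascript
// Derive the implied directory paths for a document URI.
// Assumption: the URI starts with "/" and directory paths end with "/".
function directoriesOf(uri) {
  const parts = uri.split("/").filter(Boolean); // drop empty segments
  parts.pop();                                  // last segment is the document name
  const dirs = ["/"];
  let current = "/";
  for (const part of parts) {
    current += part + "/";
    dirs.push(current);
  }
  return dirs;
}

// Hypothetical URI, for illustration only.
console.log(directoriesOf("/beer/japan/asahi.json"));
// [ '/', '/beer/', '/beer/japan/' ]
```

Nothing is declared up front here: the path syntax of the URI alone determines which directories the document sits in.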

Beyond directories and collections, MarkLogic also provides role-based security and document-level permissions to help you organize your documents securely.

MarkLogic stores documents (and associated directories and collections) in logical structures called Databases. The on-disk storage for a database is organized in physical pieces called Forests, and forests are, in turn, broken up into smaller pieces called Stands.

Interfaces

So MarkLogic stores and manages documents. But how can you get to them? There are three main interfaces to MarkLogic, from highest to lowest level: Java / Node.js, REST, and XQuery / Server-side JavaScript. (There are also community-contributed interfaces for Python and PHP.)

To support these interfaces, MarkLogic provides access through a variety of App Servers, each of which implements a specific networking protocol on a TCP port. MarkLogic provides a few App Servers out of the box, but you will usually configure one or more for use with your application.

HTTP App Servers

The most common of these are HTTP App Servers, which map incoming HTTP requests to file paths on the server filesystem (or document URIs in an associated database). An HTTP App Server listens for requests, executes any XQuery or JavaScript code in the corresponding file (or document), and then sends a response.
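The mapping step can be pictured with a small sketch. This is a simplified model in plain JavaScript, not MarkLogic’s actual resolution logic: the function resolveModule, the shape of the appServer object, and the sample paths are hypothetical, and the model assumes the server simply joins a configured root with the request path.

```javascript
// Simplified model of how an HTTP App Server might locate the code to run.
// Assumption: appServer.modules is "filesystem" or the name of a database,
// and appServer.root is the base path that request paths are joined onto.
function resolveModule(appServer, requestPath) {
  const root = appServer.root.replace(/\/+$/, ""); // strip trailing slashes
  const path = requestPath.split("?")[0];          // ignore the query string
  const location = root + path;
  if (appServer.modules === "filesystem") {
    return { kind: "file", path: location };
  }
  return { kind: "document", database: appServer.modules, uri: location };
}

// Hypothetical configurations, for illustration only.
console.log(resolveModule({ root: "/opt/app/", modules: "filesystem" }, "/hello.xqy?name=x"));
// { kind: 'file', path: '/opt/app/hello.xqy' }
console.log(resolveModule({ root: "/", modules: "app-modules" }, "/hello.sjs"));
// { kind: 'document', database: 'app-modules', uri: '/hello.sjs' }
```

The same request path resolves either to a file on disk or to a document URI, depending on where the server is configured to look for code.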

This is much like what other application servers do. Java Servlet containers do this for Java, Apache HTTPD (with mod_php, for example) does it for PHP, and so on. In MarkLogic, the app server functionality is actually inside MarkLogic itself and the programming language is XQuery or JavaScript.

NB: Those coming from a background in Oracle and PL/SQL may find it helpful to think of XQuery and Server-side JavaScript as MarkLogic’s stored procedure languages. They are natively understood and, although they are not the only way, they are the lowest-level, most efficient way to write code against MarkLogic.

XQuery is a W3C standard functional programming language, designed to query and transform collections of structured and unstructured data, usually in the form of XML, text and other data formats. It’s a great language for a database and inside MarkLogic you’ll find a highly optimized and tuned XQuery interpreter.

Server-side JavaScript uses Google’s V8 JavaScript engine for performance and offers the same functionality as XQuery in MarkLogic databases.

Beyond the standard functions in the W3C spec and the standard JavaScript language, MarkLogic provides a large number of its own, covering:

HTTP requests and responses
Database create, read, update, delete (CRUD)
Full-text search, spelling, thesaurus
Transactions
File system access
Security
String manipulation
Date/time
JSON, XML, Binary formats
Math and cryptography functions
Configuration, monitoring, and administration
HTTP and SMTP clients
And much more…

If you choose to, you can script an entire application in XQuery or Server-side JavaScript the same way you might in PHP or Python. But you don’t need to. You can connect to MarkLogic in ways that fit into your environment without writing any XQuery or JavaScript yourself.

Historically, XQuery has been the main (and for some time, only) programmatic interface to MarkLogic. As you’d expect, a lot of MarkLogic documentation and examples use XQuery. MarkLogic’s Server-side JavaScript is an alternative to using XQuery. Both offer similar functionality and flexibility to utilize existing developer skills.

For more details, see the API reference.

REST API Instances

MarkLogic provides a rich, extensible REST API. To use it, you configure a REST API instance, which is a specialized HTTP App Server. When you configure a REST API instance for your database, MarkLogic:

Creates a separate, small database,
Installs a copy of the REST API XQuery implementation into that separate database, and
Configures an HTTP App Server to point to this copy for code execution against your database.

You then point your client at the port of the HTTP app server. (See here for a video.)
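As a hedged sketch of what pointing a client at that port looks like, the snippet below builds (but does not send) a request to write a JSON document through the REST API’s /v1/documents endpoint. The helper buildDocumentRequest, the host, the port, the URI, and the collection name are assumptions for illustration; authentication is omitted, and a real client would send the request and the document body with an HTTP library of its choice.

```javascript
// Build the pieces of a document-write request for a REST API instance.
// Assumptions: host, port, uri, and collection names are placeholders;
// no request is actually sent and no credentials are included.
function buildDocumentRequest({ host, port, uri, collections = [] }) {
  const url = new URL(`http://${host}:${port}/v1/documents`);
  url.searchParams.set("uri", uri); // the document's URI in the database
  for (const name of collections) {
    url.searchParams.append("collection", name);
  }
  return {
    method: "PUT",
    url: url.toString(),
    headers: { "Content-Type": "application/json" },
  };
}

const request = buildDocumentRequest({
  host: "localhost",
  port: 8000, // assumed port of the REST API instance
  uri: "/beer/asahi.json",
  collections: ["beers"],
});

console.log(request.method, request.url);
// PUT http://localhost:8000/v1/documents?uri=%2Fbeer%2Fasahi.json&collection=beers
```

Keeping the request construction separate from the network call makes it easy to inspect exactly what would be sent before wiring in credentials and a real HTTP client.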

Clients

MarkLogic provides a Java client and Node.js client that run on top of the REST API. The Java client exposes the underlying REST API via Java language idioms while the Node.js client exposes the underlying REST API via JavaScript idioms. There are also other community-contributed clients such as Python and PHP. Refer to MarkLogic’s Developer site under Tools then Connectors for a current list.

Other Interfaces

In addition to HTTP App Servers, MarkLogic provides the following App Server types:

XDBC: Similar in concept to an RDBMS JDBC server. It provides database access and ad hoc XQuery code execution. MarkLogic provides Java and .NET XCC clients for XDBC. Note: For Java connections to MarkLogic, consider using the Java Client.

WebDAV: This App Server supports the WebDAV protocol, giving WebDAV clients read and write access (depending on the security configuration) to a database. A WebDAV server only accesses documents and directories in a database.

ODBC: This App Server provides a standard ODBC interface for use with the MarkLogic-provided ODBC Driver. With it, you can issue SQL queries over relational-style data resident in MarkLogic.

Written Tutorial