Native JSON

Native JSON brings all of MarkLogic’s production-proven indexing, data management, and security capabilities to the predominant data format of the web. Using native JSON, you can build rich applications that span JSON, RDF, XML, binary, and text documents within a single environment without slow and brittle conversion between formats. Native JSON complements the upcoming server-side JavaScript environment by providing seamless access to JSON managed in a MarkLogic cluster. Because native JSON uses the same foundation that MarkLogic has developed and tuned over the last decade, it is fast, stable, secure, and composable with existing data and code.

Important Concepts

Nodes

JSON documents are represented as hierarchies of nodes. Each node has a type and a value. For example, the following document can be viewed as a tree of nodes.

{ 
 "name": "Oliver", 
 "scores": [88, 67, 73],
 "isActive": true, 
 "affiliation": null 
}

The JSON data model defines six data types. Each is represented as a node type in MarkLogic. Additionally, MarkLogic wraps JSON nodes in a Document node on their way into and out of the database.

  • Object: Key-value pairs
  • Array: Ordered collections
  • Text: Character data
  • Number: Integer or decimal values
  • Boolean: true or false
  • Null: Absence of another type, but still defined
  • Document: Top-level wrapper for documents in the database.
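
The mapping from JSON values to node types can be sketched in plain JavaScript. This is an illustration that runs outside MarkLogic, not a MarkLogic API; the helper name nodeKind is hypothetical, and the Document wrapper is left out because it only exists inside the database.

```javascript
// Plain-JavaScript illustration only; this is not a MarkLogic API.
// nodeKind is a hypothetical helper that names the node type a JSON value maps to.
function nodeKind(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  switch (typeof value) {
    case "string":
      return "text"; // JSON strings are represented as text nodes
    case "number":
      return "number";
    case "boolean":
      return "boolean";
    case "object":
      return "object";
    default:
      throw new TypeError("Not a JSON value: " + typeof value);
  }
}

const doc = JSON.parse(
  '{"name": "Oliver", "scores": [88, 67, 73], "isActive": true, "affiliation": null}'
);

// Print the node kind of the root and of each top-level property.
console.log("(root)", nodeKind(doc));
for (const key of Object.keys(doc)) {
  console.log(key, nodeKind(doc[key]));
}
```

Note that null is reported as its own kind rather than as a missing value, which mirrors the "absence of another type, but still defined" description above.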

JSON nodes share the same data model as XML in MarkLogic. This provides a unified way to manage and index documents of both types. For example, a range index on an XML element, <name/> (no namespace), will also cover the JSON property name.

The following Server-Side JavaScript illustrates working with Nodes through strongly typed JavaScript or XPath.

// xdmp.unquote() returns a ValueIterator; next().value takes its first item.
var json = xdmp.unquote(
  '{ "name": "Oliver", "scores": [88, 67, 73], ' +
  '"isActive": true, "affiliation": null }'
).next().value;
json.root.name;                      // "Oliver" [text() in XQuery, CharData in JavaScript]
json.root.name == "Oliver";          // true
json.root.name === "Oliver";         // false
json.root.name instanceof CharData;  // true
// next returns "Oliver" [text()/CharData]
json.xpath("/self::document-node()/child::object-node()/child::text('name')");
json.root.xpath("/name");            // "Oliver" [text()/CharData]
json.root.xpath("/name/data(.)");    // "Oliver" [String]
json.root.name.valueOf();            // "Oliver" [String]
json.root.scores[1] instanceof Node; // true
json.root.scores[1].valueOf();       // 67 [Number]
json.toObject() instanceof Node;     // false
xdmp.toJSON(json.toObject());        // Round-trip in JavaScript with full fidelity

Compare and contrast that with a similar document represented in XML.

var xml = xdmp.unquote('<person><name>Oliver</name></person>').next().value;
xml.root.xpath("./name");  

Note that the equivalent of the <person> root element doesn't exist in the JSON representation.

Nodes vs. Objects in JavaScript

All nodes in MarkLogic are immutable. In XQuery you can use the family of node update built-ins, for example xdmp:node-replace(), to change a stored document. These built-ins work on JSON nodes as well. However, in JavaScript it is more natural to update an object in place, for example person.name = "Peter";. You can use the . and [] operators to read child properties of a JSON node, but if you try to update a node you’ll get an error. To update a JSON node in JavaScript, first convert the node to an Object with the .toObject() function on Node.prototype. Make your changes on the Object instance. You can go the other way and convert an Object to a Node with xdmp.toJSON(). However, most built-ins that expect JSON nodes will do this conversion for you automatically.

declareUpdate();
var person = cts.doc("/person1.json");
//person.name = "Peter";     // Error: You can't update a JSON Node in-place
var obj = person.toObject(); // Convert the Node to a plain old JavaScript Object
obj.name = "Peter";          // Update as you would any other Object instance
xdmp.documentInsert("/person1.json", obj);  // xdmp.documentInsert() automatically converts Objects to Nodes

Indexing

JSON indexing shares much of the same foundation with the existing XML indexes. A JSON property is roughly equivalent to an XML element from the indexer’s perspective. JSON strings share the same text indexing characteristics (tokenization, stemming, decompounding, etc.) as XML text() nodes. Path, range and field indexes all work on JSON documents.

  • Numbers, booleans and null nodes are indexed in their own type-specific indexes, not as text. For example, a cts.jsonPropertyValueQuery("show_number", 1286.00) will match a document containing { "show_number": 1286 }, unlike a cts.elementValueQuery().
  • For a property that is an array, each member in the array is considered a value of the property. For example, with {"a": [1, 2]}, both cts:json-property-value-query("a", 1) and cts:json-property-value-query("a", 2) will match.
  • JSON documents do not support fragmentation.
  • Because JSON doesn’t allow you to express mixed content, indexing for JSON does not support phrase-throughs or phrase-arounds.
  • Also, because there’s no standard way to express the natural language of some text in JSON, there is no support for switching languages within a document. All JSON documents are indexed using the database’s default language setting.
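
The first two rules above (typed comparison of numbers, and array members counting as values of their property) can be modeled in a few lines of plain JavaScript. This is only a sketch of the matching behavior described here, not MarkLogic code and not how the indexes are implemented; propertyValueMatches is a hypothetical helper name.

```javascript
// Plain-JavaScript model of the matching rules described above; not MarkLogic code.
// propertyValueMatches is a hypothetical helper name.
function propertyValueMatches(node, name, value) {
  if (Array.isArray(node)) {
    return node.some((member) => propertyValueMatches(member, name, value));
  }
  if (node === null || typeof node !== "object") {
    return false; // scalars have no properties to inspect
  }
  return Object.keys(node).some((key) => {
    const child = node[key];
    if (key === name) {
      // Each array member counts as a value of the property.
      const candidates = Array.isArray(child) ? child.flat(Infinity) : [child];
      if (candidates.some((candidate) => candidate === value)) return true;
    }
    // Keep looking in nested objects and arrays.
    return propertyValueMatches(child, name, value);
  });
}

console.log(propertyValueMatches({ a: [1, 2] }, "a", 1)); // true
console.log(propertyValueMatches({ a: [1, 2] }, "a", 2)); // true
console.log(propertyValueMatches({ a: [1, 2] }, "a", 3)); // false
// Numbers are compared as numbers, so 1286.00 and 1286 are the same value.
console.log(propertyValueMatches({ show_number: 1286 }, "show_number", 1286.0)); // true
```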

1. JSON as a Document Format

JSON is now a document format, just like XML, TXT and BINARY. JSON documents support permissions, collections, and quality in the same way as other document formats. You can ingest JSON documents through mlcp or the XQuery APIs described in the next section (“2. CRUD with XQuery”). Here is a command-line example that ingests JSON documents with mlcp:

./mlcp import -host localhost -port 8011 -username admin -password ******** -input_file_path /space2/jsondata/

Compressed archives are also supported, for example,

./mlcp import -host localhost -port 8011 -username admin -password admin -input_file_path /space2/jsondata-zipped/ -input_compressed true -input_compression_codec zip

1.1 New Node Types

Inside the server, similar to an XML document, a JSON document is stored as a “tree”. The following new node types are added to support JSON documents:

  • object node
  • array node
  • number node
  • boolean node
  • null node

(A JSON document can have “text nodes” as well.)

Suppose a JSON property with the name “foo” is an object and the object contains a property named “bar”. For all purposes of search, indexing and XPath, it will behave like an element “foo” with a child element “bar”.


1.2 New Node Constructor

MarkLogic 8 extends the XQuery/XPath data model to include new node constructors to support JSON data types. JSON strings are text() nodes, just like in XML.

  • object-node { }
  • array-node { }
  • null-node {}
  • number-node {}
  • boolean-node {}

Here is an example of how to construct a JSON object node with those constructors:

object-node { "p1" : "v1", "p2" : array-node {1,2,3}, "p3" : fn:true(), "p4" : null-node {} } =>
{ "p1" : "v1", "p2" : [1, 2, 3], "p3" : true, "p4" : null }

2. CRUD with XQuery

2.1 Create

As discussed in Section 1, JSON documents can be ingested through mlcp. xdmp:document-load and xdmp:document-insert work with JSON documents too. Here are some examples:

Example 1:
xdmp:document-load("/tmp/foo.json")
Example 2:
let $node := object-node {"foo":"bar"}
return xdmp:document-insert("foo.json", $node)
Example 3:
let $node := xdmp:unquote('{"foo":"bar"}')
return xdmp:document-insert("foo.json", $node)
Example 4:
let $node := fn:doc("foo.json")
return xdmp:document-insert("bar.json", $node)

2.2 Update

xdmp:node-replace, xdmp:node-insert-child, xdmp:node-insert-before and xdmp:node-insert-after all work with JSON documents. Here are some examples:

Example 1:

The following script updates “foo.json” from {"a":{"b":"foo"}} to {"a":{"b":"bar"}}.

let $oldnode := fn:doc("foo.json")/a/b
let $newnode := text { "bar" } 
return xdmp:node-replace($oldnode, $newnode)
Example 2:

The following script updates “foo.json” from {"foo":["v1", "v2", "v3"], "bar":"v4"} to {"foo":["v1", "v5", "v3"], "bar":"v4"}.

let $oldnode := fn:doc("foo.json")/foo[2]
let $newnode := text { "v5" }
return xdmp:node-replace($oldnode,$newnode)
Example 3: xdmp:node-insert-child

The following script updates “foo.json” from {"a": {"b": "foo"}} to {"a": {"b": "foo", "c": "bar"}}.

let $parentnode := fn:doc("foo.json")/a
let $obj := object-node {"c":"bar"} 
return xdmp:node-insert-child($parentnode, $obj/c)
Example 4:

The following script updates “foo.json” from {"foo": ["v1", "v2", "v3"], "bar": "v4"} to {"foo": ["v1", "v2", "v3", "v5"], "bar":"v4"}.

let $parentnode := fn:doc("foo.json")/array-node("foo")
let $node := text  {"v5" }
return xdmp:node-insert-child($parentnode, $node)
Example 5: xdmp:node-insert-before

The following script updates “foo.json” from {"a": {"b": "foo"}} to {"a": {"c": "bar", "b":"foo"}}.

let $siblingnode := fn:doc("foo.json")/a/b
let $obj := object-node {"c":"bar"} 
return xdmp:node-insert-before($siblingnode, $obj/c)
Example 6:

The following script updates “foo.json” from {"foo": ["v1", "v2", "v3"], "bar": "v4"} to {"foo": ["v1", "v5", "v2", "v3"], "bar": "v4"}.

let $siblingnode := fn:doc("foo.json")/foo[2]
let $node := text {"v5" }
return xdmp:node-insert-before($siblingnode, $node)
Example 7: xdmp:node-insert-after

The following script updates “foo.json” from {"a": {"b": "foo"}} to {"a": {"b":"foo", "c": "bar"}}.

let $siblingnode := fn:doc("foo.json")/a/b
let $obj := object-node {"c":"bar"} 
return xdmp:node-insert-after($siblingnode, $obj/c)
Example 8:

The following script updates “foo.json” from {"foo":["v1", "v2", "v3"], "bar":"v4"} to {"foo":["v1", "v2", "v5", "v3"], "bar":"v4"}.

let $siblingnode := fn:doc("foo.json")/foo[2]
let $node := text { "v5" }
return xdmp:node-insert-after($siblingnode, $node)

The xdmp:to-json and xdmp:from-json built-ins are still available as helper functions, but their signatures have changed (please see “7. Changes to Existing APIs” for more information).

Example 9: xdmp:to-json and xdmp:from-json
let $node := object-node {"a":"foo"}  
let $jsonobj := xdmp:from-json($node)
let $_ := map:put($jsonobj, "b", "bar")
return xdmp:to-json($jsonobj) => {"a": "foo", "b": "bar"}

2.3 Delete

Both xdmp:document-delete and xdmp:node-delete work with JSON documents. Here are some examples:

Example 1:
xdmp:document-delete("foo.json")
Example 2:
let $node := object-node { "p1":"v1", "p2": array-node { "v2", "v3" } }
return xdmp:document-insert("bar.json",$node);
let $node := fn:doc("bar.json")/array-node("p2")
return xdmp:node-delete($node) (: "bar.json" is now {"p1":"v1"} :)

3. Indexing

There is much similarity between how a JSON document is indexed and how an XML document is indexed. A JSON property is roughly equivalent to an XML element. Also, for text nodes, the same indexing is done for a JSON document and an XML document.

But there are also some major differences:

  1. Numbers, booleans and nulls are indexed separately (not as text).
  2. For a property that is an array, each element in the array is considered a value of the property. For example, with {"a": [ 1, 2 ]}, both cts:json-property-value-query("a", 1) and cts:json-property-value-query("a", 2) will match.
  3. No fragmentation.
  4. No phrase-through or phrase-around.
  5. No switching languages within a document.

4. XPath over JSON

XPath works on JSON documents. Again, a JSON property is roughly equivalent to an XML element.

4.1 An Example

{
  "a":   {
     "b" : "v1",
     "c1" : 1,
     "c2" : 2,
     "d" : null,
      "e":  {
        "f" : true,
        "g" : [ "s1", "s2", "s3" ] 
      }
  }
}
/a/b =>  "v1"
/a/c1 => 1
/a/d => null
/a/e/f => true
/a/e/g => ("s1", "s2", "s3")
/a/e/g[2] => "s2"
/a[c1=1] => {"b":"v1", "c1":1, "c2":2, "d":null, "e":{"f":true, "g":["s1", "s2", "s3"]}}
/a[c1=3] => ()

4.2 Node Tests

The following node tests work with JSON documents:

  • object-node()
  • array-node()
  • number-node()
  • boolean-node()
  • null-node()
  • text()

With the document from 4.1 bound to $node, here are some examples of how those node tests can be used:

$node//number-node()  =>  (1, 2)
$node/a/number-node()  =>  (1, 2)
$node//text()  => ("v1", "s1", "s2", "s3")
$node//object-node()  => (
{"a":{"b":"v1", "c1":1, "c2":2, "d":null, "e":{"f":true, "g":["s1", "s2", "s3"]}}}
{"b":"v1", "c1":1, "c2":2, "d":null, "e":{"f":true, "g":["s1", "s2", "s3"]}}
{"f":true, "g":["s1", "s2", "s3"]}
)

When accessed through XPath, an array is returned as a sequence by default. To return it as an array, use array-node(). Compare the following two examples:

$node/a/e/g => ("s1", "s2", "s3")   (: the return here is a sequence :)
$node/a/e/array-node() => ["s1", "s2", "s3"]   (: the return here is an array :)

Also note that all the node tests above (including text()) can take an optional parameter (as xs:string) that specifies a property name. For example,

$node/a/number-node("c1") => 1

This feature is especially useful when there are spaces in a property name. For example:

let $node := object-node {"fo o" : "v1", "bar" : "v2"}
return $node/text("fo o") => "v1"
let $node := object-node {"fo o" : "v1", "bar" : "v2"}
return $node/text("foo") => ()
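
To make the behavior of the node tests concrete, the following plain-JavaScript sketch walks the document from 4.1 and collects descendants by kind, optionally restricted to a property name. It is a model of the results shown above, not MarkLogic code; kindOf and descendantsOfKind are hypothetical helper names.

```javascript
// Plain-JavaScript model of the node tests; not MarkLogic code.
// kindOf and descendantsOfKind are hypothetical helper names.
function kindOf(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (typeof value === "string") return "text";
  return typeof value; // "object", "number" or "boolean" for parsed JSON
}

// Collect, in document order, every value of the given kind. When `name` is
// supplied, keep only values whose innermost enclosing property has that name.
function descendantsOfKind(root, kind, name) {
  const found = [];
  const visit = (value, enclosingName) => {
    const nameMatches = name === undefined || name === enclosingName;
    if (kindOf(value) === kind && nameMatches) found.push(value);
    if (Array.isArray(value)) {
      value.forEach((member) => visit(member, enclosingName));
    } else if (value !== null && typeof value === "object") {
      Object.keys(value).forEach((key) => visit(value[key], key));
    }
  };
  visit(root, null);
  return found;
}

const sample = JSON.parse(
  '{"a": {"b": "v1", "c1": 1, "c2": 2, "d": null,' +
    ' "e": {"f": true, "g": ["s1", "s2", "s3"]}}}'
);

console.log(descendantsOfKind(sample, "number")); // [ 1, 2 ]
console.log(descendantsOfKind(sample, "text")); // [ 'v1', 's1', 's2', 's3' ]
console.log(descendantsOfKind(sample, "object").length); // 3
console.log(descendantsOfKind(sample, "number", "c1")); // [ 1 ]
```

Because the name is compared as a plain string, a property name containing spaces such as "fo o" needs no special handling in this model, which is the same convenience the optional parameter provides in XPath.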

4.3 Unnamed Nodes

The name of a node in a JSON document is the name of the innermost enclosing property. For example,

let $node := object-node {"foo" : "v1", "bar" : array-node {1, 2} }
return fn:node-name($node/foo) => "foo"
let $node := object-node {"foo" : "v1", "bar" : array-node {1, 2} }
return fn:node-name($node/bar[2]) => "bar"

In some cases, a node in a JSON document might not have an enclosing property. Such a node is unnamed. For example,

let $node := array-node { 1, 2, object-node { "foo" : 3 } }
return fn:node-name($node//number-node()[. eq 1]) => ()
let $node := xdmp:unquote('{"foo" : "v1"}')
return fn:node-name($node/object-node()) => ()
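
The naming rule can be modeled in plain JavaScript by carrying the innermost enclosing property name down the tree. This is a sketch of the rule as stated in this section, not MarkLogic code; listNodeNames is a hypothetical helper name, and null stands in for "unnamed".

```javascript
// Plain-JavaScript model of the naming rule; not MarkLogic code.
// listNodeNames is a hypothetical helper name. A name of null means "unnamed".
function listNodeNames(value, name = null, out = []) {
  out.push({ name: name, json: JSON.stringify(value) });
  if (Array.isArray(value)) {
    // Array members have no property of their own, so they keep the array's name.
    for (const member of value) listNodeNames(member, name, out);
  } else if (value !== null && typeof value === "object") {
    // Each property value is named after its property.
    for (const key of Object.keys(value)) listNodeNames(value[key], key, out);
  }
  return out;
}

const inObject = listNodeNames({ foo: "v1", bar: [1, 2] });
const inArray = listNodeNames([1, 2, { foo: 3 }]);

for (const entry of inObject.concat(inArray)) {
  console.log(String(entry.name) + " -> " + entry.json);
}
```

In this model the root object, the members of a top-level array, and an object nested directly in that array all come out unnamed, while the 3 inside { "foo": 3 } is named "foo".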

4.4 More on Arrays

Here are a couple of more advanced examples on accessing arrays with XPath.

Example 1:

In this example, one of the values in the array “bar” is an object, which has a property named “test”. To access “test”, the XPath doesn’t have to specify the “index” (into the array).

let $node := xdmp:unquote('{ "foo" : "v1", "bar" : [ "v2", {"test" : "v3"}, "v4" ] }')
return $node/bar/test
=> "v3"
Example 2:

In this example, the property “test” is enclosed in two levels of arrays but the XPath to access it is the same as that in Example 1.

let $node := xdmp:unquote('{ "foo" : "v1", "bar" : [ "v2", [true, {"test": "v3"}], "v4" ] }')
return $node/bar/test => "v3"
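
Both examples behave as if arrays were transparent to child steps. A plain-JavaScript sketch of that idea follows; it is a model of simple child-step paths such as /bar/test only (no predicates or node tests), not MarkLogic code, and selectPath is a hypothetical helper name.

```javascript
// Plain-JavaScript model of a child-step path such as /bar/test; not MarkLogic code.
// selectPath is a hypothetical helper name.
function selectPath(root, steps) {
  // Treat an array, at any nesting depth, as the sequence of its members.
  const unwrap = (value) => (Array.isArray(value) ? value.flat(Infinity) : [value]);
  const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

  let current = unwrap(root);
  for (const step of steps) {
    const next = [];
    for (const item of current) {
      // Only objects have named children; scalars are skipped.
      if (item !== null && typeof item === "object" && hasOwn(item, step)) {
        next.push(...unwrap(item[step]));
      }
    }
    current = next;
  }
  return current;
}

const oneLevel = { foo: "v1", bar: ["v2", { test: "v3" }, "v4"] };
const twoLevels = { foo: "v1", bar: ["v2", [true, { test: "v3" }], "v4"] };

console.log(selectPath(oneLevel, ["bar", "test"])); // [ 'v3' ]
console.log(selectPath(twoLevels, ["bar", "test"])); // [ 'v3' ]
console.log(selectPath(oneLevel, ["bar"])); // [ 'v2', { test: 'v3' }, 'v4' ]
```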

5. Queries

This section lists the queries that work for JSON. The cts:json-property-* queries are new to MarkLogic 8. All examples in this section use the dataset below:

"1.json" ->  {"p1-1": "s1", "p1-2": 1, "p1-3": 2, "p1-4": null, "p1-5": {"p1-6": true, "p1-7": ["s2", "s3", "s4"]}, "": "none"}
"2.json" ->  {"p2-1": "s1", "p2-2": ["s2", "s3 s4 s5", "s6"], "p2-3": "s7"}
"3.json" ->  {"p3-1": "s1", "p3-2": {"p3-3": "s2 s3", "p3-4": ["s4", "s5", "s6"], "p3-5": "s7"}, "p3-6": "s8"}
"range-1.json" -> {"range1": 1, "range2": "2014-04-07T08:00:00", "foo": {"bar": "bar1", "pathrange1": "abc"}}
"range-2.json" -> {"range1": 2, "range2": "2014-04-07T09:00:00", "foo": {"bar": "bar2", "pathrange1": "def"}}
"range-3.json" -> {"range1": 3, "range2": "2014-04-07T10:00:00", "foo": {"bar": "bar3", "pathrange1": "ghi"}}
"range-4.json" -> {"range1": 4, "range2": "2014-04-07T11:00:00", "foo": {"bar": "bar4", "pathrange1": "jkl"}}
"range-5.json" -> {"range1": 5, "range2": "2014-04-07T12:00:00", "foo": {"bar": "bar5", "pathrange1": "mno"}}

And suppose the following range indexes are created:

  1. An element range index with type = double on "range1"
  2. An element range index with type = dateTime on "range2"
  3. A path range index with type = string on "/foo/pathrange1"

Note that you can use the same Admin GUIs that create element range indexes to set up range indexes for JSON properties. Leave the namespace empty, since JSON doesn’t use namespaces.

cts:word-query
Example:
cts:search(fn:collection(), cts:word-query("s2")) -> this returns 1.json, 2.json and 3.json.

cts:json-property-word-query (similar to element-word-query) and its accessors:
cts:json-property-word-query-property-name
cts:json-property-word-query-text
cts:json-property-word-query-options
cts:json-property-word-query-weight
Example:
cts:search(fn:collection(), cts:json-property-word-query("p1-7", "s4")) -> this returns 1.json.
cts:json-property-value-query (similar to element-value-query) and its accessors:
cts:json-property-value-query-property-name
cts:json-property-value-query-value
cts:json-property-value-query-options
cts:json-property-value-query-weight

Note that you can use this query for text, numbers, booleans or null. To query null, pass in the empty sequence as “value”.

cts:search(fn:collection(),cts:json-property-value-query("p1-1","s1")) -> this returns 1.json.
cts:search(fn:collection(), cts:json-property-value-query("p1-2", 1)) -> this returns 1.json.
cts:search(fn:collection(), cts:json-property-value-query("p1-4", ())) -> this queries null and returns 1.json.
cts:search(fn:collection(), cts:json-property-value-query("p1-6", fn:true())) -> this returns 1.json.
cts:search(fn:collection(), cts:json-property-value-query("p2-2", "s3 s4 s5")) -> this returns 2.json.
cts:json-property-range-query (similar to element-range-query) and its accessors:
cts:json-property-range-query-property-name
cts:json-property-range-query-operator
cts:json-property-range-query-value
cts:json-property-range-query-options
cts:json-property-range-query-weight
Examples:
cts:search(fn:collection(), cts:json-property-range-query("range1", ">=", 3)) -> this returns range-3.json, range-4.json and range-5.json.
cts:search(fn:collection(), cts:json-property-range-query("range2", ">", xs:dateTime("2014-04-07T10:00:00"))) -> this returns range-4.json and range-5.json.
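
The two results above can be checked against the sample dataset with a small plain-JavaScript filter. This is a model of the comparison semantics only; it does not use range indexes and is not MarkLogic code. The names rangeDocs, rangeQuery and asUtcMillis are hypothetical, and treating the timestamps as UTC is an assumption made so the comparison is deterministic.

```javascript
// Plain-JavaScript model of a range comparison over the sample data; not MarkLogic code.
// rangeDocs, rangeQuery and asUtcMillis are hypothetical names.
const rangeDocs = {
  "range-1.json": { range1: 1, range2: "2014-04-07T08:00:00" },
  "range-2.json": { range1: 2, range2: "2014-04-07T09:00:00" },
  "range-3.json": { range1: 3, range2: "2014-04-07T10:00:00" },
  "range-4.json": { range1: 4, range2: "2014-04-07T11:00:00" },
  "range-5.json": { range1: 5, range2: "2014-04-07T12:00:00" },
};

const operators = {
  "<": (a, b) => a < b,
  "<=": (a, b) => a <= b,
  ">": (a, b) => a > b,
  ">=": (a, b) => a >= b,
  "=": (a, b) => a === b,
  "!=": (a, b) => a !== b,
};

// Assumption: timestamps carry no zone, so interpret them as UTC before comparing.
const asUtcMillis = (text) => Date.parse(text + "Z");

// Return the URIs whose property value satisfies `operator value`.
// `cast` converts both sides to a comparable type (identity by default).
function rangeQuery(docs, property, operator, value, cast = (x) => x) {
  const compare = operators[operator];
  if (!compare) throw new Error("Unsupported operator: " + operator);
  return Object.keys(docs).filter((uri) => {
    const doc = docs[uri];
    return (
      Object.prototype.hasOwnProperty.call(doc, property) &&
      compare(cast(doc[property]), cast(value))
    );
  });
}

console.log(rangeQuery(rangeDocs, "range1", ">=", 3));
console.log(rangeQuery(rangeDocs, "range2", ">", "2014-04-07T10:00:00", asUtcMillis));
```

The cast argument plays the role that the index type (double or dateTime) plays in the configuration above: it decides how two values are compared.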
 
cts:path-range-query
Example:
cts:search(fn:collection(), cts:path-range-query("/foo/pathrange1", ">=", "ghi")) -> this returns range-3.json, range-4.json and range-5.json.
 
cts:json-property-scope-query (similar to element-query) and its accessors:
cts:json-property-scope-query-property-name
cts:json-property-scope-query-query

The name “element-query” has been confusing for many developers; hence the word “scope” is added into the JSON query name.

Example:
cts:search(fn:collection(), cts:json-property-scope-query("p3-2", cts:near-query(("s2","s7"),5))) -> this returns 3.json.
cts:directory-query
cts:collection-query
cts:document-query
cts:and-query
cts:or-query
cts:not-query
cts:and-not-query
cts:not-in-query
cts:near-query
cts:document-fragment-query
cts:locks-query
cts:similar-query
cts:distinctive-terms

6. Lexicon Functions

The following lexicon functions work for JSON in EA1:

cts:words
cts:word-match
cts:collections
cts:collection-match
cts:uris
cts:uri-match
cts:values
cts:value-match

There are two new lexicon functions added just for JSON:

cts:json-property-words(
   $property-names as xs:string*,
   [$start as xs:string?],
   [$options as xs:string*],
   [$query as cts:query?],
   [$quality-weight as xs:double?],
   [$forest-ids as xs:unsignedLong*]
) as xs:string*

It is very similar to cts:element-words, but it takes a list of JSON property names instead of element QNames.

cts:json-property-word-match(
   $property-names as xs:string*,
   $pattern as xs:string,
   [$options as xs:string*],
   [$query as cts:query?],
   [$quality-weight as xs:double?],
   [$forest-ids as xs:unsignedLong*]
) as xs:string*

It is very similar to cts:element-word-match, but it takes a list of JSON property names instead of element QNames.


7. Changes to Existing APIs

  • xdmp:from-json(), xdmp:to-json(), json:transform-from-json(), and json:transform-to-json() use JSON nodes rather than strings.

Comments

  • just a remark ... the difference between cts.doc() and fn.doc() is quite relevant here, as cts.doc() returns a node so .toObject() works while fn.doc() returns a ValueIterator ...