Problem

You’re loading documents, but the data have dates in a non-standard format. You want to fix them during ingest. Use this code with a MarkLogic Content Pump (mlcp) transform, a REST API ingest transform, or with CoRB2.

Solution

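Applies to MarkLogic versions 7+ for the XQuery version. The JavaScript version requires MarkLogic 8+, the release that introduced Server-Side JavaScript.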

Below you’ll find the code for an mlcp transform that fixes dates, first as Server-Side JavaScript and then as XQuery:

// Recurse through a JSON document, applying function f to any JSON property
// whose property name is in the array keys. 
function applyToProperty(obj, keys, f) {
  for (var i in obj) {
    if (!obj.hasOwnProperty(i)) {
      continue;
    }
    else if (typeof obj[i] === 'object') {
      applyToProperty(obj[i], keys, f);
    }
    else if (keys.indexOf(i) !== -1) {
      obj[i] = f.call(this, obj[i]);
    }
  }
}

function fixDate(value) {
  return new Date(value).toISOString();
}

// This is the MLCP transform function. Fix any date with the property 
// name(s) specified by context.transform_param. Property names must 
// be separated by a semicolon. 
function fixDateByProp(content, context)
{
  var propNames = (context.transform_param == undefined)
                 ? "UNDEFINED" : context.transform_param;
  propNames = propNames.split(';');

  var docType = xdmp.nodeKind(content.value);
  if (docType == 'document' &&
      content.value.documentFormat == 'JSON') {
    // Convert input to a mutable object and fix the named date properties
    var newDoc = content.value.toObject();
    applyToProperty(newDoc, propNames, fixDate);

    // Convert result back into a document
    content.value = xdmp.unquote(xdmp.quote(newDoc));
  }
  return content;
};

exports.fixDateByProp = fixDateByProp;

xquery version "1.0-ml";
module namespace dt = "https://marklogic.com/date-transform";

(: Dates coming in are of the format YYYY-MM-DD HH:MM:SS+FFFF :)
declare variable $format_in := "[Y0001]-[M01]-[D01] [H01]:[m01]:[s01]+[f0001]";

(: DateTime format to convert to :)
declare variable $format_out := "[Y0001]-[M01]-[D01]T[H01]:[m01]:[s01].[f1]";

declare function dt:change-dates($node, $qnames)
{
  typeswitch($node) 
  case element() 
    return 
      if (fn:node-name($node) = $qnames) then
        element { fn:node-name($node) } {
          (: replace the date with the desired format :)
          fn:format-dateTime(xs:dateTime(xdmp:parse-dateTime($format_in, $node/fn:string())), $format_out)
        }
      else
        element { fn:node-name($node) } { 
          $node/@*, 
          $node/node() ! dt:change-dates(., $qnames) 
        }
  case document-node()
    return document {
      dt:change-dates($node/node(), $qnames)
    }
  default 
    return $node
};

(: This is the MLCP transform function :)
declare function dt:transform(
 $content as map:map,
 $context as map:map
) as map:map*
{
  let $new-doc := 
    dt:change-dates(
      map:get($content, "value"), 
      fn:tokenize(map:get($context, "transform_param"), ";") ! xdmp:QName-from-key(.))
  let $_ := map:put($content, "value", $new-doc)
  return $content
};

Save the JavaScript version as date-transform.sjs or the XQuery version as date-transform.xqy, matching the module paths used in the mlcp commands below. Add the file to your modules database, as described in the documentation.

Calling the transform with mlcp looks like this, first for the JavaScript version and then for the XQuery version:

$ mlcp.sh import -mode local -host srchost -port 8000 \
    -username user -password password \
    -input_file_path /space/mlcp-test/data \
    -transform_module /example/date-transform.sjs \
    -transform_function fixDateByProp \
    -transform_param "date;pub-date"

$ mlcp.sh import -mode local -host srchost -port 8000 \
    -username user -password password \
    -input_file_path /space/mlcp-test/data \
    -transform_module /example/date-transform.xqy \
    -transform_namespace "https://marklogic.com/date-transform" \
    -transform_param "date;pub-date"

Discussion

The XQuery version assumes XML documents; the SJS version assumes JSON. While you can mix those up (XQuery/JSON, SJS/XML), it’s easiest this way.

MLCP allows a call to specify one string parameter that will be sent to a transform. These transform functions each use that parameter to specify where the target dates are. The parameter takes a semicolon-separated list of either JSON properties or XML QNames. QNames are specified using Clark notation, which can be generated using xdmp:key-from-QName(). In the code, the xdmp:QName-from-key() function converts them back. The names in the example MLCP calls above (date and pub-date) have no namespace part; for an element in a namespace, the key takes the form {namespace-uri}local-name.
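
To make the shape of that parameter concrete, here is a minimal sketch in plain JavaScript of how a semicolon-separated list of Clark-notation keys breaks down into namespace and local-name pairs. It is not MarkLogic code: parseClarkKey and parseTransformParam are hypothetical helpers written for this illustration, the namespace URI is made up, and inside MarkLogic the real conversion is done by xdmp:QName-from-key().

```javascript
// Hypothetical helper, for illustration only: split one Clark-notation key
// into its namespace URI and local name.
function parseClarkKey(key) {
  var match = /^\{([^}]*)\}(.+)$/.exec(key);
  if (match) {
    return { namespace: match[1], local: match[2] };
  }
  // No braces: treat the key as a name in no namespace.
  return { namespace: '', local: key };
}

// Hypothetical helper: turn a transform_param string into a list of names.
function parseTransformParam(param) {
  return param.split(';')
    .filter(function (key) { return key.length > 0; })
    .map(parseClarkKey);
}

// The namespace URI below is an invented example value.
var names = parseTransformParam('date;{http://example.com/ns}pub-date');
console.log(JSON.stringify(names));
```

Because the list is split on semicolons, a namespace URI that itself contains a semicolon would not survive this scheme, so it is worth keeping such URIs out of the parameter.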

Both versions allow you to specify multiple names of properties or elements, and each instance of those will be found. For instance, if a JSON document has three “date” properties, then all three will have their values updated by the fixDate() function.
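
As a rough illustration of that behavior, the sketch below runs the recipe's two helper functions outside MarkLogic (for example in Node.js) against a made-up document with three "date" properties at different depths. The helpers are repeated from the Solution so the example is self-contained; the sample document and its values are assumptions chosen for the demonstration.

```javascript
// Repeated from the recipe so this example stands alone.
function applyToProperty(obj, keys, f) {
  for (var i in obj) {
    if (!obj.hasOwnProperty(i)) {
      continue;
    }
    else if (typeof obj[i] === 'object') {
      applyToProperty(obj[i], keys, f);
    }
    else if (keys.indexOf(i) !== -1) {
      obj[i] = f.call(this, obj[i]);
    }
  }
}

function fixDate(value) {
  return new Date(value).toISOString();
}

// Invented sample document: one top-level date and two nested in an array.
var doc = {
  title: 'Example',
  date: 'Sat, 07 Mar 2015 10:00:00 GMT',
  revisions: [
    { date: 'Sun, 08 Mar 2015 12:30:00 GMT', note: 'first edit' },
    { date: 'Mon, 09 Mar 2015 08:15:00 GMT', note: 'second edit' }
  ]
};

applyToProperty(doc, ['date'], fixDate);

console.log(doc.date); // 2015-03-07T10:00:00.000Z
console.log(JSON.stringify(doc.revisions));
```

Note that fixDate() relies on the JavaScript Date constructor, so an input it cannot parse makes toISOString() throw a RangeError. If your data may contain such values, consider catching that error and deciding whether to skip or reject the document.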

In the case of the XQuery version, notice the $format_in and $format_out variables. The $format_in pattern describes the format that dates are expected to have in the input. $format_out is the standard date-time pattern that the values are rewritten to.

This recipe follows a good practice for making code testable: the real work is done in functions that are called by the MLCP transform function. You can write unit tests for dt:change-dates() or for applyToProperty() and fixDate(). The function that MLCP actually calls just handles the inputs from MLCP and sends them to the functions that do the real work. Because it’s set up this way, you can use the same functions to support a REST API input transformation or a CoRB2 transform.
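
One way that reuse might look for the REST API is sketched below. It assumes the usual shape of a Server-Side JavaScript REST transform, a function taking (context, params, content) that the module exports as transform. The wrapper name restFixDates and the parameter name props are hypothetical, and whether returning a plain object (rather than a document node) is accepted may depend on your MarkLogic version. The stand-in content object exists only so the sketch can be exercised outside MarkLogic.

```javascript
// Worker functions repeated from the recipe so this example stands alone.
function applyToProperty(obj, keys, f) {
  for (var i in obj) {
    if (!obj.hasOwnProperty(i)) {
      continue;
    }
    else if (typeof obj[i] === 'object') {
      applyToProperty(obj[i], keys, f);
    }
    else if (keys.indexOf(i) !== -1) {
      obj[i] = f.call(this, obj[i]);
    }
  }
}

function fixDate(value) {
  return new Date(value).toISOString();
}

// Hypothetical REST transform wrapper. It only unpacks its inputs and
// delegates to the worker functions. 'props' is an assumed parameter name.
function restFixDates(context, params, content) {
  var propNames = (params && params.props)
                ? String(params.props).split(';') : [];
  var doc = content.toObject();
  applyToProperty(doc, propNames, fixDate);
  return doc;
}

// In a MarkLogic module you would finish with:
//   exports.transform = restFixDates;

// Stand-in for a JSON document node, used only to try the wrapper locally.
var fakeContent = {
  toObject: function () {
    return { title: 'Example', date: 'Sat, 07 Mar 2015 10:00:00 GMT' };
  }
};

var result = restFixDates({}, { props: 'date' }, fakeContent);
console.log(result.date);
```

The wrapper contains no date logic of its own, which is the point of the design: the same unit tests for applyToProperty() and fixDate() cover the MLCP and REST entry points alike.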

Learn More

mlcp User Guide

This guide describes how to install and use mlcp in MarkLogic.

Function to JSON property

Learn how to apply a JavaScript function to a property within a JSON document.

mlcp Technical Resources

Explore all the technical resources associated with the MarkLogic Content Pump (mlcp).
