Problem

Redact credit card numbers stored as numeric strings in JSON documents. Each redacted value should be a randomly generated numeric string of the same length as the original, even though the source strings may differ in length. (The approach also applies to any other numeric strings.)

Solution

For sample data, I whipped up some JSON documents that looked like this:

{
  "name": "Charlie",
  "ccNum": "6011759130364395557"
}

The goal is to replace each ccNum value with a string of random digits of the same length as the original. As I loaded the documents, I put them into a collection called “people”.

None of the built-in redaction functions did exactly what I wanted, so I wrote a custom one. Here’s the function that implements the redaction:

'use strict';

const MAX = Number.MAX_VALUE.toString().replace(/\d\.(\d+)e\+\d+/, "$1").length;

// Generate a string of random digits with the specified length
// Inspired by https://stackoverflow.com/questions/1349404/generate-random-string-characters-in-javascript
function generateNDigitStr(n) {
  let padding = '0'.repeat(MAX); // because .toString will drop trailing zeros
  let more = n; // how many more characters do we need
  let curr; // how many characters to get this iteration
  let result = '';
  while (more > 0) {
    curr = Math.min(MAX, more);
    result += (Math.random().toString(10)+padding).slice(2, curr+2);
    more -= curr;
  }
  return result;
}

function randomizeCC(node, options) {
  const builder = new NodeBuilder();
  builder.addText(generateNDigitStr(node.length));
  return builder.toNode();
}

exports.redact = randomizeCC;

Save that to a file, then load it into your modules database. You can do that from Query Console like this (adjust the filesystem path to match where you saved the file, and be sure you’ve selected your modules database):

declareUpdate();

xdmp.documentLoad(
  '/home/dcassel/tmp/randomizeCC.sjs',
  {uri: '/redaction/randomizeCC.sjs'}
);

Next we need to save the redaction rule. Run this in Query Console against your schemas database.

declareUpdate();

xdmp.documentInsert(
  '/rules/randomizeCC.json', 
  {
    "rule": {
      "description": "Replace credit card numbers with randomly-generated numbers",
      "path": "//ccNum",
      "method": { 
        "function": "redact",
        "module": '/redaction/randomizeCC.sjs'
      }
    }
  }, 
  { permissions: xdmp.defaultPermissions(),
    collections: ['custom-rules'] }
);

We can test the redaction using Query Console.

const jsearch = require('/MarkLogic/jsearch');
const rdt = require('/MarkLogic/redaction');

jsearch.collections('people').documents()
  .map(function (match) { 
      match.document = 
        fn.head(
          rdt.redact(fn.root(match.document), 'custom-rules')
        ).root;
      return match;
  }).result();

After verifying that it works, we can use this redaction function with MLCP, either exporting content or copying it directly to another database.

Discussion

Credit card numbers vary in length, up to a maximum of 19 digits under ISO/IEC 7812: a 6- or 8-digit Issuer Identification Number, a variable-length individual account identifier, and a single check digit.

The goal is to replace the credit card number with a randomly-generated numeric string with the same length as the original. Obfuscating credit card numbers is often done by displaying something like “XXXX-XXXX-XXXX-1234”. That’s fine to put on a receipt, but most redaction use cases call for realistic data. For instance, a production database has real credit cards, but the accompanying development and test databases should have fake data. That fake data should be as realistic as possible to make tests run against the data valid. Changing the format from digits to Xs could cause false test failures or lead to code that accepts values that shouldn’t be allowed.

Note that not all strings of digits are valid credit card numbers. There is a formula for determining whether a credit card number is valid (the Luhn checksum), but applying it is out of scope for this recipe, so the redacted values are not guaranteed to pass that check.
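
For reference, here is a minimal sketch of that validity check in plain JavaScript, assuming the formula in question is the Luhn checksum. The name isLuhnValid is hypothetical and not part of the recipe's module; the recipe itself deliberately skips this step, so most redacted values will fail the check.

```javascript
// Luhn checksum: returns true when the digit string passes the check.
// isLuhnValid is a hypothetical helper name, not part of the recipe's module.
function isLuhnValid(digits) {
  if (!/^\d+$/.test(digits)) {
    return false; // only non-empty, digit-only strings qualify
  }
  let sum = 0;
  // Walk from the rightmost digit, doubling every second digit
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9; // same as adding the two digits of the product
    }
    sum += d;
  }
  return sum % 10 === 0;
}

console.log(isLuhnValid('4111111111111111')); // true (a widely used test number)
console.log(isLuhnValid('4111111111111112')); // false
```

If tests downstream depend on checksum-valid numbers, one option would be to generate all but the last digit randomly and then compute the final digit so that the check passes.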

MarkLogic provides several built-in functions for redaction; however, none of them quite matches the goal in this case. For instance, we can’t use mask-random, because our numeric strings are not all the same length.

The generateNDigitStr function takes a parameter indicating how long the randomly-generated string should be. Math.random generates a random number in the range 0.0 (inclusive) to 1.0 (exclusive). Calling .toString(10) on the result converts the number to a string of base-10 digits (we can generate hexadecimal with .toString(16)). The padding is added to account for any trailing zeros in the random number. The code runs through the loop enough times to produce the requested number of digits, which will likely be once or twice. (I found MAX to be 16.)
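
To see what the padding does, the slice-and-pad step can be tried in isolation. The sketch below runs in any plain JavaScript engine; digitsFrom and digitsFromFixed are hypothetical helper names, and MAX is hard-coded to 16 on the assumption that your engine yields the same value I found. It also shows one edge case worth knowing: a value below 1e-6 is stringified in exponential notation, so non-digit characters can slip through on rare occasions.

```javascript
'use strict';

const MAX = 16; // the value the recipe's MAX expression yielded for the author
const padding = '0'.repeat(MAX);

// Hypothetical helper: the slice-and-pad step from generateNDigitStr,
// applied to a supplied number x in [0, 1) instead of Math.random()
function digitsFrom(x, count) {
  return (x.toString(10) + padding).slice(2, count + 2);
}

console.log((0.5).toString(10));     // "0.5": only one digit after the point
console.log(digitsFrom(0.5, 16));    // "5000000000000000"
console.log(digitsFrom(0.25, 4));    // "2500"

// Edge case: values below 1e-6 stringify in exponential notation
console.log((1e-7).toString(10));    // "1e-7"
console.log(digitsFrom(1e-7, 4));    // "-700"

// One possible guard (an assumption, not the recipe's code): toFixed always
// emits exactly MAX digits after the decimal point, with no exponent
function digitsFromFixed(x, count) {
  return x.toFixed(MAX).slice(2, count + 2);
}
console.log(digitsFromFixed(1e-7, 4)); // "0000"
```

The chance of Math.random returning a value that small is roughly one in a million per call, so the original function behaves well in practice; the toFixed variant is simply a way to rule the case out.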

Learn More

Advanced Security

Explore all the technical resources related to advanced security in MarkLogic, all in one place.

MarkLogic Security Course

This hands-on course will show you how to securely manage data inside the MarkLogic database.

Redacting Document Content

Read about MarkLogic’s redaction features that you can use when reading a document from the database.
