Search by Day of Week

Problem

You want to search on derived data, such as day-of-the-week (Monday, Tuesday, …), but your data only has a date (2017-06-21).

Solution

The solution is to use Template Driven Extraction (TDE) to put the derived information directly into the row or triple index. There are two parts to the solution: the template and the search itself.

My sample documents are game records from a chess site where I used to play (I lost this one):

{
  "Event": "Clan challenge", 
  "Date": "2010-06-22", 
  "EndDate": "2010.12.04", 
  "Round": "?", 
  "White": "dmcassel", 
  "Black": "ColeFord", 
  "WhiteRating": "1544", 
  "BlackRating": "1483", 
  "WhiteELO": "1544", 
  "BlackELO": "1483", 
  "Result": "0-1", 
  "GameId": "7540292"
}

I built a template that extracts key pieces of information into a row, including the derived day of the week. Here’s the code to insert the template.

var tde = require("/MarkLogic/tde.xqy");

var dowRowTemplate =
  xdmp.toJSON({
    "template": {
      "context": "/GameId",
      "collections": [ "source1" ],
      "rows": [
        {
          "schemaName": "chess",
          "viewName": "matches",
          "columns": [
            {
              "name": "dayOfWeek",
              "scalarType": "int",
              "val": "xdmp:weekday-from-date(../Date)"
            }
          ]
        }
      ]
    }
  });

tde.templateInsert(
  "/tde/dowRowTemplate.json",
  dowRowTemplate,
  xdmp.defaultPermissions(),
  ["TDE"]
)

And here’s the code to search for documents that have dates that fall on Tuesdays:

const op = require('/MarkLogic/optic');

const dow = 2; // 1=Monday, 7=Sunday

const docId = op.fragmentIdCol('docId'); 

op.fromView('chess', 'matches', null, docId)
  .where(op.eq(op.col('dayOfWeek'), dow))
  .offsetLimit(0, 10)
  .joinDoc('doc', docId)
  .select(['doc'])
  .result();

This code returns the documents that match the selected day of the week.

We can also use Optic to group the games by day of the week, providing a count of the games played on each day.

const op = require('/MarkLogic/optic');

op.fromView('chess', 'matches')
  .groupBy('dayOfWeek', [op.count('dowCount', 'dayOfWeek')])
  .result();

This query returns a sequence of items, each with a dayOfWeek value (1-7) and a dowCount (day-of-week count) giving the number of games that took place on that day.

Note that the triple index must be enabled for this to work (it’s on by default in MarkLogic 9+).

Discussion

There’s a lot to look at here. Your data will likely be a bit different, so you’ll need to adjust the template accordingly. See the TDE tutorial for more details on how to build a template. You’ll need to update the context and collections to identify your data, as well as the columns to populate in your schema.

The first thing to note is that, as with any application of TDE, we can put information into the indexes without having to modify the documents themselves. For some MarkLogic users, it is important that the data remain in its original format. While we often address this concern using the envelope pattern, TDE provides a no-touch way to accomplish something similar. More generally, we will often use a transform to construct business entities, then apply TDE for fine tuning of what is available in the indexes.

The template I’m using here is very small, extracting just one column, which provides the derived day of week. The template can be revised at any time to provide additional data. Doing so could let us, for instance, find matches played by a particular player as Black on a chosen day of the week. The Optic API is not restricted to working with data in the row index, so this is a question of whether we’d want to index other fields in the row index or use a range index.
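As a rough sketch of that kind of revision, the plain JavaScript below builds the template as an ordinary object with a second column for the Black player. The black column name and the buildMatchTemplate helper are illustrative assumptions, not part of the original template. In MarkLogic the returned object would still need to be wrapped with xdmp.toJSON and passed to tde.templateInsert, as in the Solution section.

```javascript
// Hypothetical helper: builds the template as a plain object so the
// column list is easy to extend. Runs in any JavaScript engine.
function buildMatchTemplate(extraColumns) {
  const columns = [
    {
      name: "dayOfWeek",
      scalarType: "int",
      val: "xdmp:weekday-from-date(../Date)"
    }
  ].concat(extraColumns || []);

  return {
    template: {
      context: "/GameId",
      collections: ["source1"],
      rows: [{ schemaName: "chess", viewName: "matches", columns: columns }]
    }
  };
}

// Assumed extra column: the Black player's name, read via a relative path
// in the same way the original template reads ../Date.
const revisedTemplate = buildMatchTemplate([
  { name: "black", scalarType: "string", val: "../Black" }
]);
```

With a column like this in place, the where clause of the first query could test both dayOfWeek and black, keeping the whole filter inside the row index.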

The val specification for each column consists of XQuery code. Whatever derived value you want to push into the index, here’s where you do it. In the day-of-week case, xdmp:weekday-from-date takes a properly formatted date and returns a 1-7 value, where 1 = Monday and 7 = Sunday.
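To sanity-check which number to expect for a given date without running anything in MarkLogic, a small plain-JavaScript stand-in can mirror the 1-7 numbering described above. The isoWeekday function is a hypothetical helper of my own, not a MarkLogic API, and it assumes the input is a yyyy-mm-dd string.

```javascript
// Plain JavaScript stand-in for the numbering described above
// (1 = Monday ... 7 = Sunday). Not a MarkLogic function.
function isoWeekday(isoDate) {
  const parts = isoDate.split("-").map(Number);
  // Build the date in UTC so the local time zone cannot shift the day.
  const jsDay = new Date(Date.UTC(parts[0], parts[1] - 1, parts[2])).getUTCDay();
  // getUTCDay() uses 0 = Sunday ... 6 = Saturday; remap Sunday to 7.
  return jsDay === 0 ? 7 : jsDay;
}

// The sample game's Date value falls on a Tuesday.
const sampleDay = isoWeekday("2010-06-22"); // 2
```

This is why the first query uses dow = 2 to find games such as the sample document.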

We can use tde.nodeDataExtract to run the template against a target document and confirm that it does what we want. After that, we insert the template with tde.templateInsert, which causes MarkLogic to apply the template to documents that match its context, collections, and directories.

Notice that I did not need to include a URI column in the schema, because MarkLogic tracks the origin of the data internally. The op.fragmentIdCol call gives me access to this. I can choose whatever name I like for the fragment ID column, so long as it doesn’t conflict with other column names in the schema. It’s worth noting that this ID is an internal one, not the document’s URI. The ID is not intended to be shown to users; rather, it’s simply a connector between a document and a row.

This particular data set is very flat: all of the properties are direct children of the root. The TDE template needs to pick out a node, so I selected the GameId property. The template uses relative paths to access the other JSON properties, such as ../Date.

The first query shows a document search for matches that took place on Tuesday. Notice that the where clause is applied before the joinDoc. This is important to minimize the number of documents that need to be loaded. The code also uses offsetLimit to control the number of result rows (in this case, documents) that will be returned. After joining the rows to the documents from which they were generated, the select statement narrows the query to return just the document contents, rather than any of the columns in the row.

The second query shows a groupBy result. From this, we can find out how many matches were played on each day of the week. This query is run fully against the row index, with no need to load up the original documents. The result is seven rows (assuming matches took place on each day) with two columns: dayOfWeek and dowCount.
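To make the shape of that result concrete, here is a small in-memory emulation in plain JavaScript. It does not use Optic or the row index; the countByDayOfWeek helper and the sample rows are invented purely for illustration. Real Optic output may qualify the column names with the schema and view, and it is not sorted unless you add an orderBy.

```javascript
// Hypothetical in-memory equivalent of grouping rows by dayOfWeek
// and counting the rows in each group.
function countByDayOfWeek(rows) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row.dayOfWeek, (counts.get(row.dayOfWeek) || 0) + 1);
  }
  // Sorting is added here only for readability.
  return Array.from(counts.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([dayOfWeek, dowCount]) => ({ dayOfWeek, dowCount }));
}

// Invented sample rows: two games on a Tuesday, one on a Friday.
const sampleRows = [{ dayOfWeek: 2 }, { dayOfWeek: 5 }, { dayOfWeek: 2 }];
const grouped = countByDayOfWeek(sampleRows);
// grouped: [{ dayOfWeek: 2, dowCount: 2 }, { dayOfWeek: 5, dowCount: 1 }]
```

Days with no games simply do not appear, which is why the real query returns seven rows only if matches took place on every day.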
