Grokking the cts API

by Evan Lenz

As of the MarkLogic 5 release, the total number of built-in cts ("core text search") functions comes in at 217! Given how central the cts functions are for building applications on MarkLogic, I thought it would help to provide some pointers in navigating this potentially overwhelming API.

But first of all, if you're just getting started building a standard search application, you should start with the Search API (which uses and provides hooks into the cts API).

Having said that, here you go!

[Image: a Wordle (word cloud) of the cts function names]

Just kidding. While Wordles can be fun, they're not always very useful. (I generated the above based on each function's number of search hits on this website, so I suppose the result is somewhat interesting; just don't put too much stock in it.)

Let's take a tour through the cts API, using some categories I've chosen. We'll knock down all 217 functions, without necessarily explaining how they work. You'll want to refer to the cts API documentation for those details.

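The following list summarizes my breakdown by category:

  • Query execution (2)
  • Query constructors (30)
  • Query accessors (93)
  • Lexicon functions (39)
  • Geospatial constructors, accessors, and methods (24)
  • Other geospatial functions (8)
  • Functions for extended node properties (5)
  • Functions for the extended "frequency" property (4)
  • Miscellaneous categories (12)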

Now let's take a quick tour through each one.

Query execution (2)

The most important function of them all is cts:search, which executes cts queries (we'll get to those next). A related and also important function is cts:contains, which tests a given node sequence against a given cts query, returning true if it matches and false otherwise. Two down, 215 to go!
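To make the difference concrete, here is a minimal sketch that runs the same query both ways. The search term is made up, and the example assumes your database holds some documents that mention it; cts:word-query is one of the query constructors covered in the next section.

```xquery
xquery version "1.0-ml";

(: Assumption: some documents in the database contain the word "lexicon". :)
let $query := cts:word-query("lexicon")

(: cts:search returns the matching nodes from the database. :)
let $hits := cts:search(fn:collection(), $query)

(: cts:contains only reports whether the nodes you hand it match the query. :)
let $matches := cts:contains(<p>a word lexicon</p>, $query)

return ($hits[1 to 3], $matches)
```

Reach for cts:search when you want documents back, and for cts:contains when you already have a node in hand and only need a yes or no.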

Query constructors (30)

MarkLogic extends the XPath data model with an object type called "cts:query", which is the super-type of a number of more specific cts:query sub-types. Queries can be composed together using the cts query constructor functions. They can then be executed by passing them to cts:search() or passed to other functions, such as lexicon calls or functions in other libraries, including search:search() and many others. Thirty of the cts functions are query constructors, i.e. functions you call to construct a cts:query value. All of these function names end in "-query". If you see a cts function whose name ends in "-query", you can be assured that it's a cts:query constructor.
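As a rough illustration of composing queries and then executing them, the sketch below nests a few constructors and hands the result to cts:search. The search terms are invented for the example.

```xquery
xquery version "1.0-ml";

(: Build one query value out of several smaller ones. :)
let $query :=
  cts:and-query((
    cts:word-query("search"),
    cts:or-query((
      cts:word-query("lexicon"),
      cts:word-query("index")
    )),
    cts:not-query(cts:word-query("deprecated"))
  ))

(: Execute the composed query and keep the first ten results. :)
return cts:search(fn:collection(), $query)[1 to 10]
```

Because $query is an ordinary value, you could just as easily store it in a variable, pass it to a function, or combine it with yet another query before running it.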

Query constructors can be categorized into different kinds. I'm going to call them leaf, composite, and "special" (for lack of a better word).

Composite query constructors (9)

The composite query constructors build up new queries from other queries, whether leaf queries or other composite queries. Here they are broken down into a few sub-categories:

Leaf query constructors (17)

The leaf query constructors are for queries that can stand on their own, i.e. can be constructed without the help of another query constructor. The following list breaks them down into several categories, depending on what the query searches for (collection URIs, directories, words, values, etc.). I've marked some of the text with bold type to draw attention to the consistent naming conventions.

Object being searched: leaf query constructors (17)

  • collection URIs: cts:collection-query
  • document URIs: cts:document-query
  • directories: cts:directory-query
  • words: cts:word-query, cts:element-word-query, cts:element-attribute-word-query, cts:field-word-query
  • values: cts:element-value-query, cts:element-attribute-value-query, cts:field-value-query
  • value ranges: cts:element-range-query, cts:element-attribute-range-query, cts:field-range-query
  • geospatial values: cts:element-geospatial-query, cts:element-child-geospatial-query, cts:element-pair-geospatial-query, cts:element-attribute-pair-geospatial-query

Another thing worth noticing about the word, value, and range queries above is that they have consistent ways of scoping queries: by element, by attribute, or by field. So we see a function for each pairing of scope (element, attribute, or field) and object (word, value, or range). We'll see something similar with the lexicon functions. Stay tuned.
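Here is a small sketch of that scope-and-object pairing in practice. The element and attribute names (book, title, format, price) are hypothetical, and the range query assumes a decimal element range index has been configured on price.

```xquery
xquery version "1.0-ml";

(: Hypothetical names: <book format="...">, <title>, <price>. :)

(: element scope + word: the word "search" anywhere inside <title> :)
let $by-word := cts:element-word-query(xs:QName("title"), "search")

(: attribute scope + value: book/@format is exactly "paperback" :)
let $by-value :=
  cts:element-attribute-value-query(
    xs:QName("book"), xs:QName("format"), "paperback")

(: element scope + range: <price> below 20
   (assumes a decimal range index on <price>) :)
let $by-range :=
  cts:element-range-query(xs:QName("price"), "<", xs:decimal(20))

return
  cts:search(fn:collection(),
             cts:and-query(($by-word, $by-value, $by-range)))[1 to 10]
```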

Special query constructors (4)

While the functions below each return a cts:query value, they don't really fall into the above (leaf vs. composite) categories:

  • cts:query—constructs a cts:query from its XML representation
  • cts:registered-query—returns a previously registered query (using cts:register)
  • cts:reverse-query—returns a reverse query (for finding stored queries given a document, rather than stored documents given a query)
  • cts:similar-query—returns a query matching nodes similar to the given model nodes
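To give a feel for the first of these, the sketch below round-trips a query through its XML representation. Placing a query inside an element constructor is the idiom I'm assuming here for serializing it; the wrapper element name is arbitrary.

```xquery
xquery version "1.0-ml";

(: Serialize a query to XML by constructing it inside an element,
   then select the resulting child element. :)
let $xml := <wrapper>{ cts:word-query("lexicon") }</wrapper>/*

(: Rebuild a cts:query value from that XML representation. :)
let $query := cts:query($xml)

return ($xml, cts:search(fn:collection(), $query)[1 to 3])
```

This is handy when you want to store a query in a document and run it later.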

Okay, only 185 functions to go. (I promise the pace will pick up soon.)

Query accessors (93)

The query accessor functions aren't very interesting at all—and there are 93 of them! They're accessors for the various components of a cts:query value. You can recognize them using this failsafe technique: if you see a cts function whose name includes the string "-query-", then it's just an accessor. Given that there are 30 query constructors and 93 query accessors, I suppose that means each query type has an average of 3.1 query accessors. An example would be cts:word-query and its three accessors: cts:word-query-options, cts:word-query-text, and cts:word-query-weight. See a pattern?
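In case the pattern isn't obvious, here is a quick sketch that builds a word query and then pulls it apart again with its three accessors. The text, option, and weight are arbitrary values chosen for the example.

```xquery
xquery version "1.0-ml";

(: Construct a query with explicit text, options, and weight. :)
let $query := cts:word-query("lexicon", "case-insensitive", 2.0)

(: Each accessor returns one component of the constructed value. :)
return (
  cts:word-query-text($query),
  cts:word-query-options($query),
  cts:word-query-weight($query)
)
```

The options you get back may include defaults that MarkLogic filled in beyond the ones you passed.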

Lexicon functions (39)

Lexicon functions are much more interesting. Whereas cts queries are about efficiently finding documents, lexicon functions are about efficiently retrieving unique values (or words or URIs, etc.) from across a potentially large number of documents. They all require a particular index setting to be enabled. For "search," think cts:search. For "analytics," think lexicon functions.

Non-geospatial lexicons (24)

Below are the 24 non-geospatial lexicon and lexicon wildcard functions grouped by lexicon type. Note the consistent naming conventions (at the end of the function names).

Type of lexicon       Lexicon functions (15)                      Wildcard functions (9)
collection URIs       cts:collections                             cts:collection-match
document URIs         cts:uris                                    cts:uri-match
words                 cts:words                                   cts:word-match
                      cts:element-words                           cts:element-word-match
                      cts:element-attribute-words                 cts:element-attribute-word-match
                      cts:field-words                             cts:field-word-match
values                cts:element-values                          cts:element-value-match
                      cts:element-attribute-values                cts:element-attribute-value-match
                      cts:field-values                            cts:field-value-match
value ranges          cts:element-value-ranges
                      cts:element-attribute-value-ranges
                      cts:field-value-ranges
value co-occurrences  cts:element-value-co-occurrences
                      cts:element-attribute-value-co-occurrences
                      cts:field-value-co-occurrences

Below are the same 24 functions, this time grouped by the scope of the lexicon call. Again, notice the consistent naming conventions (at the beginning of the function name). Also, since it's convenient to do so, this table includes the database setting you need to have enabled for each type of lexicon call.

Scope               Lexicon functions (15)                      Wildcard functions (9)             Database setting
Entire database     cts:collections                             cts:collection-match               Collection lexicon
                    cts:uris                                    cts:uri-match                      URI lexicon
                    cts:words                                   cts:word-match                     Word lexicon
Specific element    cts:element-words                           cts:element-word-match             Element word lexicon
                    cts:element-values                          cts:element-value-match            Element range index
                    cts:element-value-ranges                                                       Element range index
                    cts:element-value-co-occurrences                                               Element range index
Specific attribute  cts:element-attribute-words                 cts:element-attribute-word-match   Attribute word lexicon
                    cts:element-attribute-values                cts:element-attribute-value-match  Attribute range index
                    cts:element-attribute-value-ranges                                             Attribute range index
                    cts:element-attribute-value-co-occurrences                                     Attribute range index
Specific field      cts:field-words                             cts:field-word-match               Field word lexicon
                    cts:field-values                            cts:field-value-match              Field range index
                    cts:field-value-ranges                                                         Field range index
                    cts:field-value-co-occurrences                                                 Field range index

The above table makes clear a rather un-intuitive fact: to retrieve values, you need a range index. If you have trouble finding "value lexicons" in the administrative interface, it's because range indexes are what you're looking for.
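Here is a small sketch of what a few lexicon calls look like. It assumes the URI lexicon is enabled and that a string range index exists on a hypothetical author element; the URI pattern and the wildcard pattern are made up too.

```xquery
xquery version "1.0-ml";

(: Assumptions: URI lexicon enabled; string element range index
   on a hypothetical <author> element. :)
(
  (: document URIs matching a wildcard pattern :)
  cts:uri-match("/articles/*"),

  (: every distinct <author> value, straight from the range index :)
  cts:element-values(xs:QName("author")),

  (: only the <author> values matching a wildcard pattern :)
  cts:element-value-match(xs:QName("author"), "Le*")
)
```

None of these calls needs to open a document; the answers come from the indexes.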

Geospatial lexicons (15)

Below are the 15 geospatial lexicon functions, grouped by type of lexicon:

Type of lexicon            Lexicon functions (11)                        Wildcard functions (4)
geospatial values          cts:element-geospatial-values                 cts:element-geospatial-value-match
                           cts:element-child-geospatial-values           cts:element-child-geospatial-value-match
                           cts:element-pair-geospatial-values            cts:element-pair-geospatial-value-match
                           cts:element-attribute-pair-geospatial-values  cts:element-attribute-pair-geospatial-value-match
geospatial boxes           cts:element-geospatial-boxes
                           cts:element-child-geospatial-boxes
                           cts:element-pair-geospatial-boxes
                           cts:element-attribute-pair-geospatial-boxes
geospatial co-occurrences  cts:geospatial-co-occurrences
                           cts:element-value-geospatial-co-occurrences
                           cts:element-attribute-value-geospatial-co-occurrences

The geospatial functions follow a similar pattern. Each lexicon call has a particular type (point value, box, or co-occurrence) and a particular scope (element, element-child, element-pair, or element-attribute-pair). Which of these you use depends on how you chose to represent geospatial coordinates in your data.

Geospatial constructors, accessors, and methods (24)

In addition to cts:query, MarkLogic adds several data types specific to geospatial processing. The "cts:region" type is the super-type of the six geospatial object types listed in the table below:

Constructors (6)     Accessors (12)             Methods (6)
cts:point            cts:point-latitude
                     cts:point-longitude
cts:circle           cts:circle-radius          cts:circle-intersects
                     cts:circle-center
cts:box              cts:box-south              cts:box-intersects
                     cts:box-west
                     cts:box-north
                     cts:box-east
cts:linestring       cts:linestring-vertices
cts:polygon          cts:polygon-vertices       cts:polygon-contains
                                                cts:polygon-intersects
cts:complex-polygon  cts:complex-polygon-outer  cts:complex-polygon-contains
                     cts:complex-polygon-inner  cts:complex-polygon-intersects

As you can see, there's a naming convention here too. Each type has a like-named constructor function. And each accessor or method includes the applicable type ("box", "polygon", etc.) before the relevant behavior ("intersects", "vertices", etc.). The only reason I separated these into "accessors" and "methods" is that the former are all about retrieving components of the constructed value, and the latter are about performing some other operation (testing for containment or intersection). Either way, since XQuery is functional, not object-oriented, you need to pass in the relevant object as the first argument.
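A quick sketch of that "object as first argument" style, using arbitrary coordinates. I'm assuming the usual argument orders here: latitude before longitude for cts:point, radius before center for cts:circle, and south, west, north, east for cts:box.

```xquery
xquery version "1.0-ml";

(: Constructors: coordinates are arbitrary sample values. :)
let $center := cts:point(37.5, -122.25)
let $circle := cts:circle(10, $center)
let $box    := cts:box(37.0, -123.0, 38.0, -122.0)

return (
  (: accessors: the object goes in as the first argument :)
  cts:point-latitude($center),
  cts:circle-radius($circle),
  cts:box-north($box),

  (: method: does the box overlap the circle? :)
  cts:box-intersects($box, $circle)
)
```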

Where do you use cts:region values? Most commonly, you use them to construct geospatial queries. So first you construct a cts:region (using one or more of the above constructor functions). Then you construct a geospatial cts:query (using a geospatial query function such as cts:element-geospatial-query), passing it the cts:region(s) you constructed. Finally, you pass the query to cts:search to run a geospatial search, or to a lexicon function to perform some geospatial-related analytics.
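Those three steps might look something like the sketch below. The location element is hypothetical (assumed to hold a point as its text content), and the coordinates and radius are arbitrary.

```xquery
xquery version "1.0-ml";

(: Step 1: construct a region (a circle around an arbitrary point). :)
let $region := cts:circle(10, cts:point(37.5, -122.25))

(: Step 2: construct a geospatial query over a hypothetical
   <location> element that holds a point. :)
let $query :=
  cts:element-geospatial-query(xs:QName("location"), $region)

(: Step 3: run the search. :)
return cts:search(fn:collection(), $query)[1 to 10]
```

Which geospatial query constructor you pick in step 2 depends on how your data represents coordinates, just as with the lexicon functions.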

Other geospatial functions (8)

We can now wrap up the geospatial functionality in the cts API by listing eight more functions that operate on geospatial values:

Okay, we're reaching the home stretch: only 21 more functions to go!

Functions for extended node properties (5)

The result of a call to cts:search() is a sequence of nodes that reside in your database. But these node references also contain some special properties (five, to be precise) that extend beyond the XPath data model. They're very handy for building search applications since they relate to things like search relevance:
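As a minimal sketch of reading such properties off search results, the query below prints two of them, cts:score and cts:confidence, for the top few hits. The search term is invented, and xdmp:node-uri is used only to label each line.

```xquery
xquery version "1.0-ml";

(: Assumption: some documents contain the word "lexicon". :)
for $hit in cts:search(fn:collection(), cts:word-query("lexicon"))[1 to 5]
return
  fn:concat(
    xdmp:node-uri($hit),
    " score=", cts:score($hit),
    " confidence=", cts:confidence($hit))
```

Note that these functions take a node returned by the search, not the query itself.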

Functions for the extended "frequency" property (4)

Just as the nodes resulting from a call to cts:search() contain special properties, so do the values that are returned by certain lexicon calls. Specifically, when you make a call to a value lexicon (which, as you may recall from above, requires a range index), the values in the resulting list each have a "frequency" property, which indicates how many times the particular value occurs. In addition to the cts:frequency() accessor function, there are a few other numeric functions that make use of frequency-weighted numeric lexicon values:

Again, for these functions to be useful, you'll need to use them with value lexicon lookups.
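For instance, here is a rough sketch that lists values alongside their frequencies. It assumes a string range index on a hypothetical author element and uses the "frequency-order" option so that the most common values come first.

```xquery
xquery version "1.0-ml";

(: Assumption: string element range index on a hypothetical <author>. :)
for $author in
  cts:element-values(xs:QName("author"), (), "frequency-order")
return
  (: cts:frequency reads the count carried by each lexicon value :)
  fn:concat($author, ": ", cts:frequency($author))
```

If you call cts:frequency on a plain string that didn't come from a lexicon lookup, there's no meaningful count for it to report.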

Miscellaneous categories (12)

"Miscellaneous" is a popular category in my family's monthly budget, but I digress. I'll try to break down these last remaining functions into some sub-categories:

I'm not going to explain these (or fall on any swords defending their categorization). The important thing is that the cts API and its 217 functions look a lot less overwhelming to you now, right? There's a hidden wisdom to it all—an underlying logic, a latent brilliance, a method to the madness...sorry, got a little carried away there.

Conclusion

Congratulations, you made it through the whole tour! As a reward, here's a little code to look at. It's the query I ran to generate the data for the Wordle shown at the beginning of the article. And, yes, it does use the cts API:

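(: Walk the function names stored in the attribute range index. :)
for $func-name in cts:element-attribute-values(xs:QName("function"),
                                               xs:QName("fullname"))
where starts-with($func-name,"cts:")
return
  (: xdmp:estimate counts the search hits from the indexes alone. :)
  concat($func-name,":",xdmp:estimate(cts:search(collection(),$func-name)))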

And if you're thinking to yourself that I must have a range index enabled on my database since I'm calling a value lexicon, you're right. Well done.
