Last year on the developer mailing list, David Sewell asked:
To paraphrase Euclid, I'm guessing there's no royal road to auto-applying XSLT to a document at load time into the database?
This article is meant to provide a shortcut for implementing exactly this use case. (I'm responding a little late to the original post, but perhaps this will be of help to others of you.) It also provides a simple tutorial for getting CPF up and running. Even if you don't have a need for this right now, I encourage you to work through the tutorial.
MarkLogic customers often need to transform a document on load. Information Studio provides the easiest way to do this. Whether you use the browser-based UI or the infostudio API, you can create a flow, set up a transformation step, and load documents. The UI's dialog offers a number of built-in transformation steps to choose from. You can also perform a custom transformation using arbitrary XSLT or XQuery code. XSLT is often the preferred method, since transformation (the "T" in "XSLT") is where XSLT shines.
So if you're exclusively using InfoStudio, there's your answer: use InfoStudio. No need to read any further.
But what if you want a document to be transformed no matter how you update it? For example, let's say you already have various workflows for loading documents, and Information Studio is only one of them. In other workflows, documents are inserted directly using xdmp:document-insert(), loaded using XQSync, or dragged and dropped using WebDAV. An InfoStudio transformation will only get applied if the document is loaded by InfoStudio. Is there a way to ensure that a custom transformation gets applied regardless of how you load a document? Yes. You just have to use a different framework that comes with MarkLogic: the Content Processing Framework (CPF).
CPF is a powerful, flexible framework for managing document state changes. It provides mechanisms for running arbitrary code triggered by various state changes such as documents being created, updated, and deleted. Given such flexibility, you have the power to define quite complex pipelines of transformations based on state changes. But often you don't need a lot of complexity. What if all you want is to apply an XSLT transformation whenever a document gets added or changed? Do you have to learn about all the ins and outs of CPF in this case? Wouldn't it be nice to just follow a quick recipe and save learning about CPF for another day? That's what this article is for. (And if you learn something about CPF along the way, so be it.)
Okay, let's get started. Below are the overall steps that you'll need to take. I'll walk you through each of these to make them go as fast as possible:
- Install CPF.
- Define the CPF domain.
- Write and load your XSLT.
- Write and load your CPF pipeline.
- Attach the pipeline to the domain you configured.
- Load documents and see them automatically get transformed.
Install CPF
- Go to the admin interface (http://localhost:8001) and navigate to the configuration page for your database (I'll be using the "Documents" database in this example).
- Check to see if a "triggers database" (where CPF will get installed) has been selected for your database. If one has already been selected, then you can skip this step. But if it says "(none)", then select the "Triggers" database (one that has been pre-defined for this purpose) and then click "ok".
- Navigate to the "Content Processing" menu for your database.
- Install CPF into your database by clicking the "Install" tab as instructed.
- On the next screen, choose "false" for the "enable conversion" option and click "install".
- Confirm by clicking "ok".
Define the CPF domain
Here you'll specify which documents your on-load XSLT transformation will be applied to. These are identified using a domain.
- Navigate to the "Default [your-db-name]" menu item in the admin interface, under Content Processing->Domains.
- Here we are going to re-configure the default domain to narrow the scope of documents that get transformed on load. Change the domain's name (this tutorial uses "Docs to transform") and its document scope. You can define the domain of documents using a directory, a collection, or a single document URI. In this case, every document in the "/docs-to-transform" database directory and all of its sub-directories will be part of the domain.
Write and load your XSLT
In this step you'll upload your XSLT stylesheet to the modules database that is configured for your CPF domain. If you didn't make any changes to the default "modules" field in your domain configuration (see above), that means you'll be loading it into the "Modules" database. First of all, let's assume your stylesheet is named onload.xsl and is located on your local filesystem.
The stylesheet used in this tutorial leaves the document unchanged except to insert a comment at the top. In practice, you'll want to do something more useful and specific to your application.
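Before wiring a stylesheet into CPF, it can help to preview what it does from Query Console. The following is a minimal sketch of an "identity plus comment" transform like the one described above, embedded inline and run with xdmp:xslt-eval. The stylesheet body and the sample document are illustrative assumptions, not necessarily identical to the onload.xsl used in this tutorial.

```xquery
xquery version "1.0-ml";

(: Hypothetical stand-in for onload.xsl: copies the input document
   unchanged and adds a comment before the root element. :)
let $stylesheet :=
  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
      <xsl:comment>EVAN WAS HERE</xsl:comment>
      <xsl:copy-of select="node()"/>
    </xsl:template>
  </xsl:stylesheet>

(: Placeholder input; any XML document node will do. :)
let $sample := document { <doc><para>Hello</para></doc> }

(: Apply the inline stylesheet to the sample document. :)
return xdmp:xslt-eval($stylesheet, $sample)
```

Running this should return the sample document with the comment added at the top, which is the same effect you'll look for later once CPF applies the stylesheet automatically.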
- For your convenience in following along in this tutorial, go ahead and save this onload.xsl file to your file system.
- Open up Query Console (on port 8000 in your browser). If MarkLogic is running on the same machine, you would navigate to http://localhost:8000/qconsole.
- Select the "Modules" database in the "Content Source" drop-down:
- Copy and paste into Query Console a script that loads the stylesheet from your filesystem into the database.
- Change "/path/to/onload.xsl" to the location where you saved the file on your local filesystem.
- First select "Text" and then click the "Run" button.
Assuming the query ran without error, your XSLT module is now ready to go.
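A load script for this step might look like the following minimal sketch. It assumes the stylesheet should live at the URI "/onload.xsl" in the modules database. That URI is an assumption, and it needs to match whatever path your pipeline uses to reference the stylesheet. The "/path/to/onload.xsl" placeholder is the one mentioned in the steps above.

```xquery
xquery version "1.0-ml";

(: Run this with the "Modules" database selected as the content source.
   "/path/to/onload.xsl" is a placeholder for the file's location on
   the filesystem; "/onload.xsl" is an assumed database URI. :)
xdmp:document-load(
  "/path/to/onload.xsl",
  <options xmlns="xdmp:document-load">
    <uri>/onload.xsl</uri>
  </options>)
```

xdmp:document-load returns the empty sequence, so a run that shows no output and no error is the expected result.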
Write and load your CPF pipeline
The pipeline used in this tutorial applies an XSLT stylesheet to a document in the applicable domain whenever that document is added or updated. The transform is applied only if the document is an XML document (as opposed to text or binary).
- For your convenience once again, you can grab this pipeline.xml file and save it to your filesystem.
- In the MarkLogic admin UI, navigate to the "Pipelines" menu item under "Content Processing".
- Click the "Load" tab at the top of the page.
- Enter the path to the directory on your filesystem where you saved the pipeline.xml file and click "ok".
- You should see your pipeline XML file listed. Confirm that you want to load it by clicking "ok".
- To confirm that the pipeline has been loaded, click the name of your pipeline under "Content Processing"->"Pipelines".
Attach your pipeline to your domain
Now that you've configured both the domain (which set of documents you want automatically transformed) and the pipeline (the description of the transform itself), you need to associate the two with each other.
- Navigate to the "Pipelines" menu item under your domain ("Docs to transform") in the admin UI.
- Find your pipeline ("Apply XSLT transform on load"), click the checkbox next to it, and then click "ok".
- The pipeline will move up to the "attached" section, indicating that you've successfully attached it to this domain.
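If you would rather script this step than click through the admin UI, the CPF domains and pipelines library modules offer an equivalent. The sketch below assumes the domain and pipeline names used in this tutorial ("Docs to transform" and "Apply XSLT transform on load") and assumes it is run against the triggers database, where CPF stores its configuration.

```xquery
xquery version "1.0-ml";

import module namespace dom = "http://marklogic.com/cpf/domains"
  at "/MarkLogic/cpf/domains.xqy";
import module namespace p = "http://marklogic.com/cpf/pipelines"
  at "/MarkLogic/cpf/pipelines.xqy";

(: Run with the triggers database ("Triggers" in this tutorial)
   selected as the content source. Both names below are assumed to
   match what you configured in the earlier steps. :)
dom:add-pipeline(
  "Docs to transform",
  p:get("Apply XSLT transform on load")/p:pipeline-id)
```

The admin UI remains the simpler route for a one-off setup; a script like this is mostly useful when you need to repeat the configuration across environments.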
Load some documents and watch them get transformed
Everything is all set up now. The only thing that remains is to test it out.
- Back in Query Console, change the "Content Source" to your content database ("Documents" in our case).
- Enter and run a query that inserts a new document into the "/docs-to-transform" database directory, which means it should automatically get transformed.
- Now look at the document by running a query that fetches it.
- You should see that the "EVAN WAS HERE" comment was added to the top of the document.
- Now take a look inside this document's properties. This is where CPF tracks your document's state. If something went wrong when CPF tried to apply the XSLT pipeline, you would see the error information here. Run a query that returns the document's properties.
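The three queries in the steps above can be sketched as follows. The document URI "/docs-to-transform/test.xml" and the sample content are placeholders chosen for illustration; any URI under the domain's directory should behave the same way. CPF processes the document after the insert commits, so it's best to run each query separately, as the steps describe.

```xquery
xquery version "1.0-ml";

(: Query 1: insert a test document into the domain's directory.
   The file name and content are placeholders. :)
xdmp:document-insert(
  "/docs-to-transform/test.xml",
  <doc><para>This is a test document.</para></doc>)
;

xquery version "1.0-ml";

(: Query 2: fetch the document. Once CPF has processed it, the
   comment added by the stylesheet should appear at the top. :)
fn:doc("/docs-to-transform/test.xml")
;

xquery version "1.0-ml";

(: Query 3: show the document's properties, where CPF records the
   processing state and any error information. :)
xdmp:document-properties("/docs-to-transform/test.xml")
```

If you do run all three statements in one go and the comment is missing, CPF may simply not have finished yet. Re-running the second and third queries a moment later should show the transformed document and its updated state.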
Now you have a cheat sheet for setting up an auto-applied XSLT stylesheet. If you've made it this far, then you also have some basic familiarity with CPF concepts like "domain" and "pipeline," which should stand you in good stead should you decide to dig deeper into using CPF.