The Evils of eval
Most dynamic languages allow you to evaluate a string of code at runtime, for example with eval() in JavaScript.
Injection attacks are a type of security vulnerability in which data supplied by a user is interpreted or executed in a malicious or unexpected way. SQL injection is one of the most common forms ("Little Bobby Tables", anyone?), but any code that is evaluated is susceptible.
For example, imagine a naïve calculator application whose calculate() function takes a mathematical expression as a string, evaluates it, and returns the answer.
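A minimal sketch of what such a function might look like, assuming it simply hands its input to eval(). The body is an illustration of the vulnerable pattern, not the calculator's actual source:

```javascript
// Naive calculator: evaluates whatever string the caller supplies.
// Shown only to illustrate the vulnerability; do not use this pattern.
function calculate(expression) {
  return eval(expression);
}

console.log(calculate('1 + 1')); // 2
```

Nothing in calculate() distinguishes arithmetic from any other code, which is exactly the problem.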
This works great for expressions like 1 + 1 or even Math.acos(3 * Math.PI). However, what if the user passed in users.findByID(1234).creditAccount(9999999999, '£')? The calculate() function would blindly execute that as well, with potentially dire consequences. Even if a user does not know what specific functionality is available in the target evaluation context, it is not very difficult to guess, or to get up to no good with just the core language. To build our calculator safely, we should write our own expression parser that can sanitize and validate inputs to make sure they are valid math expressions and not arbitrary code.
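As a rough sketch of that idea, here is a tiny recursive-descent evaluator that never calls eval(). The names tokenize() and safeCalculate() and the grammar are assumptions made for this example; it handles only numbers, + - * /, unary minus, and parentheses:

```javascript
// Grammar assumed for this sketch:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')' | '-' factor

// Split the input into number, operator, and parenthesis tokens.
// Any other character causes a SyntaxError.
function tokenize(input) {
  const tokens = [];
  const pattern = /\s*(\d+(?:\.\d+)?|[-+*\/()])/y;
  const trimmed = input.trim();
  let position = 0;
  while (position < trimmed.length) {
    pattern.lastIndex = position;
    const match = pattern.exec(trimmed);
    if (match === null) {
      throw new SyntaxError('Unexpected character at position ' + position);
    }
    tokens.push(match[1]);
    position = pattern.lastIndex;
  }
  return tokens;
}

function safeCalculate(input) {
  const tokens = tokenize(input);
  let index = 0;

  function parseExpression() {
    let value = parseTerm();
    while (tokens[index] === '+' || tokens[index] === '-') {
      const operator = tokens[index++];
      const right = parseTerm();
      value = operator === '+' ? value + right : value - right;
    }
    return value;
  }

  function parseTerm() {
    let value = parseFactor();
    while (tokens[index] === '*' || tokens[index] === '/') {
      const operator = tokens[index++];
      const right = parseFactor();
      value = operator === '*' ? value * right : value / right;
    }
    return value;
  }

  function parseFactor() {
    const token = tokens[index++];
    if (token === '-') return -parseFactor();
    if (token === '(') {
      const value = parseExpression();
      if (tokens[index++] !== ')') throw new SyntaxError('Expected ")"');
      return value;
    }
    if (token !== undefined && /^\d/.test(token)) return Number(token);
    throw new SyntaxError('Unexpected token: ' + token);
  }

  const result = parseExpression();
  if (index < tokens.length) {
    throw new SyntaxError('Unexpected token: ' + tokens[index]);
  }
  return result;
}

console.log(safeCalculate('2 * (3 + 4)')); // 14
```

An input such as users.findByID(1234) is rejected by the tokenizer before anything runs. Supporting functions like Math.acos() would mean adding an explicit whitelist of allowed names to the grammar rather than falling back to eval().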
Evaluating Code in a Different Context in MarkLogic
MarkLogic provides built-in APIs to evaluate code. They are most useful as a means to run code in a context different from that of the calling request, for example in a different transaction, as another user, or asynchronously on the task server.
That capability is useful in several ways:
- Query, update, or insert documents into another database, for example, to write a schema into the Modules database or move documents from a staging to a separate production database.
- Orchestrate multiple transactions in a single request. By default MarkLogic queues up all database updates and applies them atomically at the end of a request. If you need to store or view intermediate results, you'll need to execute those in a separate transaction.
- Run a query at a particular database timestamp. By specifying an explicit timestamp for a query, you can effectively get a consistent snapshot of the database, even across separate transactions.
Take a look at the options to xdmp.eval() for other ways to affect the context of evaluated code.
xdmp.eval() is generally to be avoided. A better option is xdmp.invoke(), where you specify a path to an existing module rather than a string of code. As with xdmp.eval(), you can use the $vars argument to safely pass dynamic parameters to the stored module. That's a much safer way to parametrize evaluated code than building strings to eval. Unlike xdmp.eval(), however, the code of an invoked module is fixed ahead of time, so there's no chance that it will unsafely evaluate an input.
xdmp.invoke() uses the same set of context options that xdmp.eval() uses, so you can invoke a module in a separate transaction or as a different user.
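A minimal sketch of that pattern in Server-Side JavaScript. The module path /modules/insert-message.sjs and the insertMessage() wrapper are hypothetical, and the isolation option name is assumed from the xdmp.eval() options, so check it against your MarkLogic version:

```javascript
// Hypothetical wrapper around a stored module. The user-supplied values
// travel as external variables and are never spliced into code.
function insertMessage(uri, message) {
  return xdmp.invoke(
    '/modules/insert-message.sjs',          // hypothetical module path
    { uri: uri, message: message },         // parameters passed as variables
    { isolation: 'different-transaction' }  // assumed context option
  );
}
```

However hostile the message string is, the invoked module only ever sees it as data.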
It's not always feasible or convenient to isolate your dynamic code into its own main module, though.
xdmp.invokeFunction() allows you to invoke any in-context function, even anonymous ones that you build on the fly. Think of it as a MarkLogic-enhanced version of
xdmp.invokeFunction() allows you to separate the concerns of what the function does from the context in which it's evaluated. This makes for cleaner code and easier testing.
Take, for example, a trivial illustration that calls xdmp.transaction() twice. The xdmp.transaction() function gives the ID of the current transaction. Because the xdmp.invokeFunction() call specifies that the second call to xdmp.transaction() be run in a separate transaction, you'll get a different ID.
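A sketch of what that illustration might look like. The compareTransactions() wrapper is a name invented for this example, and the isolation option is assumed from the xdmp.eval() options:

```javascript
// Returns the transaction ID of the current request alongside the result
// of asking for the ID again from a separate transaction.
function compareTransactions() {
  // Called directly: runs in the current request's transaction.
  const current = xdmp.transaction();
  // Passed by reference (no parentheses) and run in a different transaction.
  const separate = xdmp.invokeFunction(xdmp.transaction, {
    isolation: 'different-transaction', // assumed context option
  });
  return { current: current, separate: separate };
}
```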
The first call returns the transaction assigned to the current request. The second, using xdmp.invokeFunction(), explicitly calls the xdmp.transaction() function in a different transaction. Note the use of xdmp.transaction sans parentheses: xdmp.transaction() calls the function, while xdmp.transaction, no parens, is a reference to the function itself. The actual identifiers returned are not important; what matters is that they differ because of the evaluation context.
applyAs() takes a function and the same options argument as xdmp.invokeFunction() and returns a new function that behaves just like the input, but will be invoked in the context determined by the options. Thus, downstream consumers don't need to be aware that the function is being invoked in a different context and can call the function as if it were the original function. For example, consider a (contrived) insert() function that takes a URI and string message, saves a document to the database, and returns a string.
A wrapped version, myInsert(), has the same "signature" as the insert() function but hides its evaluation context, simplifying usage, much like applying around advice in aspect-oriented programming.
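One way this could be put together, as a sketch. It assumes applyAs() is a small user-defined helper built on xdmp.invokeFunction() rather than a built-in, and that the isolation and update option names match your MarkLogic version:

```javascript
// Hypothetical helper: returns a function that behaves like fn but runs
// in the evaluation context described by options.
function applyAs(fn, options) {
  return function (...args) {
    // xdmp.invokeFunction() takes a zero-argument function, so close over args.
    return xdmp.invokeFunction(() => fn.apply(this, args), options);
  };
}

// Contrived example: save a document and return a confirmation string.
function insert(uri, message) {
  xdmp.documentInsert(uri, { message: message });
  return 'Inserted ' + uri;
}

// Same call shape as insert(), but run as an update in its own transaction.
// The option names are assumptions; check them against your server version.
const myInsert = applyAs(insert, {
  isolation: 'different-transaction',
  update: 'true',
});
```

A caller would then write myInsert('/messages/1.json', 'hello') exactly as it would call insert(), without knowing anything about transactions.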
This approach is a lot cleaner and has a clearer separation of the logic and the orchestration than something like the following:
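As a rough sketch of that less tidy alternative, assume the same logic is inlined as a string and handed to xdmp.eval(). The insertWithEval() name is invented for this example, and the option name is an assumption:

```javascript
// The logic lives inside a string, mixed together with the orchestration.
// It can't be linted, unit-tested, or reused as an ordinary function.
function insertWithEval(uri, message) {
  return xdmp.eval(
    "declareUpdate(); " +
    "xdmp.documentInsert(uri, { message: message }); " +
    "'Inserted ' + uri;",
    { uri: uri, message: message },         // still passed as variables
    { isolation: 'different-transaction' }  // assumed context option
  );
}
```

Even with the parameters passed safely as variables, what the code does and where it runs are tangled together in one call.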
To summarize, it's almost always a bad idea to eval strings of code. This leaves you open to injection attacks and makes code more difficult to read and write. Instead, use xdmp.invoke() to run a stored module, or xdmp.invokeFunction(), which can be used to wrap existing functions, hiding the change of context from consumers.
Stay safe out there.