[MarkLogic Dev General] Taming xdmp:eval() syntax

Michael Blakeley michael.blakeley at marklogic.com
Tue Jul 24 16:42:20 PDT 2007


David,

I think the developerworks article has the right idea, and the issues 
aren't much different than in SQL (or even Perl).

The best practice is to avoid xdmp:eval() and XCC ad-hoc queries. If you 
must compromise that ideal, then use parameterization via external 
variables. Don't eval user-supplied input.
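
As a minimal sketch of the difference (not tested against any particular
server release): the search term and the simplified //line path are made up
for illustration, and the pre-1.0 "define variable" form follows the syntax
quoted further down this thread. Under XQuery 1.0 the declaration would read
"declare variable $SEARCH as xs:string external;".

```xquery
(: Hypothetical user-supplied term that happens to contain a double quote. :)
let $term := 'fool" or "x'

(: Risky: the term is spliced into the query text. The embedded quote ends
   the string literal early, so the rest of the term is parsed as code.
   $spliced is returned as text here, never evaluated; its value is
     cts:search(doc("hamlet.xml")//line, "fool" or "x")            :)
let $spliced :=
  concat('cts:search(doc("hamlet.xml")//line, "', $term, '")')

(: Safer: the query text is a constant. The term travels as a value bound
   to the external variable, so nothing in it needs escaping. :)
let $constant := '
  define variable $SEARCH as xs:string external
  cts:search(doc("hamlet.xml")//line, $SEARCH)
'
return (
  $spliced,
  xdmp:eval($constant, (QName("", "SEARCH"), $term))
)
```

In the second form the quote inside $term is just data, which is why no
escaping step is needed.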

-- Mike

David Sewell wrote:
> Michael,
> 
> Everyone in the XML world (well, us and the Saxon email list at least)
> seems to be talking about XPath injection attacks this week, referring
> to this new article:
> 
>   http://www.ibm.com/developerworks/xml/library/x-xpathinjection.html
> 
> Obviously letting users provide the XPath you operate on has its
> dangers. Does MarkLogic have or might you develop some ML-specific
> guidelines on what coding practices lead to vulnerabilities? (Maybe they
> could be shared directly with customers to avoid giving people too many
> ideas?)
> 
> DS
> 
> On Wed, 18 Jul 2007, Michael Blakeley wrote:
> 
>> If I can hijack this conversation for a word or two on external variables....
>>
>> External variables are even lazier than string concatenation is. When you use
>> an external variable, you protect your code from injection attacks, and you
>> also avoid having to escape significant characters.
>>
>> Laziness is good.
>>
>> -- Mike
>>
>> Andy Townsend wrote:
>>> Hi David,
>>>
>>> I like your solution.  Whenever I have used xdmp:eval to date I have
>>> pre-formed a string literal to pass in rather than employ external variables -
>>> it feels slightly lazy but easy.
>>>
>>> The only downside I know to using that method to construct the string is if
>>> the string actually contains XQuery { } syntax that you _don't_ want
>>> evaluated until you execute the xdmp:eval - such as if you were setting a
>>> variable from the other DB and using it as a part of your evaluated query. A
>>> bit contrived but something like this.....
>>>
>>> let $query := <q>
>>>         let $localdbval := /abc/def/ghi[1]
>>>         myfunc(<anode>{$localdbval}</anode>)
>>> </q>
>>>
>>> where rather than remembering to escape quotes you'll have to escape braces,
>>> using double braces.
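>>>
>>> A rough sketch of the escaped form, under a few assumptions: myfunc is
>>> the same made-up function as above, and a "return" is added so the
>>> evaluated text is a complete FLWOR expression. Literal <anode> tags
>>> inside <q> would be parsed as a child element, and its tags would not
>>> survive if xdmp:eval takes the string value of $query, so this version
>>> uses a computed element constructor instead.
>>>
>>> ```xquery
>>> (: {{ and }} are the XQuery escapes for literal braces in element content :)
>>> let $query := <q>
>>>         let $localdbval := /abc/def/ghi[1]
>>>         return myfunc(element anode {{ $localdbval }})
>>> </q>
>>> (: the string value carries single braces, ready to pass to xdmp:eval :)
>>> return string($query)
>>> ```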
>>>
>>> Andy
>>>
>>>
>>> David Sewell <dsewell at virginia.edu> wrote on 18/07/2007 19:59
>>> to General XQZone Discussion <general at xqzone.marklogic.com>,
>>> subject "[MarkLogic Dev General] Taming xdmp:eval() syntax":
>>>
>>> We have been somewhat loath to use xdmp:eval() because of its rather
>>> ungainly syntax, what with having to define external variables and use
>>> the <options><database>..</database></options> structure to pass a
>>> database ID to the function. I'd like feedback on an alternate way of
>>> calling the function that I've just been trying out.
>>>
>>> Suppose I have a database named "Shakespeare" containing hamlet.xml.  It
>>> is not the default database for my app server, so I'll need to access it
>>> using xdmp:eval().  I want to return all lines spoken by Hamlet containing
>>> the word "slave", then all lines containing "fool". (For purposes of the
>>> exercise I'll call cts:search twice rather than use an 'or' query.)
>>>
>>> Here's what I'd call the typical way to construct the query:
>>>
>>>   let $terms := ("slave", "fool")
>>>   for $search in $terms
>>>   return xdmp:eval(
>>>     '
>>>       define variable $SEARCH as xs:string external
>>>       cts:search(doc("hamlet.xml")//speech[speaker[.="HAMLET"]]/line, $SEARCH)
>>>     ',
>>>     (QName("", "SEARCH"), $search),
>>>     <options xmlns="xdmp:eval">
>>>       <database>{xdmp:database('Shakespeare')}</database>
>>>     </options>
>>>   )
>>>
>>> where the query string is passed to xdmp:eval() as a literal string.
>>> That does the job and is compact, but somewhat unreadable. Plus you have
>>> to be careful to escape any ' used within the query.
>>>
>>> Here is an alternate way of doing the same thing, longer but more
>>> readable, producing the exact same results:
>>>
>>>   define variable $ShakespeareDB
>>>   {
>>>     <options xmlns="xdmp:eval">
>>>       <database>{xdmp:database('Shakespeare')}</database>
>>>     </options>
>>>   }
>>>
>>>   let $terms := ( "slave", "fool" )
>>>   for $search in $terms
>>>   let $query :=
>>>     <q>
>>>       cts:search(doc("hamlet.xml")//speech[speaker[.='HAMLET']]/line, "{$search}")
>>>     </q>
>>>   return xdmp:eval(
>>>     $query,
>>>     (),
>>>     $ShakespeareDB
>>>   )
>>>
>>> Here I'm defining the options node as a global variable, which of course
>>> makes sense if I want to use the same options in more than one
>>> xdmp:eval(). The main novelty is passing the query parameter to
>>> xdmp:eval() as a variable containing a constructed <q> element, which
>>> is cast as a string by xdmp:eval(). Because $query is a constructed
>>> element, I can use standard XQuery { } syntax to embed variable
>>> references that are expanded before $query is passed to xdmp:eval(),
>>> so
>>>     "{$search}"   ==>    "slave" then "fool"
>>>
>>> I'm getting the effect of external variables within xdmp:eval() without
>>> the messiness of the external variables syntax.
>>>
>>> Can anyone see a downside to this approach?
>>>
>>>
>>>
>>
> 
