[XQZone General] passing sequences to xdmp:eval( )

Howard Katz howardk at fatdog.com
Wed Dec 7 09:17:45 PST 2005


I have a series of boolean-returning query tests that are stored as strings,
and I need to apply these tests against a variety of node sequences, getting
back the nodes that pass. Since the queries are strings, I'm using
xdmp:eval(). This works, but I have to use a slight workaround to satisfy
eval(), and I'm concerned about performance when I scale up to running the
tests against very large sequences, on the order of hundreds of thousands of
nodes.

Assume that one of the stored tests I want to run is 'starts-with( ., "ca" )'.
If I were applying this test directly against the sequence
"( <a>cat</a>, <b>dog</b>, <c>catalog</c> )" (i.e., not eval'ing it), I could
say:

    let $seq := ( <a>cat</a>, <b>dog</b>, <c>catalog</c> )
    return
    $seq[ starts-with( ., "ca" ) ]

and I'd get back the node sequence, ( <a>cat</a>, <c>catalog</c> ). So far
so good.

If the same query is now stored as a string and I'm using eval(), I'd
similarly like to be able to say:


    define function eval-node-test( $nodes as element()+, $test as xs:string )
        as element()*
    {
        let $query := concat( "define variable $seq as element() external ",
                              "$seq[ ", $test, " ]" )
        return
        xdmp:eval( $query, ( xs:QName("seq"), $nodes ) )
    }

    let $nodes := ( <a>cat</a>, <b>dog</b>, <c>catalog</c> )
    return
    eval-node-test( $nodes, "starts-with( ., 'ca' )" )


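This won't work, however, because the $nodes value passed in
'( xs:QName("seq"), $nodes )' can't be a sequence, only a singleton: the
variables argument is itself a flat sequence of alternating QName/value pairs,
so any extra items in $nodes would be read as further pairs. This means that
in order to use eval(),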

(1) I have to wrap the $nodes sequence in a temporary <temp-root/> element,
and
(2) construct the last part of the query inside concat() as
"$seq/*[ ", $test, " ]" rather than "$seq[ ", $test, " ]".

In other words, to satisfy eval() I have to

(1) hoist all my nodes into a temporarily constructed super-element, and
(2) dereference every one of them again inside my query.
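For concreteness, here is a sketch of that workaround in the same syntax as
the function above. It is reconstructed only from the two steps just listed,
so the name eval-node-test-wrapped is hypothetical, and it assumes xdmp:eval
binds the single <temp-root> element to $seq like any other singleton value.

```xquery
(: Hypothetical helper: eval-node-test with the <temp-root> workaround. :)
define function eval-node-test-wrapped( $nodes as element()+,
                                        $test as xs:string )
    as element()*
{
    (: Step 1: wrap the whole sequence in one element so it can be bound
       to a single external variable. Constructing the wrapper copies
       every node in $nodes. :)
    let $wrapper := <temp-root>{ $nodes }</temp-root>
    (: Step 2: step back down to the children before applying the test. :)
    let $query := concat( "define variable $seq as element() external ",
                          "$seq/*[ ", $test, " ]" )
    return
    xdmp:eval( $query, ( xs:QName("seq"), $wrapper ) )
}

let $nodes := ( <a>cat</a>, <b>dog</b>, <c>catalog</c> )
return
eval-node-test-wrapped( $nodes, "starts-with( ., 'ca' )" )
```

Because the wrapper's children are copies, the <a> and <c> elements that come
back are the copies made under <temp-root>, not the original nodes in $nodes.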

Since I potentially need to run these tests against hundreds of thousands of
nodes, I'm concerned about performance. Is that concern justified? And if it
is, is there a more efficient way of doing this?

Howard 