[MarkLogic Dev General] Surprising behavior with text node construction

David Sewell dsewell at virginia.edu
Thu Mar 13 18:27:33 PST 2008


It seems to me that the real issue here is this function definition from 
snippet 1:

define function transform_dummy($element as element()) as text()
{
    "dummy"
}

I'm not sure why this does not throw an error, since the value actually
returned (xs:string) does not match the declared return type (text()).
Under Saxon, the equivalent function definition in XQuery 1.0 throws a
static error: you MUST have

   text { "dummy" }

in the function body for it to run.
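
As a rough sketch of what that looks like in XQuery 1.0 syntax (the
local: prefix and the name transform-dummy are my own choices for
illustration, not taken from the original code):

```xquery
xquery version "1.0";

(: XQuery 1.0 uses "declare function" and a trailing semicolon, and
   user functions in a main module go in the local: namespace. :)
declare function local:transform-dummy($element as element()) as text()
{
    (: an explicit text node constructor matches the declared
       text() return type :)
    text { "dummy" }
};

local:transform-dummy(<dummy/>)
```

Replacing the body with the bare string "dummy" gives the variant that
Saxon rejects as described above.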

This may be a matter of differences between the May 2003 draft of
XQuery and the current specification. I can't tell for sure from the
2003 draft whether the function conversion rules allow a return value
of xs:string to be converted to text() automatically:

http://www.w3.org/TR/2003/WD-xquery-20030502/#id-function-calls

But it does seem that snippet 1 should either throw an error or
behave like snippet 2.
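
For comparison, here is a minimal sketch of how XQuery 1.0 (the final
Recommendation, not the 2003 draft) treats the two kinds of content in
an element constructor. Assuming a conforming XQuery 1.0 processor,
adjacent atomic values are joined with a single space, while adjacent
text nodes are merged with no separator. The variable names are
illustrative only:

```xquery
xquery version "1.0";

(: Atomic values in element content: adjacent atomic values become
   one text node, joined with a single space. :)
let $from-strings := <p>{ "dummy", "dummy" }</p>

(: Text nodes in element content: adjacent text nodes are merged
   into one text node with no separator. :)
let $from-text-nodes := <p>{ text { "dummy" }, text { "dummy" } }</p>

return (
    string($from-strings),      (: "dummy dummy" :)
    string($from-text-nodes)    (: "dummydummy" :)
)
```

If the server follows similar rules, the <p>dummy dummy</p> output of
snippet 1 would suggest that the strings reach the element constructor
as atomic values rather than as text nodes. I have not verified this
against the 2003 draft.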

On Thu, 13 Mar 2008, Florentine, George wrote:

> I've run into an interesting behavior (optimization? bug?) in MarkLogic
> and wanted to see what others thought of this.
>
> Here's the background - we have some code that dynamically generates
> content by processing DITA topics. Depending upon the structure of the
> content it's possible that our XQuery code may process two sequential
> elements that would each return a text node from a function. What we see
> is that in this case, only one text node is returned and its value is
> the concatenation of the two string values separated by a single space
> character. This is somewhat in line with the 2003 spec, section 3.7.2.4
> (http://www.w3.org/TR/2003/WD-xquery-20030502/#doc-ComputedTextConstruct),
> which states:
>
> ----
> The content expression of a text node constructor is processed as
> follows:
> 1. Atomization is applied to the value of the content expression,
> converting it to a sequence of atomic values.
> 2. If the result of atomization is an empty sequence, no text node is
> constructed. Otherwise, each atomic value in the atomized sequence is
> cast into a string.
> 3. The individual strings resulting from the previous step are merged
> into a single string by concatenating them with a single space character
> between each pair. The resulting string becomes the content of the
> constructed text node.
> -----
>
> So it appears that there's some optimization in the output generation of
> nodes such that two sequential text nodes are collapsed into one.
>
> Below is a concrete code example. If you run the 1st code snippet in CQ,
> the output is <p>dummy dummy</p>: two calls to a function that should
> each return a text node instead produce a single text node, with the
> two return values ("dummy") joined by a space character.
>
> If you run the same code (2nd snippet) with the one change that
> transform_dummy returns an explicitly constructed text node, the
> output is <p>dummydummy</p> (no space character). This is the behavior
> I was expecting and seems like the right behavior. Note that the
> declared return type in the signature of transform_dummy() is text(),
> so I would assume that the xs:string "dummy" would be coerced into a
> text node and that a text node would be returned from this function in
> all cases.
>
> It seems bad that this behavior is different. I'd like to get other
> perspectives on this.
>
> Thx,
>
> G
> -------------------------------
>
> Code snippet 1 - no explicit text constructor in the function
> transform_dummy, returns <p>dummy dummy</p>
> -------------------------------
>
> define function transform_default_element($element as element()) as node()
> {
>    (: create a new element with the same name and attributes and
>       recurse to travel the subtree. :)
>    element
>     {fn:node-name($element)}
>     {$element/@*,transform_template($element/node())}
> }
> define function transform_dummy($element as element()) as text()
> {
>   "dummy"
> }
> define function transform_element ( $element as element())  as node()*
> {
>    (: branch to more specialized functions based on the type of element :)
>    typeswitch ($element)
>        case element(dummy)
>            return transform_dummy($element)
>        default
>            return transform_default_element ($element)
> }
> define function transform_template ( $nodes as node()* )  as node()*
> {
>
>   for $node in $nodes
>   return
>       typeswitch($node)
>           case element()
>               return transform_element($node)
>           default
>               (: PIs, text and comment nodes are outputted here :)
>               return $node
> }
>
> (: module start :)
> let $para := xdmp:unquote("<p><dummy/><dummy/></p>")
> return transform_template($para/node())
>
> -----------------------------------------
> Code snippet 2: explicit creation of text node in transform_dummy,
> returns <p>dummydummy</p>
> ------------------------------------------
>
> define function transform_default_element($element as element()) as node()
> {
>    (: create a new element with the same name and attributes and
>       recurse to travel the subtree. :)
>    element
>     {fn:node-name($element)}
>     {$element/@*,transform_template($element/node())}
> }
> define function transform_dummy($element as element()) as text()
> {
>   (: explicitly create a text node before returning :)
>   text { "dummy" }
> }
> define function transform_element ( $element as element())  as node()*
> {
>    (: branch to more specialized functions based on the type of element :)
>    typeswitch ($element)
>        case element(dummy)
>            return transform_dummy($element)
>        default
>            return transform_default_element ($element)
> }
> define function transform_template ( $nodes as node()* )  as node()*
> {
>
>   for $node in $nodes
>   return
>       typeswitch($node)
>           case element()
>               return transform_element($node)
>           default
>               (: PIs, text and comment nodes are outputted here :)
>               return $node
> }
>
> (: module start :)
>
> let $para := xdmp:unquote("<p><dummy/><dummy/></p>")
> return transform_template($para/node())
>
> ---------------------------------------------------------------------------
> George Florentine
>
> George.Florentine at FlatironsSolutions.com
>  O:  303.542.2173
>  C:  303.669.8628
>  F:  303.544.0522
>  www.FlatironsSolutions.com
> An Inc. 500 Company

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: dsewell at virginia.edu   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/