[MarkLogic Dev General] tokenize returns empty string on "" pattern

Mary Holstege mary.holstege at marklogic.com
Fri Feb 13 13:11:07 PST 2009


On Fri, 13 Feb 2009 12:04:30 -0800, Eric Palmitesta  
<eric.palmitesta at utoronto.ca> wrote:

> declare variable $alphabet := tokenize('abcdefghijklmnopqrstuvwxyz', '');
>
> for $x in $alphabet
> return concat('[', $x, ']')
>
> <v:results v:warning="more than one node">
> [] [a] [b] [c] [d] [e] [f] [g] [h] [i] [j] [k] [l] [m] [n] [o] [p] [q]  
> [r] [s] [t] [u] [v] [w] [x] [y] [z]
> </v:results>
>
> Why am I getting the empty string at the beginning of the sequence  
> returned from the above call to tokenize?  Is this result expected?
>
> Eric

Actually, the expected result is an error: the pattern matches the
zero-length string, which fn:tokenize disallows (err:FORX0003).  If
you used, say, " *" instead, you would see the error.
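Here is a minimal sketch of what a conforming processor should do. It assumes XQuery 3.0 `try`/`catch` support; the 2009-era MarkLogic dialect used a different try/catch syntax. Both patterns below can match the zero-length string, so each `fn:tokenize` call should raise `err:FORX0003` instead of returning tokens:

```xquery
xquery version "3.0";

(: Both patterns can match "", which fn:tokenize forbids (err:FORX0003). :)
for $pattern in ('', ' *')
return
  try {
    (: A conforming processor should raise the error before any join happens. :)
    string-join(tokenize('abc', $pattern), '|')
  } catch err:FORX0003 {
    concat('pattern "', $pattern, '" rejected: FORX0003')
  }
```

The `err` prefix is predeclared in XQuery 3.0, so no namespace declaration is needed.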

The reason you are getting that result (setting aside the bug
that you shouldn't be getting any result at all) is that the empty
pattern matches at every position, including the first.  Tokenize
gives you every string separated by a matching instance of the
pattern and stops when it reaches the end of the string.  Because
the empty string before "a" is separated from "a" by a match of "",
you get the extra leading "".  You don't get an extra "" at the end
simply because the scan for matches has already stopped by that
point.
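If the goal is simply to split a string into its individual characters, one approach avoids regular expressions entirely by converting the string to codepoints and back. This is a sketch: `local:chars` is a hypothetical helper name, not a built-in. For contrast, the last expression shows the documented fn:tokenize behaviour that the explanation above relies on: a separator at the very start of the input produces a leading zero-length token.

```xquery
xquery version "1.0";

(: Hypothetical helper: returns one string per character, no regex involved. :)
declare function local:chars($s as xs:string) as xs:string*
{
  for $cp in string-to-codepoints($s)
  return codepoints-to-string($cp)
};

declare variable $alphabet := local:chars('abcdefghijklmnopqrstuvwxyz');

(
  (: Same output shape as the original query, minus the leading "[]". :)
  for $x in $alphabet
  return concat('[', $x, ']'),

  (: A real separator at position 0 yields a leading "": ("", "a", "b"), so 3. :)
  count(tokenize(',a,b', ','))
)
```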

//Mary
