[MarkLogic Dev General] tokenize returns empty string on "" pattern
David Sewell
dsewell at virginia.edu
Fri Feb 13 13:45:47 PST 2009
Allowing '\B' as a boundary match is an extension to the XQuery/XPath
regex syntax, no? The query quoted below ought to throw an error when the
XQuery version is declared as strict 1.0, but it doesn't. (Are the regex
extensions documented somewhere?)
David
On Fri, 13 Feb 2009, Aaron Redalen wrote:
> Eric, that's the correct behavior. Your pattern matches the empty
> string anywhere, including at the start of your string. You can get
> the behavior you're looking for by matching empty strings not at the
> beginning or end of a word, using the expression '\B'.
>
> declare variable $alphabet := tokenize('abcdefghijklmnopqrstuvwxyz', '\B');
>
> for $x in $alphabet
> return concat('[', $x, ']')
>
> => [a] [b] [c] [d] [e] [f] [g] [h] [i] [j] [k] [l] [m] [n] [o] [p] [q] [r] [s] [t] [u] [v] [w] [x] [y] [z]
>
>
> Aaron Redalen
> Senior Consultant, Federal
> Mark Logic Corporation
> +1 240 688 7433 Phone
> aaron.redalen at marklogic.com
> www.marklogic.com
>
>
> -----Original Message-----
> From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Eric Palmitesta
> Sent: Friday, February 13, 2009 3:05 PM
> To: ML Developer Mailing List
> Subject: [MarkLogic Dev General] tokenize returns empty string on "" pattern
>
> declare variable $alphabet := tokenize('abcdefghijklmnopqrstuvwxyz', '');
>
> for $x in $alphabet
> return concat('[', $x, ']')
>
> <v:results v:warning="more than one node">
> [] [a] [b] [c] [d] [e] [f] [g] [h] [i] [j] [k] [l] [m] [n] [o] [p] [q]
> [r] [s] [t] [u] [v] [w] [x] [y] [z]
> </v:results>
>
> Why am I getting the empty string at the beginning of the sequence
> returned from the above call to tokenize? Is this result expected?
>
> Eric
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://xqzone.com/mailman/listinfo/general
>
--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: dsewell at virginia.edu Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
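For comparison, here is a minimal sketch of a portable way to split a string into single characters in strict XQuery 1.0 without using a regular expression at all. The XQuery 1.0 Functions and Operators specification has fn:tokenize raise err:FORX0003 when the pattern matches a zero-length string, and '\B' is not part of the XPath regex syntax, so a fully conformant processor should reject both the '' and the '\B' queries above. The sketch assumes only the standard fn:string-to-codepoints and fn:codepoints-to-string functions; $alphabet is simply reused from the thread.

```xquery
xquery version "1.0";

(: Split the string into one-character strings by way of its Unicode
   codepoints; no regex is involved, so no zero-length match can occur. :)
declare variable $alphabet as xs:string* :=
  for $cp in string-to-codepoints('abcdefghijklmnopqrstuvwxyz')
  return codepoints-to-string($cp);

(: Wrap each character in brackets, as in the original query. :)
for $x in $alphabet
return concat('[', $x, ']')
```

This yields [a] through [z] with no leading empty string. Iterating `for $i in 1 to string-length($s) return substring($s, $i, 1)` is an equivalent alternative that also stays within standard XQuery 1.0.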