[MarkLogic Dev General] search:parse parenthetical grouping

Will Thompson wthompson at jonesmcclure.com
Sat Aug 11 12:35:12 PDT 2012


Yeah, unfortunately it will be difficult to make users aware of this. It seems like the best workaround for now is to regex the querystring before parsing, and convert any tokens we can detect as numbers to a phrase. Then the parser leaves the parens alone.

replace($qs,
  '(^|\s)(\d{1,4}[a-z]?(\.\d{1,4})?(\(\d{1,2}\)))(\s|$)',
  '$1"$2"$5')

I usually try to avoid string manipulation before parsing because unexpected input can cause things to blow up, but this seems pretty safe.
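
One limitation of the single replace() call above: each match consumes the whitespace that follows it, so when two reference numbers are separated by a single space, the second one is not preceded by an unconsumed space and is skipped. A minimal sketch of a per-token variant follows. The helper name local:quote-refs is hypothetical, and the token pattern is the same one as above, anchored to a whole token.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical helper, not part of the Search API: wraps reference-style
   tokens such as 123.4(5) in double quotes so the parser treats each one
   as a phrase. All other tokens pass through unchanged. :)
declare function local:quote-refs($qs as xs:string) as xs:string
{
  string-join(
    (: split on whitespace and drop the empty token that leading
       whitespace would produce :)
    for $tok in tokenize($qs, '\s+')[. ne '']
    return
      if (matches($tok, '^\d{1,4}[a-z]?(\.\d{1,4})?\(\d{1,2}\)$'))
      then concat('"', $tok, '"')
      else $tok,
    ' ')
};

(: Both adjacent reference numbers get quoted; the real grouping
   at the end is left alone. :)
search:parse(local:quote-refs('12a(3) 123.4(5) (hello AND world)'))
```

This collapses runs of whitespace to a single space, and like the replace() version it does not know about phrases the user has already quoted, so a reference number inside an existing quoted phrase would be quoted a second time.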

-Will

-----Original Message-----
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Danny Sokolsky
Sent: Friday, August 10, 2012 7:06 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] search:parse parenthetical grouping

It probably won't work for you, but one idea is to change the starter for grouping to have different delimiting chars.  For example, 2 parens:

<starter strength="30" apply="grouping" delimiter="))">((</starter>

It might be better than a space....
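
For context, a rough and untested sketch of where that starter would sit in a full options node. Supplying a grammar replaces the default one, so the quotation, implicit, and joiner entries below are assumptions meant to approximate a small part of the default grammar. Adjust them to match whatever your application already uses.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Assumed minimal grammar: only the grouping starter comes from the
   suggestion above; the other entries are illustrative. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <grammar>
      <quotation>"</quotation>
      <implicit>
        <cts:and-query strength="20" xmlns:cts="http://marklogic.com/cts"/>
      </implicit>
      <starter strength="30" apply="grouping" delimiter="))">((</starter>
      <joiner strength="10" apply="infix" element="cts:or-query"
              tokenize="word">OR</joiner>
      <joiner strength="20" apply="infix" element="cts:and-query"
              tokenize="word">AND</joiner>
    </grammar>
  </options>
return (
  (: single parens should no longer be treated as grouping :)
  search:parse('123.4(5)', $options),
  (: grouping now needs doubled parens :)
  search:parse('((hello AND world)) OR other', $options)
)
```

The trade-off is that users have to learn the doubled-paren syntax for grouping.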

-Danny

-----Original Message-----
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Michael Blakeley
Sent: Friday, August 10, 2012 4:59 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] search:parse parenthetical grouping

That does seem undesirable. I was going to refer you to https://github.com/mblakele/xqysp but it doesn't do much better - unless you can get your users to quote the number?

import module namespace qe="com.blakeley.xqysp.query-eval"
 at "query-eval.xqy";

qe:parse('123.4(5)'),
qe:parse('"123.4(5)"')
=>
cts:and-query((cts:word-query("123", ("lang=en"), 1), cts:word-query("4", ("lang=en"), 1), cts:word-query("5", ("lang=en"), 1)), ())
cts:word-query("123.4(5)", ("lang=en"), 1)

You can pass that output to search:resolve(), with pretty much the same semantics as search:search.
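
A minimal sketch of that hand-off, assuming qe:parse returns a single cts:query for the input. search:resolve() expects the query in its XML form, so the cts:query is serialized by placing it inside a throwaway wrapper element first. The empty options node is a placeholder for your real options.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";
import module namespace qe = "com.blakeley.xqysp.query-eval"
  at "query-eval.xqy";

(: placeholder options; substitute the options you pass to search:search :)
let $options := <options xmlns="http://marklogic.com/appservices/search"/>
let $query := qe:parse('"123.4(5)"')
(: a cts:query placed in an element constructor becomes its XML form :)
let $query-xml := <wrapper>{ $query }</wrapper>/*
return search:resolve($query-xml, $options)
```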

-- Mike

On 10 Aug 2012, at 16:15 , Will Thompson wrote:

> I need to prevent paren grouping from happening when the parens are part of a string - typically it's a reference-type number. I can't think of a situation where this would be desirable anyway:
> 
> search:parse('123.4(5)')
> => cts:and-query((cts:word-query("123.4(5"), cts:word-query(")")))
> 
> If I change the grammar to require a space on either or both sides of the paren, then it will always break some legitimate grouping case like "(hello AND world)".
> 
> Is there any way to control these grammar options a little further? It would be easy if you could just use regexes in the grammar options, i.e.:
> 
> <starter strength="30" apply="grouping" 
> delimiter="(^|\s)/)">/(($|\s)</starter>
> 
> Thanks,
> 
> Will
> 
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
> 

_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general