[XQZone General] cts:google-element-query
Travis Raybold
travis at raybold.com
Thu Mar 2 14:32:06 PST 2006
Woohoo! I knew someone must have gone through all of this... That looks
like a great parser.
Thanks Ryan,
--Travis
Ryan Grimm wrote:
> I've actually posted a query parser on xqzone that does exactly this
> and more.
> You can find it at:
> http://xqzone.marklogic.com/svn/commons/trunk/search/query-xml/query-xml.xqy
>
> I also have an example of how to use it in subversion, but it
> is a little outdated.
>
> But here is a quick example:
>
> stox:searchToXml("site:www.marklogic.com xquery",
> ("link", "site", "filetype"),
> ("+", "-"), ("AND", "NOT", "OR")
> )
>
> The first argument is the query string to parse. The second argument
> lists the "fields"; examples of fields are Google's site: and
> filetype: prefixes. The third argument lists the valid term operators
> (here "+" and "-"). The fourth argument lists the modes that can join
> two terms (here AND, NOT and OR).
>
> The function returns an xml structure like:
> <search>
> <term field="site">www.marklogic.com</term>
> <term>xquery</term>
> </search>
>
> A search of: "xquery specification" w3c
> Returns:
> <search>
> <term>xquery specification</term>
> <term>w3c</term>
> </search>
>
> If you include things like negation: -xslt xquery specification
> Returns:
> <search>
> <term op="-">xslt</term>
> <term>xquery</term>
> <term>specification</term>
> </search>
>
> Using modes: xslt OR xquery
> Returns:
> <search>
> <term mode="OR">xslt</term>
> <term mode="OR">xquery</term>
> </search>
>
> The modes obviously have a flaw because you can't do any grouping
> using parentheses, so in some cases you might not know in what order
> terms should be combined when executing the search.
> However, the script does do a whole ton of the heavy lifting for
> you. If you have any questions please let me know.
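
As an illustration only (not part of Ryan's message), here is a minimal sketch of how the <search> output shown above might be turned into a cts:query. The function names term-to-query and search-to-query are hypothetical, the field attribute is ignored, and all mode="OR" terms are assumed to form one single group, which is only one reading of the ungrouped structure. It is written in the same "define function" style as the other code in this thread and relies on MarkLogic's cts:word-query, cts:and-query, cts:or-query and cts:not-query constructors.

```xquery
(: Sketch only: term-to-query and search-to-query are hypothetical names.
   The field attribute on <term> is ignored here. :)
define function term-to-query($term as element()) {
  (: A multi-word term is matched as a phrase by cts:word-query. :)
  cts:word-query(fn:string($term))
}

define function search-to-query($search as element()) {
  let $excluded := $search/term[@op = "-"]
  let $either   := $search/term[@mode = "OR" and fn:not(@op = "-")]
  let $required := $search/term[fn:not(@op = "-") and fn:not(@mode = "OR")]
  return
    cts:and-query((
      (: plain terms must all match :)
      for $t in $required return term-to-query($t),
      (: assumption: every OR term belongs to one disjunction :)
      if (fn:exists($either))
      then cts:or-query(for $t in $either return term-to-query($t))
      else (),
      (: terms marked with "-" must not match :)
      for $t in $excluded return cts:not-query(term-to-query($t))
    ))
}
```

The resulting query can be passed as the query argument of cts:search. Handling the field attribute would depend on how each field maps onto your content, so it is left out.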
>
> That was the quick tutorial but hope it helps.
> --Ryan
>
> On Mar 2, 2006, at 12:34 PM, Travis Raybold wrote:
>
>> that's perfect Danny, thanks!
>>
>> the NOT functionality should be relatively simple for me to add.
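
For illustration (a sketch, not the code Travis went on to write), NOT handling could be layered on the <tokens> structure from Danny's message quoted below. The function name tokens-to-query-with-not is hypothetical, and the sketch assumes that a literal NOT token negates the single token that follows it and that all remaining tokens are ANDed together.

```xquery
(: Sketch only: tokens-to-query-with-not is a hypothetical name.
   Assumes a "NOT" token negates only the token right after it. :)
define function tokens-to-query-with-not($tokens as element()) {
  cts:and-query(
    for $t at $i in $tokens/token
    (: the preceding token; empty for the first token :)
    let $prev := $tokens/token[$i - 1]
    (: the NOT markers themselves are not search terms :)
    where fn:string($t) ne "NOT"
    return
      if (fn:string($prev) = "NOT")
      then cts:not-query(cts:word-query(fn:string($t)))
      else cts:word-query(fn:string($t))
  )
}
```

One limitation of this sketch: a quoted "NOT" is indistinguishable from the operator once the quotes have been stripped, so it would also be treated as a negation marker.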
>>
>> --Travis
>>
>> Danny Sokolsky wrote:
>>
>>> Hi Travis,
>>>
>>> Here is a function that is a simplified version of some code submitted
>>> by our friends at O'Reilly. This does the first part of what you are
>>> looking for: it turns quoted phrases into phrase terms, and then
>>> tokenizes the rest on spaces.
>>>
>>> You should be able to expand this technique to handle NOTs or anything
>>> else you want. The idea is to create a simple XML structure that
>>> contains all of your tokenized terms, like the following:
>>>
>>> hello goodbye "hello goodbye"
>>>
>>> => becomes this XML structure
>>>
>>> <tokens>
>>> <token>hello</token>
>>> <token>goodbye</token>
>>> <token>hello goodbye</token>
>>> </tokens>
>>> You can then use that to create your cts:query expression.
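
For illustration, a minimal sketch of that last step (not from Danny's message): it ANDs together one cts:word-query per token. The function name tokens-to-query is hypothetical; cts:and-query and cts:word-query are MarkLogic's built-in query constructors, and the "define function" style matches the other code in this thread.

```xquery
(: Sketch only: tokens-to-query is a hypothetical name.
   Expects the <tokens> element shown above. :)
define function tokens-to-query($tokens as element()) {
  cts:and-query(
    for $t in $tokens/token
    (: a token containing spaces is matched as a phrase :)
    return cts:word-query(fn:string($t))
  )
}
```

The returned query can then be passed as the query argument of cts:search.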
>>>
>>> Here is the function to get your query string and turn it into a
>>> simple XML structure:
>>>
>>> define function get-query-tokens($input as xs:string?) as element() {
>>>   (: This parses the quotes to be exact matches.
>>>      The idea for this comes from our friends at o'reilly
>>>      /xqzone/search/trunk/search/query-xml/query-xml.xqy :)
>>>   <tokens>{
>>>     let $newInput := fn:string-join(
>>>       (: check if there is more than one double-quotation mark. If there
>>>          is, tokenize on the double-quotation mark ("), then change the
>>>          spaces in the even tokens to the string "!+!". This will then
>>>          allow later tokenization on spaces, so you can preserve quoted
>>>          phrases as phrase searches (after re-replacing the "!+!"
>>>          strings with spaces). :)
>>>       if ( fn:count(fn:tokenize($input, '"')) > 2 )
>>>       then ( for $i at $count in fn:tokenize($input, '"')
>>>              return
>>>                if ($count mod 2 = 0)
>>>                then fn:replace($i, "\s+", "!+!")
>>>                else $i
>>>            )
>>>       else ( $input ) , " ")
>>>     let $tokenInput := fn:tokenize($newInput, "\s+")
>>>
>>>     return (
>>>       for $x in $tokenInput
>>>       where $x ne ""
>>>       return
>>>         <token>{fn:replace($x, "!\+!", " ")}</token>)
>>>   }</tokens>
>>> }
>>>
>>>
>>> Hope that helps get you on the right path.
>>> -Danny
>>>
>>> -----Original Message-----
>>> From: general-bounces at xqzone.marklogic.com
>>> [mailto:general-bounces at xqzone.marklogic.com] On Behalf Of Travis Raybold
>>> Sent: Thursday, March 02, 2006 11:45 AM
>>> To: general at xqzone.marklogic.com
>>> Subject: [XQZone General] cts:google-element-query
>>>
>>>
>>> I'm creating a search interface, and I'm looking to tokenize the
>>> search string, keep anything in quotes as a single term, default to
>>> AND for the rest of the terms, and be able to handle NOT before a
>>> term. I think I can work this out, but surely it must be a common
>>> type of functionality, and a sample would probably save me hours of
>>> trudging... does anyone have a sample they'd be willing to post of
>>> something similar to this? If not, I'll develop it and post it back
>>> here when I'm done.
>>>
>>> Somehow the server didn't recognize cts:google-element-query... ;)
>>>
>>> Thanks,
>>>
>>> --Travis
>>>
>>> _______________________________________________
>>> General mailing list
>>> General at xqzone.marklogic.com
>>> http://xqzone.com/mailman/listinfo/general