[XQZone General] cts:google-element-query

Travis Raybold travis at raybold.com
Thu Mar 2 14:32:06 PST 2006


Woohoo! I knew someone must have gone through all of this... That looks 
like a great parser.

Thanks Ryan,

--Travis

Ryan Grimm wrote:

> I've actually posted a query parser on xqzone that does exactly this  
> and more.
> You can find it at:
> http://xqzone.marklogic.com/svn/commons/trunk/search/query-xml/query-xml.xqy
>
> I also have an example of how to use it in Subversion, but it is a
> little outdated.
>
> But here is a quick example:
>
> stox:searchToXml("site:www.marklogic.com xquery",
>         ("link", "site", "filetype"),
>         ("+", "-"), ("AND", "NOT", "OR")
> )
>
> The first argument is the query to parse.  The second argument is the
> list of "fields"; examples of fields are Google's site: and filetype:.
> The third argument lists the valid boolean operators.  The fourth
> argument lists the valid modes for combining two terms.
>
> The function returns an xml structure like:
> <search>
>     <term field="site">www.marklogic.com</term>
>     <term>xquery</term>
> </search>
>
> A search of: "xquery specification" w3c
> Returns:
> <search>
>     <term>xquery specification</term>
>     <term>w3c</term>
> </search>
>
> If you include things like negation: -xslt xquery specification
> Returns:
> <search>
>     <term op="-">xslt</term>
>     <term>xquery</term>
>     <term>specification</term>
> </search>
>
> Using modes: xslt OR xquery
> Returns:
> <search>
>     <term mode="OR">xslt</term>
>     <term mode="OR">xquery</term>
> </search>
>
> The modes obviously have a flaw because you can't do any grouping
> using parentheses, so in some cases you might not know in what order
> terms should be combined when executing the search.
> However, the script does do a whole ton of the heavy lifting for
> you.  If you have any questions, please let me know.
>
> That was the quick tutorial but hope it helps.
> --Ryan
>
> On Mar 2, 2006, at 12:34 PM, Travis Raybold wrote:
>
>> that's perfect Danny, thanks!
>>
>> the NOT functionality should be relatively simple for me to add.
>>
>> --Travis
>>
>> Danny Sokolsky wrote:
>>
>>> Hi Travis,
>>>
>>> Here is a function that is a simplified version of some code submitted
>>> by our friends at O'Reilly.  This does the first part of what you are
>>> looking for: it turns quoted phrases into phrase terms, and then
>>> tokenizes the rest on spaces.
>>>
>>> You should be able to expand this technique to handle NOTs or anything
>>> else you want.  The idea is to create a simple XML structure that
>>> contains all of your tokenized terms, like the following:
>>>
>>> hello goodbye "hello goodbye"
>>>
>>> => becomes this XML structure
>>>
>>> <tokens>
>>>  <token>hello</token>
>>>  <token>goodbye</token>
>>>  <token>hello goodbye</token>
>>> </tokens>
>>>
>>> You can then use that to create your cts:query expression.
>>>
>>> Here is the function to get your query string and turn it into a
>>> simple XML structure:
>>>
>>> define function get-query-tokens($input as xs:string?) as element() {
>>>   (: This parses the quotes to be exact matches.
>>>      The idea for this comes from our friends at O'Reilly:
>>>      /xqzone/search/trunk/search/query-xml/query-xml.xqy :)
>>>   <tokens>{
>>>     let $newInput := fn:string-join(
>>>       (: Check whether there is more than one double-quotation mark.
>>>          If there is, tokenize on the double-quotation mark ("), then
>>>          change the spaces in the even tokens to the string "!+!".
>>>          This allows later tokenization on spaces, so quoted phrases
>>>          are preserved as phrase searches (after re-replacing the
>>>          "!+!" strings with spaces). :)
>>>       if ( fn:count(fn:tokenize($input, '"')) > 2 )
>>>       then ( for $i at $count in fn:tokenize($input, '"')
>>>              return
>>>                if ($count mod 2 = 0)
>>>                then fn:replace($i, "\s+", "!+!")
>>>                else $i
>>>            )
>>>       else ( $input ) , " ")
>>>     let $tokenInput := fn:tokenize($newInput, "\s+")
>>>     return (
>>>       for $x in $tokenInput
>>>       where $x ne ""
>>>       return
>>>         <token>{fn:replace($x, "!\+!", " ")}</token>)
>>>   }</tokens>
>>> }
>>>
>>>
>>> Hope that helps get you on the right path.
>>> -Danny
>>>
>>> -----Original Message-----
>>> From: general-bounces at xqzone.marklogic.com
>>> [mailto:general-bounces at xqzone.marklogic.com] On Behalf Of Travis
>>> Raybold
>>> Sent: Thursday, March 02, 2006 11:45 AM
>>> To: general at xqzone.marklogic.com
>>> Subject: [XQZone General] cts:google-element-query
>>>
>>>
>>> I'm creating a search interface, and I'm looking to tokenize the
>>> search string, keep anything in quotes as a single term, default to
>>> AND for the rest of the terms, and be able to handle NOT before a
>>> term. I think I can work this out, but surely it must be a common
>>> type of functionality, and a sample would probably save me hours of
>>> trudging... does anyone have a sample they'd be willing to post of
>>> something similar to this?  If not, I'll develop it and post it back
>>> here when I'm done.
>>>
>>> Somehow the server didn't recognize cts:google-element-query... ;)
>>>
>>> Thanks,
>>>
>>> --Travis
>>>
>>> _______________________________________________
>>> General mailing list
>>> General at xqzone.marklogic.com  
>>> http://xqzone.com/mailman/listinfo/general

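Danny's reply above stops at the <tokens> structure and notes that it can be used to create a cts:query expression. A minimal sketch of that step follows, assuming a plain AND of word queries in which a token with a leading "-" means NOT. The name local:tokens-to-query and the "-" convention are illustrative assumptions, not code from the thread, and the sketch uses the later "declare function" syntax rather than the "define function" form shown above.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper, not part of the thread: turns the <tokens>
   structure into one cts:query. Every token becomes a word query
   (multi-word tokens act as phrases); a token that starts with "-"
   is assumed to mean NOT. :)
declare function local:tokens-to-query($tokens as element(tokens)) as cts:query
{
  cts:and-query(
    for $t in $tokens/token
    let $text := fn:string($t)
    return
      if (fn:starts-with($text, "-") and fn:string-length($text) gt 1)
      then cts:not-query(cts:word-query(fn:substring($text, 2)))
      else cts:word-query($text)
  )
};

let $tokens :=
  <tokens>
    <token>-xslt</token>
    <token>xquery</token>
    <token>hello goodbye</token>
  </tokens>
return cts:search(fn:doc(), local:tokens-to-query($tokens))
```

Keeping the parsing step (string to XML) separate from the query-building step (XML to cts:query) means either half can be changed without touching the other.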


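Ryan's point that modes cannot be grouped with parentheses means the caller has to choose a combining convention. One possible convention is sketched below, purely as an assumption: all terms marked mode="OR" form a single OR group, terms with op="-" are negated, and the results are ANDed together. The name local:search-to-query is hypothetical and is not part of query-xml.xqy; terms with a field attribute are skipped because mapping a field to a query depends on the application.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper, not part of query-xml.xqy. It applies one
   assumed convention to the <search> structure:
     - terms with op="-" are negated,
     - all remaining terms with mode="OR" form one OR group,
     - everything else is required (AND). :)
declare function local:search-to-query($search as element(search)) as cts:query
{
  (: Field terms (site:, filetype:, ...) are left out of this sketch. :)
  let $terms    := $search/term[fn:not(@field)]
  let $negated  := $terms[@op eq "-"]
  let $or-group := $terms[@mode eq "OR"] except $negated
  let $required := $terms except ($negated, $or-group)
  return cts:and-query((
    for $t in $required return cts:word-query(fn:string($t)),
    if (fn:exists($or-group))
    then cts:or-query(for $t in $or-group return cts:word-query(fn:string($t)))
    else (),
    for $t in $negated return cts:not-query(cts:word-query(fn:string($t)))
  ))
};

local:search-to-query(
  <search>
    <term mode="OR">xslt</term>
    <term mode="OR">xquery</term>
    <term op="-">w3c</term>
  </search>
)
```

A query such as "a OR b c" is still ambiguous under this convention; it is read here as (a OR b) AND c, which may not be what every user intends.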
