[XQZone General] cts:google-element-query

Danny Sokolsky dsokolsky at marklogic.com
Thu Mar 2 12:17:19 PST 2006


Hi Travis,

Here is a function that is a simplified version of some code submitted
by our friends at O'Reilly.  It does the first part of what you are
looking for: it turns quoted phrases into phrase terms, and then
tokenizes the rest on whitespace.

You should be able to expand this technique to handle NOTs or anything
else you want.  The idea is to create a simple XML structure that
contains all of your tokenized terms, like the following:

hello goodbye "hello goodbye"

=> becomes this XML structure:

<tokens>
  <token>hello</token>
  <token>goodbye</token>
  <token>hello goodbye</token>
</tokens>  

You can then use that to create your cts:query expression.
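
As a rough sketch of that last step, assuming a <tokens> structure like
the one above, something along these lines would AND all of the terms
together.  The function name tokens-to-query is made up for this
example; cts:and-query and cts:word-query are the regular cts query
constructors.

```xquery
(: Hypothetical helper, not part of the original code: ANDs every
   <token> together.  A multi-word token is passed to cts:word-query
   as a single string, so it is matched as a phrase. :)
define function tokens-to-query($tokens as element()) {
  cts:and-query(
    for $t in $tokens/token
    return cts:word-query(fn:string($t))
  )
}
```

The result can be passed as the query argument to cts:search.  Note
that with no tokens at all this builds an empty cts:and-query, which
matches every document, so you will probably want to check for an
empty search string first.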

Here is the function that takes your query string and turns it into a
simple XML structure:

define function get-query-tokens($input as xs:string?) as element() {
  (: This parses the quotes to be exact matches.
     The idea for this comes from our friends at O'Reilly
     /xqzone/search/trunk/search/query-xml/query-xml.xqy :)
  <tokens>{
    let $newInput := fn:string-join(
      (: Check if there is more than one double-quotation mark.  If there
         is, tokenize on the double-quotation mark ("), then change the
         spaces in the even tokens to the string "!+!".  This will then
         allow later tokenization on spaces, so you can preserve quoted
         phrases as phrase searches (after re-replacing the "!+!" strings
         with spaces).  :)
      if ( fn:count(fn:tokenize($input, '"')) > 2 )
      then ( for $i at $count in fn:tokenize($input, '"')
             return
               if ($count mod 2 = 0)
               then fn:replace($i, "\s+", "!+!")
               else $i
           )
      else ( $input ), " ")
    let $tokenInput := fn:tokenize($newInput, "\s+")
    return (
      for $x in $tokenInput
      where $x ne ""
      return <token>{fn:replace($x, "!\+!", " ")}</token>
    )
  }</tokens>
}
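
To give an idea of how this could be expanded to handle NOT, here is a
minimal sketch that works on the <tokens> element returned by
get-query-tokens.  It assumes NOT is typed as a separate, upper-case
word in front of the term it negates; that convention and the function
name tokens-to-query-with-not are assumptions made for this example,
not something from the O'Reilly code.

```xquery
(: Hypothetical extension, not from the original code: a token that
   directly follows the bare word NOT is wrapped in cts:not-query;
   every other token is ANDed in as a word or phrase query. :)
define function tokens-to-query-with-not($tokens as element()) {
  let $t := $tokens/token
  return
    cts:and-query(
      for $tok at $i in $t
      let $text := fn:string($tok)
      (: the NOT markers themselves are not search terms :)
      where $text ne "NOT"
      return
        if ($i > 1 and fn:string($t[$i - 1]) eq "NOT")
        then cts:not-query(cts:word-query($text))
        else cts:word-query($text)
    )
}
```

One limitation of this sketch: the <tokens> structure does not record
which tokens were quoted, so a quoted "NOT" is treated the same as a
bare NOT.  If that matters, you could have get-query-tokens mark quoted
phrases with an attribute and test for it here.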


Hope that helps get you on the right path.
-Danny

-----Original Message-----
From: general-bounces at xqzone.marklogic.com
[mailto:general-bounces at xqzone.marklogic.com] On Behalf Of Travis
Raybold
Sent: Thursday, March 02, 2006 11:45 AM
To: general at xqzone.marklogic.com
Subject: [XQZone General] cts:google-element-query


I'm creating a search interface, and I'm looking to tokenize the search
string, keep anything in quotes as a single term, default to AND for the
rest of the terms, and be able to handle NOT before a term. I think I
can work this out, but surely it must be a common type of functionality,
and a sample would probably save me hours of trudging... does anyone
have a sample they'd be willing to post of something similar to this? If
not, I'll develop it and post it back here when I'm done.

Somehow the server didn't recognize cts:google-element-query... ;)

Thanks,

--Travis
