[XQZone General] cts:google-element-query
Ryan Grimm
grimm at oreilly.com
Thu Mar 2 14:23:14 PST 2006
I've actually posted a query parser on xqzone that does exactly this
and more.
You can find it at:
http://xqzone.marklogic.com/svn/commons/trunk/search/query-xml/query-xml.xqy
I also have a usage example in Subversion, but it is a little outdated.
But here is a quick example:
stox:searchToXml("site:www.marklogic.com xquery",
    ("link", "site", "filetype"),
    ("+", "-"),
    ("AND", "NOT", "OR")
)
The first argument is the query to parse. The second argument lists the
"fields"; examples of fields are Google's site: and filetype: prefixes.
The third argument lists the boolean operators that are valid. The fourth
argument lists the modes that can be used to combine two terms.
The function returns an xml structure like:
<search>
    <term field="site">www.marklogic.com</term>
    <term>xquery</term>
</search>
A search of: "xquery specification" w3c
Returns:
<search>
    <term>xquery specification</term>
    <term>w3c</term>
</search>
If you include things like negation: -xslt xquery specification
Returns:
<search>
    <term op="-">xslt</term>
    <term>xquery</term>
    <term>specification</term>
</search>
Using modes: xslt OR xquery
Returns:
<search>
    <term mode="OR">xslt</term>
    <term mode="OR">xquery</term>
</search>
The modes obviously have a flaw: you can't do any grouping using
parentheses, so in some cases you might not know in what order terms
should be combined when executing the search.
However, the script does do a whole ton of the heavy lifting for you.
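A minimal sketch of one way the <search> output could be turned into a
cts:query, assuming MarkLogic's cts:word-query, cts:and-query,
cts:or-query and cts:not-query constructors. The function name
local:search-to-query is hypothetical and is not part of query-xml.xqy.
It is written in the later 1.0-ml syntax rather than the "define
function" form, it only handles op="-" and mode="OR" as they appear in
the examples above, and it skips terms that carry a field attribute,
because how to map those depends on your content.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper, not part of query-xml.xqy.
   Assumption: plain terms are ANDed, mode="OR" terms are collected
   into one or-query, and op="-" terms are negated. :)
declare function local:search-to-query($search as element(search))
  as cts:query
{
  (: Assumption: fielded terms (site:, filetype:, ...) are ignored here. :)
  let $terms    := $search/term[fn:not(@field)]
  let $excluded := $terms[@op = "-"]
  let $either   := $terms[@mode = "OR"][fn:not(@op = "-")]
  let $required := $terms[fn:not(@op = "-")][fn:not(@mode = "OR")]

  (: A multi-word term such as "xquery specification" becomes a phrase. :)
  let $must     := for $t in $required
                   return cts:word-query(fn:string($t))
  let $should   := if (fn:exists($either))
                   then cts:or-query(
                          for $t in $either
                          return cts:word-query(fn:string($t)))
                   else ()
  let $must-not := for $t in $excluded
                   return cts:not-query(cts:word-query(fn:string($t)))
  return cts:and-query(($must, $should, $must-not))
};

local:search-to-query(
  <search>
    <term op="-">xslt</term>
    <term>xquery</term>
    <term>specification</term>
  </search>
)
```

The resulting query could then be passed to cts:search, for example
cts:search(fn:doc(), local:search-to-query($search)). Putting every
mode="OR" term into a single or-query sidesteps the grouping problem
rather than solving it.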
If you have any questions please let me know.
That was the quick tutorial, but I hope it helps.
--Ryan
On Mar 2, 2006, at 12:34 PM, Travis Raybold wrote:
> that's perfect Danny, thanks!
>
> the NOT functionality should be relatively simple for me to add.
>
> --Travis
>
> Danny Sokolsky wrote:
>
>> Hi Travis,
>>
>> Here is a function that is a simplified version of some code submitted
>> by our friends at O'Reilly. This does the first part of what you are
>> looking for--turns quoted phrases into phrase terms, and then
>> tokenizes the rest on spaces.
>>
>> You should be able to expand this technique to handle NOTs or anything
>> else you want. The idea is to create a simple XML structure that
>> contains all of your tokenized terms, like the following:
>>
>> hello goodbye "hello goodbye"
>>
>> => becomes this XML structure
>>
>> <tokens>
>>   <token>hello</token>
>>   <token>goodbye</token>
>>   <token>hello goodbye</token>
>> </tokens>
>>
>> You can then use that to create your cts:query expression.
>>
>> Here is the function to get your query string and turn it into a
>> simple xml structure:
>>
>> define function get-query-tokens($input as xs:string?) as element() {
>>   (: This parses the quotes to be exact matches.
>>      The idea for this comes from our friends at o'reilly
>>      /xqzone/search/trunk/search/query-xml/query-xml.xqy :)
>>   <tokens>{
>>     let $newInput := fn:string-join(
>>       (: check if there is more than one double-quotation mark. If there
>>          is, tokenize on the double-quotation mark ("), then change the
>>          spaces in the even tokens to the string "!+!". This will then
>>          allow later tokenization on spaces, so you can preserve quoted
>>          phrases as phrase searches (after re-replacing the "!+!" strings
>>          with spaces). :)
>>       if ( fn:count(fn:tokenize($input, '"')) > 2 )
>>       then ( for $i at $count in fn:tokenize($input, '"')
>>              return
>>                if ($count mod 2 = 0)
>>                then fn:replace($i, "\s+", "!+!")
>>                else $i
>>            )
>>       else ( $input ) , " ")
>>     let $tokenInput := fn:tokenize($newInput, "\s+")
>>
>>     return (
>>       for $x in $tokenInput
>>       where $x ne ""
>>       return
>>         <token>{fn:replace($x, "!\+!", " ")}</token>)
>>   }</tokens>
>> }
>>
>>
>> Hope that helps get you on the right path.
>> -Danny
>>
>> -----Original Message-----
>> From: general-bounces at xqzone.marklogic.com
>> [mailto:general-bounces at xqzone.marklogic.com] On Behalf Of Travis
>> Raybold
>> Sent: Thursday, March 02, 2006 11:45 AM
>> To: general at xqzone.marklogic.com
>> Subject: [XQZone General] cts:google-element-query
>>
>>
>> I'm creating a search interface, and I'm looking to tokenize the
>> search string, keep anything in quotes as a single term, default to
>> AND for the rest of the terms, and be able to handle NOT before a term.
>> I think I can work this out, but surely it must be a common type of
>> functionality, and a sample would probably save me hours of
>> trudging... does anyone have a sample they'd be willing to post of
>> something similar to this? If not, I'll develop it and post it back
>> here when I'm done.
>>
>> Somehow the server didn't recognize cts:google-element-query... ;)
>>
>> Thanks,
>>
>> --Travis
>>
>> _______________________________________________
>> General mailing list
>> General at xqzone.marklogic.com
>> http://xqzone.com/mailman/listinfo/general
>