[XQZone General] RE: General Digest, Vol 16, Issue 2

Kelly Stirman kelly at marklogic.com
Wed Oct 5 14:02:42 PDT 2005


i actually prefer returning a node rather than a "snippet" of words or characters "around" the matching term. cts:highlight offers this ability by using $cts:node instead of $cts:text. this returns the enclosing node of the matching terms, so you have structural context. from this node, you can further navigate to other places in the document.
 
cts:highlight() also provides you the ability to return the part of the query that matches so that, in 
 
keep in mind that $cts:node returns the node within the document, not just as the node itself, so you'll probably want to enclose it in an element name that you can then extract from the resultant constructed node with a double slash. also, it will return the enclosing node for every match, so you probably want to simply get the first one.
 
kelly
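
A rough, untested sketch of the $cts:node approach described above, reusing the query from the quoted thread below. The wrapper element name "hit" is made up for illustration and is not part of the cts API; the sketch assumes cts:highlight evaluates the replacement expression once per match with $cts:node bound, as the message says.

```xquery
(: sketch only: "hit" is a hypothetical wrapper element name :)
let $query := cts:element-word-query(xs:QName("data"), "totalitarian")
for $class in cts:search(//class, $query)
(: replace each match with a wrapper holding the node that contains it :)
let $marked := cts:highlight($class, $query, <hit>{$cts:node}</hit>)
return
  <result>{
    (: one wrapper is produced per match; keep the first and unwrap it :)
    ($marked//hit)[1]/node()
  }</result>
```

The double slash is what pulls the wrapper back out of the highlighted copy, and the [1] keeps only the first match, as suggested above.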

________________________________

From: general-bounces at xqzone.marklogic.com on behalf of general-request at xqzone.marklogic.com
Sent: Wed 10/5/2005 3:00 PM
To: general at xqzone.marklogic.com
Subject: General Digest, Vol 16, Issue 2



Send General mailing list submissions to
        general at xqzone.marklogic.com

To subscribe or unsubscribe via the World Wide Web, visit
        http://xqzone.com/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
        general-request at xqzone.marklogic.com

You can reach the person managing the list at
        general-owner at xqzone.marklogic.com

When replying, please edit your Subject line so it is more specific
than "Re: Contents of General digest..."


Today's Topics:

   1. returning snippets centered on matches... (Travis Raybold)
   2. Re: returning snippets centered on matches...
      (Shannon Scott Shiflett)
   3. Re: returning snippets centered on matches... (Travis Raybold)


----------------------------------------------------------------------

Message: 1
Date: Tue, 04 Oct 2005 14:06:58 -0700
From: Travis Raybold <travis at raybold.com>
Subject: [XQZone General] returning snippets centered on matches...
To: General XQZone Discussion <general at xqzone.marklogic.com>
Message-ID: <4342EEF2.5060709 at raybold.com>
Content-Type: text/plain; charset="iso-8859-1"

i'm searching a data node, and want to return just a subset of the text,
centered on the matching string. i've modified some code kelly sent me,
and it works fairly well, but a search that used to take a fraction of a
second now takes 5 seconds. has anyone coded a reasonably efficient
function that returns snippets instead of the whole text? i
could code this in php, but it seems a natural place to use marklogic,
if i can just figure out how to make it efficient.

thanks in advance for any help, the sample code follows

--travis

----------------------------------------------------------------------------------------------------


let $string := "totalitarian"
let $query := cts:element-word-query(xs:QName("data"),$string)
return
for $class in (cts:search(//class,$query))
return
<result>
{
  let $scope := 5

  let $hit := cts:highlight($class,$query,concat("<TRAVISRAYBOLDCONSTANT_START>",$cts:text,"<TRAVISRAYBOLDCONSTANT_END>"))

  let $hit-text := cts:highlight($class,$query,<span style="color:red;font-weight:bold">{$cts:text}</span>)

  let $before := tokenize(substring-before($hit,"<TRAVISRAYBOLDCONSTANT_END>")," ")
  let $after := tokenize(substring-after($hit,"<TRAVISRAYBOLDCONSTANT_START>")," ")

  let $before-count := count($before)
  let $before-snippet := $before[($before-count - $scope) to $before-count]
  let $after-snippet := $after[1 to $scope]

  let $snippet := <snippet>. . . . {string-join(($before-snippet,$hit-text//this-hit/text(),$after-snippet)," ")} . . . . </snippet>

  let $hilight-snippet := cts:highlight($snippet,$string,<span style="color:red;font-weight:bold">{$cts:text}</span>)

  return $hilight-snippet
}
</result>
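
For comparison, here is a minimal sketch of the windowing step on its own: tokenize once, find the first matching word, and slice $scope words on either side. It is an illustration under stated assumptions, not code from this thread. local:snippet is a hypothetical helper, it uses plain case-insensitive substring matching rather than cts matching (so no stemming), and it is written in standard XQuery 1.0 syntax, which older server dialects may spell differently. Whether it is any faster against a real database is untested.

```xquery
xquery version "1.0";

(: hypothetical helper: returns up to $scope words on each side of the
   first word that contains $term, or the empty sequence if none does :)
declare function local:snippet(
  $text as xs:string, $term as xs:string, $scope as xs:integer
) as xs:string?
{
  (: split the text into words in a single pass :)
  let $words := tokenize(normalize-space($text), " ")
  (: position of the first word containing the term, ignoring case :)
  let $pos := (for $i in 1 to count($words)
               where contains(lower-case($words[$i]), lower-case($term))
               return $i)[1]
  return
    if (empty($pos)) then ()
    else string-join(
      $words[position() ge $pos - $scope and position() le $pos + $scope],
      " ")
};

(: returns "rise of a totalitarian state in the" :)
local:snippet(
  "the rise of a totalitarian state in the twentieth century",
  "totalitarian", 3)
```

The resulting string could then be wrapped in a <snippet> element and passed through a single cts:highlight call, as in the code above.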



------------------------------

Message: 2
Date: Tue, 4 Oct 2005 17:16:06 -0400
From: Shannon Scott Shiflett <sss4r at virginia.edu>
Subject: Re: [XQZone General] returning snippets centered on
        matches...
To: General XQZone Discussion <general at xqzone.marklogic.com>
Message-ID: <DB7F5F8B-0603-4F54-A201-B62038DB95B7 at virginia.edu>
Content-Type: text/plain; charset="us-ascii"

Travis, good question, good timing.  I wish I could help, but writing 
code using cts:highlight() to return "highlighted hits in context" is 
something I'll be tasked to do in the very near future, so I'm eager 
to read the responses to this thread....



------------------------------

Message: 3
Date: Tue, 04 Oct 2005 14:34:52 -0700
From: Travis Raybold <travis at raybold.com>
Subject: Re: [XQZone General] returning snippets centered on
        matches...
To: General XQZone Discussion <general at xqzone.marklogic.com>
Message-ID: <4342F57C.9010104 at raybold.com>
Content-Type: text/plain; charset="iso-8859-1"

shannon,

cts:highlight() worked pretty easily for me... still getting used to
the syntax of xquery, but it made sense, and didn't seem to slow down
the search appreciably.

--travis
-----------------------------------------------------------------------

let $query := cts:element-word-query(xs:QName("data"),"plant")
return
for $class in (cts:search(//class,$query))
return
<result>
{
  cts:highlight($class,$query,<span style="color:red;font-weight:bold">{$cts:text}</span>)
}
</result>

-------------------------------------------------




------------------------------

_______________________________________________
General mailing list
General at xqzone.marklogic.com
http://xqzone.com/mailman/listinfo/general


End of General Digest, Vol 16, Issue 2
**************************************

