[MarkLogic Dev General] referenced docs

Paul M pjmaip at yahoo.com
Fri Aug 3 10:11:12 PDT 2012


It gets slow as the number of docs increases.

fn:not has no effect...

(1 to 9000) = (9999 to 19999) -> 15 sec

It has to go through 9000 values and compare each against about 10000 values, so it is bound to be slow.
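
To make that cost concrete, here is a minimal Python sketch of the arithmetic. It is not MarkLogic code, and it assumes the general comparison is evaluated naively, pair by pair; the function names are made up for illustration. Because the two ranges share no value, nothing can end the comparison early, so every pair gets tested.

```python
def naive_pair_count(left, right):
    """Count the equality tests a pair-by-pair existential comparison performs.

    Stops at the first equal pair, the way an existential "=" could.
    """
    tests = 0
    for a in left:
        for b in right:
            tests += 1
            if a == b:
                return tests, True
    return tests, False


def hashed_lookup_count(left, right):
    """Same question, but probe a hash set built from `right`: one lookup per left value."""
    right_set = set(right)
    lookups = 0
    for a in left:
        lookups += 1
        if a in right_set:
            return lookups, True
    return lookups, False


left = range(1, 9001)        # (1 to 9000)
right = range(9999, 20000)   # (9999 to 19999), 10001 values

# The ranges are disjoint, so the naive approach must test every pair.
pairs = len(left) * len(right)
print(pairs)                 # 90009000
```

With a hashed lookup the same question needs one probe per left-hand value, 9000 in total, which is why a set-based formulation tends to scale better than the pairwise one.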



________________________________
 From: "general-request at developer.marklogic.com" <general-request at developer.marklogic.com>
To: general at developer.marklogic.com 
Sent: Friday, August 3, 2012 4:45 AM
Subject: General Digest, Vol 98, Issue 2
 

Today's Topics:

   1. referenced docs (Paul M)
   2. Re: referenced docs (Geert Josten)
   3. search function and results per document    trunktation error
      (Erik Zander)


----------------------------------------------------------------------

Message: 1
Date: Thu, 2 Aug 2012 13:32:03 -0700 (PDT)
From: Paul M <pjmaip at yahoo.com>
Subject: [MarkLogic Dev General] referenced docs
To: "general at developer.marklogic.com"
    <general at developer.marklogic.com>
Message-ID:
    <1343939523.88763.YahooMailNeo at web163803.mail.gq1.yahoo.com>
Content-Type: text/plain; charset="us-ascii"

There are two refDocs (fragments).
These two refDocs reference four masterDocs (also fragments).

The goal is to find the masterDocs that are not referenced.
The two refDocs hold references (5,1) and (12,3).
There are five master docs (1,3,5,8,12).
8 is not referenced.

I added an element-attribute range index on both refDocs, <ref refid="5">,
and another element-attribute range index on the masterDocs, <master mid="5">,
got all the element-attribute-values for mid and refid,
then did an XPath intersection of mid with refid.
fn:not( . = ) worked great on a small sample: 8 is not in (1,3,5,12).

Are there any alternative, more efficient methods? The docs may not be modified.
The scale would be 1 million master docs (very small) and 100 thousand refDocs (small).
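
The small example above can be written out as a plain set difference. This is a Python sketch of the logic only, not MarkLogic code; the two lists stand in for whatever the mid and refid value lookups would return, and the function name is made up for illustration.

```python
def unreferenced(master_ids, ref_ids):
    """Return the master ids that no ref points at, keeping master order."""
    referenced = set(ref_ids)   # built once; membership tests are O(1) on average
    return [mid for mid in master_ids if mid not in referenced]


master_ids = [1, 3, 5, 8, 12]   # mid values from the five master docs
ref_ids = [5, 1, 12, 3]         # refid values from the two refDocs

print(unreferenced(master_ids, ref_ids))   # [8]
```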

------------------------------

Message: 2
Date: Thu, 2 Aug 2012 23:37:01 +0200
From: Geert Josten <geert.josten at dayon.nl>
Subject: Re: [MarkLogic Dev General] referenced docs
To: Paul M <pjmaip at yahoo.com>,     MarkLogic Developer Discussion
    <general at developer.marklogic.com>
Message-ID: <ac15e60cfbea5e11b8ea504310407ebd at mail.gmail.com>
Content-Type: text/plain; charset="windows-1252"

Hi Paul,



Not quite sure, but you could try rewriting the not() expression to a
not-query. Something like cts:search(/master,
cts:not-query(cts:element-attribute-value-query("master", "mid",
cts:element-attribute-values("ref", "refid")))), or perhaps better, use
collections to distinguish between master and ref documents. This would
return the actual master docs not being referenced, which might not be
efficient if there are many, but you can apply pagination on this
cts:search, or you can rewrite things a bit so you could use search:search
instead and get the pagination for free.



Did you do some performance measuring on a larger set of data? Let's say,
something like 10k docs?



Kind regards,

Geert




------------------------------

Message: 3
Date: Fri, 3 Aug 2012 10:47:06 +0200
From: Erik Zander <Erik.Zander at studentlitteratur.se>
Subject: [MarkLogic Dev General] search function and results per
    document    trunktation error
To: "general at developer.marklogic.com"
    <general at developer.marklogic.com>
Message-ID:
    <666D23968830644D92011BDE450FBE8031E332D12D at DRSTUEX01.studentlitteratur.corp>
    
Content-Type: text/plain; charset="iso-8859-1"

Hi All

I have a problem with the search functions, both cts:search and search:search.

The problem is that when searching over a collection, documents with many matches are prioritized first, and only after that are the custom weights applied.

As a result, the search has already truncated the results before we are able to influence the score of the matches.

What we need is to have the matches returned independently of which document the specific element is in, so that we could prioritize, for example, all relevant docbook:titles first, then docbook:blockquotes, and lastly single docbook:paras, across more than one document with the DocBook structure (see below for a very short example).

<chapter xml:id="isbn_9789144019895_ch_2" label="2">
<title>Den kliniska undersökningen</title>
<section>
<title>Sjukhistorien</title>
<para>En noggrant förd journal är givetvis av samma vikt vid hjärtsjukdomarna som i alla andra medicinska sammanhang. Vilket eller vilka symtom begränsar prestationsförmågan? De viktigaste och vanligaste symtomen hos hjärtsjuka är <emphasis role="italic">trötthet</emphasis>, som uttryck för låg hjärtminutvolym, <emphasis role="italic">andfåddhet</emphasis> framför allt orsakad av lungstas, <emphasis role="italic">bröstsmärta</emphasis> vid kärlkramp, samt <emphasis role="italic">arytmiupplevelse</emphasis>. Hjärtpatienter har ofta en anmärkningsvärd förmåga att):</para>
<para>Indelningen enligt NYHA tillämpas framför allt i samband med hjärtsvikt. Vid ischemisk hjärtsjukdom klassificeras symtomen vanligen enligt Canadian Cardiological Society (CCS), vars indelning i de fyra klasserna i princip inte skiljer sig från den<example label="Faktaruta 2.1" xml:id="isbn_9789144019895_infobox_1" role="box">
<title/>
<blockquote><itemizedlist mark="none">
<listitem><para><emphasis role="bold">Klass I</emphasis>

I'm lost as to where I should be looking for a solution to this problem. How should we search the documents so that results are scored independently of which document they are in?
Is this a coding or a configuration error, or is this the expected and only behavior?

Regards
Erik

------------------------------

_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


End of General Digest, Vol 98, Issue 2
**************************************