[MarkLogic Dev General] referenced docs

Geert Josten geert.josten at dayon.nl
Thu Aug 2 14:37:01 PDT 2012


Hi Paul,



I'm not quite sure, but you could try rewriting the not() expression as a
not-query. Something like cts:search(/master,
cts:not-query(cts:element-attribute-value-query("master", "mid",
cts:element-attribute-values("ref", "refid")))), or perhaps better, use
collections to distinguish between master and ref documents. This would
return the actual master docs that are not referenced, which might not be
efficient if there are many, but you can apply pagination to this
cts:search, or you can rewrite things a bit so that you can use
search:search instead and get pagination for free.
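
Spelled out a bit more, the not-query idea might look like the sketch below.
It assumes the range indexes on ref/@refid and master/@mid already exist,
builds the names with xs:QName, and converts the lexicon values to strings
because the value query takes string values. $start and $page-size are
hypothetical paging parameters, not something from the original thread.

```xquery
xquery version "1.0-ml";

(: All referenced ids, read from the range index on ref/@refid. :)
let $referenced :=
  for $v in cts:element-attribute-values(xs:QName("ref"), xs:QName("refid"))
  return fn:string($v)

(: Hypothetical page window; not part of the original suggestion. :)
let $start := 1
let $page-size := 10

(: Master docs whose @mid matches none of the referenced ids, one page at a time. :)
return
  fn:subsequence(
    cts:search(
      /master,
      cts:not-query(
        cts:element-attribute-value-query(
          xs:QName("master"), xs:QName("mid"), $referenced))),
    $start,
    $page-size)
```

With 100k refdocs the value query carries a long list of values, so this is
worth timing on realistic data before relying on it.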



Did you do some performance measuring on a larger set of data? Let’s say,
something like 10k docs?



Kind regards,

Geert



*From:* general-bounces at developer.marklogic.com [mailto:
general-bounces at developer.marklogic.com] *On behalf of* Paul M
*Sent:* Thursday, 2 August 2012 22:32
*To:* general at developer.marklogic.com
*Subject:* [MarkLogic Dev General] referenced docs



two refDocs (fragments).
these two refDocs reference four other masterDocs (fragments)

find masterDocs not referenced.
two refDocs have references (5,1) and (12,3).
five master docs (1,3,5,8,12)
8 is not referenced.

add element-attribute-range index to both refDocs <ref refid="5">
and another ear-index to masterDocs <master mid="5">
get all the element-attribute-values for mid and refid
then xpath intersection OF mid WITH refid

fn:not( . = ) worked great on a small sample: 8 is not in (1,3,5,12)
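
A rough reconstruction of the approach described above, as a sketch only: the
exact expression used is not shown in the message, so the variable names and
the shape of the predicate are assumptions. It relies on the two
element-attribute range indexes mentioned above.

```xquery
xquery version "1.0-ml";

(: Both value lists come straight from the range indexes. :)
let $mids   := cts:element-attribute-values(xs:QName("master"), xs:QName("mid"))
let $refids := cts:element-attribute-values(xs:QName("ref"), xs:QName("refid"))

(: Keep each mid that equals none of the refids (general comparison). :)
return $mids[fn:not(. = $refids)]
```

This returns the unreferenced ids rather than the documents themselves, and
the general comparison checks each mid against the whole refid list, which is
why scale is the concern.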

Any alternative, more efficient methods? Docs may not be modified.
scale would be 1-mil master docs (very small) and 100-thousand
refdocs(small)