[MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL

Ryan Dew ryan.j.dew at gmail.com
Mon Mar 26 06:14:16 PDT 2012


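You could try a recursive function like the following. It is not guaranteed
to be 100% correct if you have sub-elements with the same names as your root
elements: cts:element-query matches an element at any depth, so a document
whose root name has not been found yet is skipped when it contains an
already-found name.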

xquery version "1.0-ml";

declare function local:find-unique-qnames($found-qnames as xs:QName*) {
  let $next-qname := cts:search(collection()/*,
    if (exists($found-qnames))
    then cts:not-query(cts:element-query($found-qnames,cts:and-query(())))
    else cts:and-query(())
  )[1]/node-name(.)
  return if (exists($next-qname))
          then local:find-unique-qnames(($found-qnames,$next-qname))
          else $found-qnames
};

declare function local:find-unique-qnames() {
  for $qn in local:find-unique-qnames(())
  order by string($qn)
  return $qn
};

local:find-unique-qnames()

On Mon, Mar 26, 2012 at 6:36 AM, Geert Josten <geert.josten at dayon.nl> wrote:

> Hi Vishnu,
>
>
>
> It would help if you could explain why you need that list. But in general,
> I guess the best option would be to pre-calculate the list. You can save it
> in a server field (xdmp:set-server-field) to keep the list in memory on
> each host. But you would need an algorithm to initialize it, and each doc
> commit would have to check and update that list. The latter can be done
> with a post-commit trigger. The former is best done with the strategy I
> already mentioned: divide all docs into chunks of 100 to 1000 docs, calculate
> the distinct names of each chunk, and merge those somehow into the final list.
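>
> A minimal sketch of that chunked strategy, not a definitive implementation.
> It assumes the URI lexicon is enabled (needed for cts:uris); the chunk size
> and the server-field name "element-names" are hypothetical. Whether a single
> request releases the expanded trees between chunks is also an assumption;
> evaluating each chunk in a separate request (for example with xdmp:eval or
> xdmp:spawn) may be needed on larger databases.
>
> ```xquery
> xquery version "1.0-ml";
>
> (: Hypothetical chunk size, within the suggested 100 to 1000 docs. :)
> declare variable $chunk-size as xs:integer := 500;
>
> (: Distinct element local names for one chunk of document URIs. :)
> declare function local:chunk-names($uris as xs:string*) as xs:string* {
>   distinct-values(
>     for $uri in $uris
>     return doc($uri)//*/local-name()
>   )
> };
>
> (: Assumption: the URI lexicon is enabled on the database. :)
> let $uris := cts:uris()
> let $chunk-count := (count($uris) + $chunk-size - 1) idiv $chunk-size
> let $names :=
>   distinct-values(
>     for $i in 1 to $chunk-count
>     let $start := ($i - 1) * $chunk-size + 1
>     return local:chunk-names(subsequence($uris, $start, $chunk-size))
>   )
> (: Keep the merged list in memory on this host under a hypothetical name. :)
> return xdmp:set-server-field("element-names", $names)
> ```
>
> A later request on the same host could then read the list back with
> xdmp:get-server-field("element-names").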
>
>
>
> You could also raise the tree size setting temporarily to do that initial
> calculation.
>
>
>
> Kind regards,
>
> Geert
>
>
>
> *From:* general-bounces at developer.marklogic.com [mailto:
> general-bounces at developer.marklogic.com] *On Behalf Of* VISH RAJPUT
> *Sent:* Monday, 26 March 2012 14:29
> *To:* MarkLogic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] Fwd: [1.0-ml]
> XDMP-EXPNTREECACHEFULL
>
>
>
> Thanks Geert,
>
>
>
> Is there an alternative way to find the unique element names within a
> database?
>
>
>
> Warm Regards,
>
> Vishnu
>
>
>
>
>
> On Mon, Mar 26, 2012 at 5:55 PM, Geert Josten <geert.josten at dayon.nl>
> wrote:
>
> Hi Vishnu,
>
>
>
> 90 MB isn’t much indeed, but MarkLogic is configured to keep a low memory
> footprint, even with 30 concurrent requests. To ensure that, the
> tree size limit (look at the database settings in the admin interface) is
> usually pretty low. I have 8 GB and it is still set to no more than 85 MB by
> default. But you can increase it if you like.
>
>
>
> A more streaming approach, which my advice attempts to achieve to some
> extent, helps keep the footprint low and keeps MarkLogic fast.
>
>
>
> Kind regards,
>
> Geert
>
>
>
> *From:* general-bounces at developer.marklogic.com [mailto:
> general-bounces at developer.marklogic.com] *On Behalf Of* VISH RAJPUT
> *Sent:* Monday, 26 March 2012 14:17
> *To:* MarkLogic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] Fwd: [1.0-ml]
> XDMP-EXPNTREECACHEFULL
>
>
>
> Thanks Geert,
>
>
>
> But it still shows *XDMP-EXPNTREECACHEFULL: distinct-values(collection("ContentAnalysis")//*/local-name()) --
> Expanded tree cache full on host....* The overall database size is only
> 90 MB; I don't think that is a huge amount of data for MarkLogic.
>
>
>
>
>
> Regards,
>
> Vishnu
>
>
>
> On Mon, Mar 26, 2012 at 1:25 PM, Geert Josten <geert.josten at dayon.nl>
> wrote:
>
> Hi Vishnu,
>
>
>
> Your FLWOR expression won’t return distinct names, since you are applying
> the function to each individual name. You should write:
>
>
>
> distinct-values(
>     for $a in //*
>     return local-name($a)
> )
>
>
>
> Or better:
>
>
>
> distinct-values(collection()//*/local-name())
>
>
>
> But this still might not perform well, or might still max out the list or
> tree caches. This approach creates a complete list of all element names
> first, and only then applies distinct-values. You might consider taking
> multiple steps: per doc first, then clustering per 100 files, and only then
> across all clusters. You could also just take 100 random samples and use
> those. That doesn’t guarantee a 100% complete list, but it remains
> performant even if your database grows 10- or 100-fold.
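>
> A minimal sketch of the random-sample idea, assuming the "score-random"
> option of cts:search is an acceptable way to pick the samples; the sample
> size of 100 comes from the text above, and the result is by design not
> guaranteed to be complete.
>
> ```xquery
> xquery version "1.0-ml";
>
> (: Take up to 100 documents in random order, then collect the distinct
>    element local names found in that sample only. :)
> distinct-values(
>   cts:search(collection(), cts:and-query(()), "score-random")[1 to 100]
>     //*/local-name()
> )
> ```
>
> Because only the sampled documents are expanded, the work stays roughly
> constant as the database grows.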
>
>
>
> Kind regards,
>
> Geert
>
>
>
> *From:* general-bounces at developer.marklogic.com [mailto:
> general-bounces at developer.marklogic.com] *On Behalf Of* VISH RAJPUT
> *Sent:* Monday, 26 March 2012 8:29
> *To:* general at developer.marklogic.com
> *Subject:* [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL
>
>
>
> The total size of all the files is approx. 90 MB.
>
> ---------- Forwarded message ----------
> From: *VISH RAJPUT* <svishnu.singh4 at gmail.com>
> Date: Mon, Mar 26, 2012 at 11:56 AM
> Subject: [1.0-ml] XDMP-EXPNTREECACHEFULL
> To: general at developer.marklogic.com
>
>
> Hi,
>
>
>
> I have 2000 files in a MarkLogic database within a single forest, and I want
> to find the unique element names across all 2000 files in this database.
> For this I wrote the query below:
>
>
>
> for $a in //*
> return distinct-values($a/local-name())
>
>
>
> but with this I got the error "*[1.0-ml] XDMP-EXPNTREECACHEFULL*". What
> should I do?
>
>
>
>
>
> Regards,
>
> Vishnu Singh
>
>
>
>
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general