[MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL

Evan Lenz Evan.Lenz at marklogic.com
Wed Mar 28 08:49:18 PDT 2012


This information isn't available in the universal index, because the universal index is about looking up documents based on information you provide (such as an element name).

However, if you want to retrieve a list of values from across the entire database (analytics), you need to have a lexicon configured. You can retrieve all of the values of an element, attribute, or field (if you have a range index configured). You can retrieve all the document URIs in the database (if the URI lexicon is enabled). You can retrieve all the collection URIs in use (if the collection lexicon is enabled). You can even retrieve all the words in the database (word lexicon), or words within a particular area (element, attribute, or field word lexicon). For the entire list of lexicons currently available, check out the blog article I posted yesterday, in particular the "Lexicon functions" section: http://community.marklogic.com/blog/grokking-the-cts-api#lexicon_functions
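
As a rough illustration of the lexicon calls mentioned above, here is a minimal sketch. It assumes the URI lexicon, collection lexicon, and word lexicon are enabled, and that an element range index exists on a hypothetical "title" element; that element name is only an example and does not come from this thread.

```xquery
xquery version "1.0-ml";

(: Each call fails unless the matching lexicon or range index is configured. :)
(
  cts:uris()[1 to 10],          (: first ten document URIs, from the URI lexicon :)
  cts:collections()[1 to 10],   (: first ten collection URIs, from the collection lexicon :)
  cts:words()[1 to 10],         (: first ten words, from the word lexicon :)
  (: "title" is a hypothetical element with an element range index :)
  cts:element-values(xs:QName("title"))[1 to 10]
)
```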

Unfortunately there is currently no element name lexicon. To get all the element names, you'd have to use a separate technique, like what others have recommended. Or, if you want to make this information quickly and repeatedly retrievable, you would need to duplicate it into one of the lexicon-enabled things I just mentioned.

The lexicon I'd probably choose to use in this case would be the collection lexicon, because it means I wouldn't have to change the XML itself. So each time you load your XML, rather than just doing a document insert, you could also tag it with a collection URI for each unique element name in the document:

xdmp:document-insert($uri, $doc),
xdmp:document-add-collections($uri, distinct-values($doc//*/concat("element/",local-name(.))))

(In the above code, only the local names are stored. If you wanted the full QName, you'd have to include the namespace URI too.)
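
If you would rather set the collections in the same call that creates the document, one possible variant is to pass them as the fourth argument of xdmp:document-insert. This is only a sketch: the URI and the sample document are made up, and keeping the default permissions and default collections is an assumption about what you want.

```xquery
xquery version "1.0-ml";

(: Hypothetical inputs, for illustration only :)
let $uri := "/example/doc.xml"
let $doc := <article><title>Hello</title><body><p>Text</p></body></article>
(: descendant-or-self::* also picks up the top element when $doc is an element
   rather than a document node :)
let $name-collections :=
  distinct-values($doc/descendant-or-self::*/concat("element/", local-name(.)))
return
  xdmp:document-insert(
    $uri,
    $doc,
    xdmp:default-permissions(),
    (xdmp:default-collections(), $name-collections)
  )
```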

Later on, to retrieve the full sorted list of unique local element names in your database (assuming all the documents have been added in this way, and you have the collection lexicon enabled), you'd write this:

cts:collection-match("element/*")
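
The values returned still carry the "element/" prefix. A small sketch of stripping it, again assuming the collection lexicon is enabled and the documents were tagged as described above:

```xquery
xquery version "1.0-ml";

(: Turn "element/foo" collection URIs back into bare local names :)
for $c in cts:collection-match("element/*")
return substring-after($c, "element/")
```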

I'm not recommending this as a best practice or anything. It's just one way of doing what you might want, when push comes to shove.

Evan Lenz
Software Developer, Community
MarkLogic Corporation
http://community.marklogic.com/

From: Brent Hartwig <bhartwig at rsicms.com>
Reply-To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Date: Mon, 26 Mar 2012 06:44:31 -0700
To: sai shanker <lsaishanker at yahoo.com>, MarkLogic Developer Discussion <general at developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL

Curious there aren’t functions like this tapping into the universal index.

-Brent

From: general-bounces at developer.marklogic.com On Behalf Of sai shanker
Sent: Monday, March 26, 2012 9:23 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL


Hi,
You can loop across each document, grab all the child nodes, and put them in a map.
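
A minimal sketch of that idea (not part of the original message), using a map:map as a set of names. It walks every element in every document in a single request, so it may well run into the same cache limits on a larger database.

```xquery
xquery version "1.0-ml";

(: Use the map keys as a set of local element names :)
let $names := map:map()
let $_ :=
  for $doc in doc()
  for $e in $doc//*
  return map:put($names, local-name($e), true())
return
  for $name in map:keys($names)
  order by $name
  return $name
```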
Thanks and Regards,
Sai.


From: Ryan Dew <ryan.j.dew at gmail.com>
To: MarkLogic Developer Discussion <general at developer.marklogic.com>
Sent: Monday, March 26, 2012 9:14 AM
Subject: Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL

You could try a recursive function like the following. There is no guarantee it is 100% right, for example if you have sub-elements that have the same names as your root elements.

xquery version "1.0-ml";

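(: Recursively collects root element QNames: each pass searches for a document
   that contains none of the names found so far and takes its root element name. :)
declare function local:find-unique-qnames($found-qnames as xs:QName*) {
  let $next-qname := cts:search(collection()/*,
    if (exists($found-qnames))
    then cts:not-query(cts:element-query($found-qnames,cts:and-query(())))
    else cts:and-query(())   (: empty and-query matches every document :)
  )[1]/node-name(.)
  (: stop once no document without the known names is left :)
  return if (exists($next-qname))
          then local:find-unique-qnames(($found-qnames,$next-qname))
          else $found-qnames
};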

declare function local:find-unique-qnames() {
  for $qn in local:find-unique-qnames(())
  order by string($qn)
  return $qn
};

local:find-unique-qnames()

On Mon, Mar 26, 2012 at 6:36 AM, Geert Josten <geert.josten at dayon.nl> wrote:

Hi Vishnu,

It would help if you could explain why you need that list. In general, though, the best option would probably be to pre-calculate the list. You can save it as a server field (xdmp:set-server-field) to keep the list in memory on each host. But you would need an algorithm to initialize it, and each document commit would have to check and update that list. The latter can be done with a post-commit trigger. The former is best done with the strategy I already mentioned: divide all docs into chunks of 100 to 1000 docs, calculate the distinct names of each chunk, and merge those into the final list.

You could also raise the tree size setting temporarily to do that initial calculation.
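
The chunking strategy could be sketched roughly as follows. The chunk size of 500 and the server field name "element-names" are assumptions chosen for illustration, and whether a single request like this stays within the cache limits depends on your data and configuration; running each chunk as a separate request may be safer.

```xquery
xquery version "1.0-ml";

(: Assumed chunk size, within the 100 to 1000 range suggested above :)
declare variable $chunk-size as xs:integer := 500;

let $total := xdmp:estimate(cts:search(doc(), cts:and-query(())))
(: first position of each chunk: 1, 501, 1001, ... :)
let $starts :=
  for $i in 0 to ($total - 1) idiv $chunk-size
  return $i * $chunk-size + 1
let $names :=
  distinct-values(
    for $start in $starts
    (: distinct names within one chunk of documents :)
    return distinct-values(
      cts:search(doc(), cts:and-query(()))
        [$start to $start + $chunk-size - 1]//*/local-name(.)
    )
  )
let $sorted := for $n in $names order by $n return $n
return
  (: "element-names" is a hypothetical field name; the field lives only in the
     memory of the host that evaluates this :)
  xdmp:set-server-field("element-names", $sorted)
```

A later request on the same host could then read the list back with xdmp:get-server-field("element-names").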

Kind regards,
Geert

From: general-bounces at developer.marklogic.com On Behalf Of VISH RAJPUT
Sent: Monday, 26 March 2012 14:29
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL

Thanks Geert,

Is there any alternate solution to find the unique elements within a database?

Warm Regards,
Vishnu


On Mon, Mar 26, 2012 at 5:55 PM, Geert Josten <geert.josten at dayon.nl> wrote:
Hi Vishnu,

90 MB indeed isn’t much, but MarkLogic is configured to keep a low memory footprint, even if there are 30 concurrent requests. To ensure that, the tree size limit (look at the database setting in the admin interface) is usually pretty low. I have 8 GB and it is still set to no more than 85 MB by default. But you can increase it if you like.

A more streaming approach, which my advice attempts to achieve to some extent, helps keep the footprint low and keeps MarkLogic fast.

Kind regards,
Geert

From: general-bounces at developer.marklogic.com On Behalf Of VISH RAJPUT
Sent: Monday, 26 March 2012 14:17
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL

Thanks Geert,

But it still shows XDMP-EXPNTREECACHEFULL: distinct-values(collection("ContentAnalysis")//*/local-name()) -- Expanded tree cache full on host.... The overall database size is only 90 MB; I don't think that is a huge amount of data for MarkLogic.


Regards,
Vishnu

On Mon, Mar 26, 2012 at 1:25 PM, Geert Josten <geert.josten at dayon.nl> wrote:
Hi Vishnu,

Your FLWOR expression won’t return distinct names, since you are applying the function to each individual name. You should write:

distinct-values(
    for $a in //*
    return local-name($a)
)

Or better:

distinct-values(collection()//*/local-name())

But this still might not perform well, or might still max out the list or tree caches. This approach creates a complete list of all element names first, and only then applies distinct-values. You might consider taking multiple steps: per doc first, then clustering per 100 files, and only then across all clusters. You could also just take 100 random samples and use those. That doesn’t guarantee a 100% complete list, but it remains performant even if your database grows 10- or 100-fold.
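
The sampling idea might look something like the sketch below. Using the "score-random" search option to pick the sample is my assumption about how to get random documents; the sample size of 100 comes from the suggestion above.

```xquery
xquery version "1.0-ml";

(: Distinct local element names from a random sample of up to 100 documents.
   The result is not guaranteed to be complete. :)
distinct-values(
  cts:search(doc(), cts:and-query(()), "score-random")[1 to 100]//*/local-name(.)
)
```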

Kind regards,
Geert

From: general-bounces at developer.marklogic.com On Behalf Of VISH RAJPUT
Sent: Monday, 26 March 2012 8:29
To: general at developer.marklogic.com
Subject: [MarkLogic Dev General] Fwd: [1.0-ml] XDMP-EXPNTREECACHEFULL

The total size of all the files is approximately 90 MB.
---------- Forwarded message ----------
From: VISH RAJPUT <svishnu.singh4 at gmail.com>
Date: Mon, Mar 26, 2012 at 11:56 AM
Subject: [1.0-ml] XDMP-EXPNTREECACHEFULL
To: general at developer.marklogic.com


Hi,

I have 2000 files in a MarkLogic database within a single forest, and I want to find the unique element names across all 2000 files. For this I wrote the query below:

for $a in //*
return distinct-values($a/local-name())

But with this I got the error "[1.0-ml] XDMP-EXPNTREECACHEFULL". What should I do?


Regards,
Vishnu Singh


_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


