[MarkLogic Dev General] Data profiling on large datasets

Florent Georges lists at fgeorges.org
Sat Mar 28 03:05:57 PDT 2015


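This atomizes each root element, which means it computes the string
value of the entire document (for each document). That is usually
useless, and huge. The predicate [name()] only filters, and since every
element has a name, it keeps them all. If you want to retrieve the list
of distinct root element names, use /*/name() instead, without a
predicate (be careful about namespaces, though).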
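A minimal sketch of the corrected query, including the namespace caveat. It assumes MarkLogic's 1.0-ml dialect, in which a leading / with no context item ranges over every document in the database (the error message in the quoted question shows it rewritten to fn:collection()). Keying each name as {namespace-uri}local-name (Clark notation) is an illustrative choice, not something MarkLogic requires. Either way, the query returns one short string per document instead of each document's full string value.

```xquery
xquery version "1.0-ml";

(: The simple fix is fn:distinct-values(/*/fn:name()).
   fn:name() returns the lexical QName (prefix:local-name), so the same
   element can show up twice under different prefixes, and elements in
   different default namespaces collapse to the same unprefixed name.
   Keying on namespace URI plus local name avoids both problems.
   The {uri}local format is an illustrative choice, not a MarkLogic API. :)
fn:distinct-values(
  for $root in /*
  return fn:concat("{", fn:namespace-uri($root), "}", fn:local-name($root))
)
```

Root elements in no namespace come back as {}name.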

-- 
Florent Georges
http://fgeorges.org/
http://h2oconsulting.be/


On 28 March 2015 at 08:05, Alex Jouravlev wrote:
>
> Hi everybody,
>
> I am trying to list all top-level element types using
>>
>> fn:distinct-values(/*[name()])
>
>
> The database has about 400,000 documents, but only a dozen top-level element types.
> Query Console returns:
>
> [1.0-ml] XDMP-EXPNTREECACHEFULL: fn:distinct-values(fn:collection()//*[fn:name(.)]) -- Expanded tree cache full on host hp5
>
>
> I am running it on a Win8 laptop with 8 GB of RAM and 16 GB of paging space, with plenty of free disk space. I have already expanded the tree cache to 8 GB, which is more than the data I have.
>
> What am I missing?
>
> Alex Jouravlev
> Director, Business Abstraction Pty Ltd
> Phone:       +61-(2)-8003-4830
> Mobile:     +61-4-0408-3258
> Web: http://www.businessabstraction.com
> LinkedIn: http://au.linkedin.com/in/alexjouravlev/
>
>
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
>

