[MarkLogic Dev General] Interesting case where ML refuses to optimize XPath
Lee, David
dlee at epocrates.com
Sun Dec 6 05:52:19 PST 2009
One more interesting tidbit.
This expression did NOT use the indexes:
/rxnsat//row[RXAUI eq $id2]
But this one did:
//row[RXAUI eq $id2]
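
A minimal sketch of how one might confirm which of the two expressions is resolved from the indexes, assuming the server version provides xdmp:query-trace (its output goes to the server error log, not to the query result). The literal key is only a placeholder taken from the earlier message.

```xquery
xquery version "1.0-ml";

(: Placeholder key value; substitute an RXAUI that exists in your data. :)
declare variable $id2 as xs:string := "2483417";

(: Turn on query tracing so index resolution details are written to the log. :)
xdmp:query-trace(fn:true()),

(: Count the matches only, so result serialization does not dominate the timing. :)
fn:count(/rxnsat//row[RXAUI eq $id2]),
fn:count(//row[RXAUI eq $id2]),

(: Turn tracing off again. :)
xdmp:query-trace(fn:false())
```

Comparing the trace entries for the two path expressions should show whether the predicate is being looked up in the indexes or applied to every candidate fragment.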
-----Original Message-----
From: general-bounces at developer.marklogic.com
[mailto:general-bounces at developer.marklogic.com] On Behalf Of Lee, David
Sent: Sunday, December 06, 2009 7:56 AM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Interesting case where ML refuses to optimize XPath
That query doesn't do what I want, because (shame on me) I have multiple
docs with //row elements.
But just for testing I ran it and it performs about the same as the
cts:search case.
(4.5 sec) so it seems to be using indexes in that case.
Even that seems to be too slow for me where the result set is 8 records
of about 100 bytes each.
My real question here is one I'm trying to discover. And one that I
think many people are asking.
Can I get MarkLogic to perform like an RDBMS in the (hopefully rare)
cases where the data really is like RDB data ?
That is lots (millions) of small identical "rows" of data where I'd like
to 'simply' look up a row by an exact key match. Not word or phrase or
wildcard searching of big docs in the haystack,
but a real RDBMS style single key lookup type index.
I tried creating a FIELD but that didn't seem to do much good.
(The field search wasn't any faster than element-word searches.)
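
For the exact-key lookup described above, one option worth comparing against the word and field searches is a value query on the key element. This is only a sketch under assumptions: the key is a placeholder, and it has not been timed against this data set.

```xquery
xquery version "1.0-ml";

(: Placeholder key value; substitute an RXAUI that exists in your data. :)
declare variable $id2 as xs:string := "2483417";

(: Match rows whose RXAUI element content equals the key as a whole value,
   rather than matching it as a word somewhere inside the element. :)
for $r in cts:search(
  fn:doc("/RxNorm/rxnsat.xml")/rxnsat/row,
  cts:element-value-query(xs:QName("RXAUI"), $id2, "exact")
)
return $r
```

If an element range index on RXAUI has been configured for the database (an assumption, it is not mentioned in this thread), cts:element-range-query(xs:QName("RXAUI"), "=", $id2) is another candidate to test in place of the value query.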
What's interesting is I can search other document sets and return
hundreds of results in < 200ms
but this one is really thrashing ML. I suspect due to the high
fragmentation. (3 million fragments).
But what's the suggestion when the data really is flat like this? If
I don't fragment it,
it makes a 1G XML file, which blows up ML. There's no structure
in between.
What I'm going to experiment with next is sticking this particular file in
an RDBMS and using the SQL connector code ... Yuck. I was really
hoping not to do that.
Another idea, which I think is pretty ugly, but might help, is to
artificially create structure where none exists. For example say group
the records by the first 2 digits of the key value into a document and
reduce the fragmentation by 100x. But even getting this restructuring
done is painful because the doc is too big to load into memory so I need
to use a DB just to get at it.
Which probably means I load it into an RDBMS to restructure the XML or
maybe just leave it there.
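
To make the grouping idea concrete, here is a small sketch of bucketing rows by the first two characters of the key. It runs on a handful of hypothetical in-memory rows, so it shows only the grouping logic; it does not address the problem that the real file is too big to load. The rxnsat-group wrapper element and the sample values are invented for illustration.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample rows; the real data has millions of them. :)
declare variable $rows as element(row)* := (
  <row><RXAUI>2483417</RXAUI></row>,
  <row><RXAUI>2483999</RXAUI></row>,
  <row><RXAUI>3100001</RXAUI></row>
);

(: One wrapper element per distinct two-character key prefix,
   containing every row whose key starts with that prefix. :)
for $prefix in fn:distinct-values(
  for $r in $rows return fn:substring($r/RXAUI, 1, 2)
)
order by $prefix
return
  <rxnsat-group prefix="{$prefix}">{
    $rows[fn:substring(RXAUI, 1, 2) eq $prefix]
  }</rxnsat-group>
```

Each wrapper could then be stored as its own document, which is where the roughly 100x reduction in fragment count would come from.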
I'm sure others have had this kind of problem ? Any suggestions for
techniques for handling millions of "rows" of very small "records" ?
-----Original Message-----
From: general-bounces at developer.marklogic.com
[mailto:general-bounces at developer.marklogic.com] On Behalf Of Jason
Hunter
Sent: Sunday, December 06, 2009 12:31 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Interesting case where ML refuses to optimize XPath
I'm curious what speed you see for this:
//row[RXAUI = $id2]
I'm assuming row is your fragment root.
-jh-
On Dec 5, 2009, at 6:20 AM, Lee, David wrote:
> I have 2 XML docs, each about 1GB and about 2 million fragments ("rows")
> each ... in fact the elements are called "rows".
> Each "row" element is about 500 bytes. But I don't yet have a better
> way to fragment them.
> (Yes, it's been suggested to split these into separate docs and I may
> experiment with that.)
>
> Here's a case where I've found ML refuses to optimize xpaths.
>
> First off, this expression takes about 5 seconds, which I find a
> little slow ... it returns 8 rows.
>
>
> declare variable $id := '2483417';
> for $r in doc("/RxNorm/rxnsat.xml")/rxnsat/row[RXAUI eq $id]
> return $r
>
>
> Now to complicate things I actually need $id from a previous query so
> the real query is like
>
>
> declare variable $id := '2483417';
> declare variable $c := doc("/RxNorm/rxnconso.xml")/rxnconso/row[RXAUI eq $id];
> declare variable $id2 as xs:string := $c/RXAUI/string();
>
> for $r in doc("/RxNorm/rxnsat.xml")/rxnsat/row[RXAUI eq $id2]
> return $r
>
> This takes about 1 minute! Checking the profile I find the
> expression row[ RXAUI eq $id] is evaluated a million times ...
> indicating it's not using the indexes.
>
> I've tried all sorts of combinations of these like
>
> doc("/RxNorm/rxnsat.xml")/rxnsat/row[xs:string(RXAUI) eq $id2]
> doc("/RxNorm/rxnsat.xml")/rxnsat/row[RXAUI eq $c/RXAUI]
> doc("/RxNorm/rxnsat.xml")/rxnsat/row/RXAUI[. eq $id2]/ancestor::row
>
>
> All to the same avail ... no indexing !
>
> But of course this brings things back to speed
>
> ---------
> for $r in cts:search(doc("/RxNorm/rxnsat.xml")/rxnsat/row,
> cts:element-query( xs:QName("RXAUI") , $id2 ))
> return $r
>
> ------------
>
>
> Still takes too long (about 5 sec) ... but it's back to realtime
> at least.
>
> I'm experimenting now with fields ...
>
> But I find it strange that I can't get the XPath expression to use the
> indexes in one case when it does in another that seems almost identical
> to me.
>
> This expression
> declare variable $id2 as xs:string := $c/RXAUI/string();
>
> should tell the system that $id2 is a single string, so why won't it use
> it in XPath-based index queries?
>
>
>
>
> ----------------------------------------
> David A. Lee
> Senior Principal Software Engineer
> Epocrates, Inc.
> dlee at epocrates.com
> 812-482-5224
>
>
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://xqzone.com/mailman/listinfo/general