[MarkLogic Dev General] Very puzzling bug in wildcard search results
Danny.Sokolsky at marklogic.com
Wed Mar 11 16:17:09 PDT 2015
Also, make sure you have the proper wildcard indexes in both places:
Do you have the codepoint word lexicon in both places?
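One way to compare the settings is to dump the wildcard-related index options on each host and diff the output. The following is only a sketch: it assumes it is evaluated (for example in Query Console) by a user with admin privileges against the content database in question, and the wrapper element names are made up for illustration.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

(: Assumption: the current database is the content database being searched. :)
let $config := admin:get-configuration()
let $db := xdmp:database()
return
  (: Hypothetical wrapper elements; only the admin:* calls are MarkLogic APIs. :)
  <wildcard-settings>
    <three-character-searches>{
      admin:database-get-three-character-searches($config, $db)
    }</three-character-searches>
    <trailing-wildcard-searches>{
      admin:database-get-trailing-wildcard-searches($config, $db)
    }</trailing-wildcard-searches>
    <word-lexicons>{
      (: A codepoint word lexicon shows up as
         http://marklogic.com/collation/codepoint :)
      for $collation in admin:database-get-word-lexicons($config, $db)
      return <collation>{$collation}</collation>
    }</word-lexicons>
  </wildcard-settings>
```

If the two hosts return different output, that difference is the first thing to rule out.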
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Michael Blakeley
Sent: Wednesday, March 11, 2015 4:16 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Very puzzling bug in wildcard search results
You might try 7.0-5, released on Friday. I believe at least one wildcard bug was fixed. Reindexing is a good idea too.
Failing those, xdmp:plan or xdmp:query-trace might show some useful debugging info.
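A rough sketch of what that could look like, to be run on both hosts and compared. The pattern "reli*" is taken from the failing examples in the original message; everything else here is an assumption about how the search is being issued.

```xquery
xquery version "1.0-ml";

(: Assumption: the application issues a wildcarded word query like this one. :)
let $query := cts:word-query("reli*", "wildcarded")
return (
  (: Writes query evaluation details to the server error log. :)
  xdmp:query-trace(fn:true()),
  (: Index-only (unfiltered) count of matching fragments. :)
  xdmp:estimate(cts:search(fn:doc(), $query)),
  (: Shows how the query is resolved against the indexes. :)
  xdmp:plan(cts:search(fn:doc(), $query)),
  xdmp:query-trace(fn:false())
)
```

If the plan on H2 shows the wildcard term being dropped or expanded differently than on H1, that would help narrow down whether the problem is in the index data or in the settings.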
> On 11 Mar 2015, at 11:22, David Sewell <dsewell at virginia.edu> wrote:
> I'm trying to figure out what could possibly account for buggy results
> for wildcard searches in certain fringe cases (running MarkLogic 7.0-4.3).
> I have two servers running on the same data set of 166K documents,
> with identical host, database and app server settings so far as I can
> determine (for anything related to word query at least). Ordinarily,
> wildcard searches on words return the exact same number of matches on both hosts. For example:
> Pattern          H1       H2
> democra*       1579     1579
> demo*          4354     4354
> dem*          16866    16866
> But there are certain word stems that produce buggy results on H2,
> matching all documents when they shouldn't. Actually I should say
> "word stem" because the buggy results all involve words starting in "rel". For example:
> Pattern          H1       H2
> religions*      138      138
> religion*      2448   166618
> relig*         3810   166618
> reli*         14608   166618
> rel*          39888    39888
> re*          150890   166618
> relia*         1084   166618
> relie*         8306   166618
> relo*           156   166618
> relm*             3        3
> I have tried unsuccessfully to find other letter sequences that exhibit
> the bug in a wildcard search or that give different result counts for
> H2. So far it's only certain "rel-" examples.
> My next step will be a forced reindex of the database on H2 to see if
> that helps, but before I do that I wonder if anyone has a clue what
> might account for this behavior.
> Even odder, on two entirely different systems running an entirely
> different MarkLogic software instance, "rel-" searches are also
> showing discrepancies, though I haven't researched that one as
> thoroughly. Some deep-level indexing bug, possibly?
> David Sewell, Editorial and Technical Manager ROTUNDA, The University
> of Virginia Press PO Box 400314, Charlottesville, VA 22904-4314 USA
> Email: dsewell at virginia.edu Tel: +1 434 924 9973
> Web: http://rotunda.upress.virginia.edu/
> General mailing list
> General at developer.marklogic.com