[MarkLogic Dev General] Very puzzling bug in wildcard search results

Michael Blakeley mike at blakeley.com
Wed Mar 11 16:15:56 PDT 2015


You might try 7.0-5, released on Friday. I believe at least one wildcard bug was fixed. Reindexing is a good idea too.

Failing those, xdmp:plan or xdmp:query-trace might show some useful debugging info.
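
Before reindexing, it may also be worth checking the quoted counts for internal consistency. If the counts reflect true matches, a shorter prefix such as rel* can never match fewer documents than a longer one such as religion*, because every document matching the longer pattern also matches the shorter one. Here is a minimal Python sketch of that check, using the numbers from the tables quoted below. The names COUNTS and prefix_violations are made up for illustration, and unfiltered estimates could legitimately break the assumption.

```python
# Counts copied from the tables quoted below: pattern -> (H1, H2).
COUNTS = {
    "democra*": (1579, 1579),
    "demo*": (4354, 4354),
    "dem*": (16866, 16866),
    "religions*": (138, 138),
    "religion*": (2448, 166618),
    "relig*": (3810, 166618),
    "reli*": (14608, 166618),
    "rel*": (39888, 39888),
    "re*": (150890, 166618),
    "relia*": (1084, 166618),
    "relie*": (8306, 166618),
    "relo*": (156, 166618),
    "relm*": (3, 3),
}


def prefix_violations(counts, host):
    """Return (shorter, longer) pattern pairs where the longer prefix
    reports more matches than the shorter prefix that contains it.

    host is the column index into each tuple: 0 for H1, 1 for H2.
    """
    stems = {pattern: pattern.rstrip("*") for pattern in counts}
    bad = []
    for shorter, short_stem in stems.items():
        for longer, long_stem in stems.items():
            if long_stem == short_stem or not long_stem.startswith(short_stem):
                continue
            if counts[longer][host] > counts[shorter][host]:
                bad.append((shorter, longer))
    return bad


for host, name in enumerate(("H1", "H2")):
    print(name, prefix_violations(COUNTS, host))
```

On these numbers H1 comes out consistent, and every pair flagged on H2 has rel* as the shorter pattern. Since rel* agrees with H1, that points at the inflated counts as the ones to look at in the plan output.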

-- Mike

> On 11 Mar 2015, at 11:22, David Sewell <dsewell at virginia.edu> wrote:
> 
> I'm trying to figure out what could possibly account for buggy results for 
> wildcard searches in certain fringe cases (running MarkLogic 7.0-4.3).
> 
> I have two servers running on the same data set of 166K documents, with 
> identical host, database and app server settings so far as I can determine (for 
> anything related to word query at least). Ordinarily, wildcard searches on words 
> return the exact same number of matches on both hosts. For example:
> 
> Pattern         H1      H2
> democra*      1579    1579
> demo*         4354    4354
> dem*         16866   16866
> 
> But certain word stems produce buggy results on H2, matching all documents
> when they shouldn't. Or rather one word stem, since the buggy results all
> involve words starting with "rel". For example:
> 
> Pattern         H1      H2
> religions*     138     138
> religion*     2448  166618
> relig*        3810  166618
> reli*        14608  166618
> rel*         39888   39888
> re*         150890  166618
> relia*        1084  166618
> relie*        8306  166618
> relo*          156  166618
> relm*            3       3
> 
> I have tried unsuccessfully to find other letter sequences that exhibit the bug
> in a wildcard search or that give different result counts for H2. So far it's 
> only certain "rel-" examples.
> 
> My next step will be a forced reindex of the database on H2 to see if that 
> helps, but before I do that I wonder if anyone has a clue what might account for 
> this behavior.
> 
> Even odder, on two entirely different systems running an entirely different 
> MarkLogic software instance, "rel-" searches are also showing discrepancies, 
> though I haven't researched that one as thoroughly. Some deep-level indexing 
> bug, possibly?
> 
> David
> 
> -- 
> David Sewell, Editorial and Technical Manager
> ROTUNDA, The University of Virginia Press
> PO Box 400314, Charlottesville, VA 22904-4314 USA
> Email: dsewell at virginia.edu   Tel: +1 434 924 9973
> Web: http://rotunda.upress.virginia.edu/


