[MarkLogic Dev General] Very puzzling bug in wildcard search results

David Sewell dsewell at virginia.edu
Wed Mar 11 18:49:45 PDT 2015


Reindexing did the trick. For good measure I adjusted the wildcard index 
settings.
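
For anyone who finds this thread later, here is a minimal sketch of how the
checks suggested below might be run from Query Console. It is only an
illustration: the term is one of the affected patterns from my table, and
counting with xdmp:estimate over fn:doc() is an assumption made for the
example, not necessarily how the counts in the table were produced.

```xquery
xquery version "1.0-ml";

(: Illustrative term: one of the patterns that over-matched on H2. :)
let $term := "religion*"
(: "wildcarded" asks the server to treat * as a wildcard character. :)
let $query := cts:word-query($term, "wildcarded")
return (
  (: Index-only (unfiltered) estimate of matching fragments. :)
  xdmp:estimate(cts:search(fn:doc(), $query)),
  (: Query plan: shows how the wildcard term is resolved against the indexes. :)
  xdmp:plan(cts:search(fn:doc(), $query))
)
```

Running the same expression on both hosts and comparing the two plans should
show whether they resolve the wildcard differently.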

On Wed, 11 Mar 2015, Michael Blakeley wrote:

> You might try 7.0-5, released on Friday. I believe at least one wildcard bug was fixed. Reindexing is a good idea too.
>
> Failing those, xdmp:plan or xdmp:query-trace might show some useful debugging info.
>
> -- Mike
>
>> On 11 Mar 2015, at 11:22 , David Sewell <dsewell at virginia.edu> wrote:
>>
>> I'm trying to figure out what could possibly account for buggy results for
>> wildcard searches in certain fringe cases (running MarkLogic 7.0-4.3).
>>
>> I have two servers running on the same data set of 166K documents, with
>> identical host, database and app server settings so far as I can determine (for
>> anything related to word query at least). Ordinarily, wildcard searches on words
>> return the exact same number of matches on both hosts. For example:
>>
>> 		H1	H2
>> democra*	 1579	 1579
>> demo*		 4354	 4354
>> dem*		16866	16866
>>
>> But there are certain word stems that produce buggy results on H2, matching all
>> documents when they shouldn't. Actually I should say "word stem" because the
>> buggy results all involve words starting in "rel". For example:
>>
>> 		H1	H2
>> religions*	   138	   138
>> religion*	  2448	166618
>> relig*		  3810	166618
>> reli*		 14608	166618
>> rel*		 39888	 39888
>> re*		150890	166618
>> relia*		  1084	166618
>> relie*		  8306	166618
>> relo*		   156	166618
>> relm*		     3	     3
>>
>> I have tried unsuccessfully to find other letter sequences that exhibit the bug
>> in a wildcard search or that give different result counts for H2. So far it's
>> only certain "rel-" examples.
>>
>> My next step will be a forced reindex of the database on H2 to see if that
>> helps, but before I do that I wonder if anyone has a clue what might account for
>> this behavior.
>>
>> Even odder, on two entirely different systems running an entirely different
>> MarkLogic software instance, "rel-" searches are also showing discrepancies,
>> though I haven't researched that one as thoroughly. Some deep-level indexing
>> bug, possibly?
>>
>> David
>>
>> --
>> David Sewell, Editorial and Technical Manager
>> ROTUNDA, The University of Virginia Press
>> PO Box 400314, Charlottesville, VA 22904-4314 USA
>> Email: dsewell at virginia.edu   Tel: +1 434 924 9973
>> Web: http://rotunda.upress.virginia.edu/
>> _______________________________________________
>> General mailing list
>> General at developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/general
>>

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: dsewell at virginia.edu   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/

