[MarkLogic Dev General] Very puzzling bug in wildcard search results

Danny Sokolsky Danny.Sokolsky at marklogic.com
Wed Mar 11 16:17:09 PDT 2015


Also, make sure you have the proper wildcard indexes in both places:

http://docs.marklogic.com/guide/search-dev/wildcard#id_14163

Do you have the codepoint word lexicon in both places?

-Danny

-----Original Message-----
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Michael Blakeley
Sent: Wednesday, March 11, 2015 4:16 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Very puzzling bug in wildcard search results

You might try 7.0-5, released on Friday. I believe at least one wildcard bug was fixed. Reindexing is a good idea too.

Failing those, xdmp:plan or xdmp:query-trace might show some useful debugging info.

-- Mike

> On 11 Mar 2015, at 11:22 , David Sewell <dsewell at virginia.edu> wrote:
> 
> I'm trying to figure out what could possibly account for buggy results 
> for wildcard searches in certain fringe cases (running MarkLogic 7.0-4.3).
> 
> I have two servers running on the same data set of 166K documents, 
> with identical host, database and app server settings so far as I can 
> determine (for anything related to word query at least). Ordinarily, 
> wildcard searches on words return the exact same number of matches on both hosts. For example:
> 
>                 H1      H2
> democra*      1579    1579
> demo*         4354    4354
> dem*         16866   16866
> 
> But there are certain word stems that produce buggy results on H2, 
> matching all documents when they shouldn't. Actually I should say 
> "word stem" because the buggy results all involve words starting in "rel". For example:
> 
>                 H1      H2
> religions*     138     138
> religion*     2448  166618
> relig*        3810  166618
> reli*        14608  166618
> rel*         39888   39888
> re*         150890  166618
> relia*        1084  166618
> relie*        8306  166618
> relo*          156  166618
> relm*            3       3
> 
> I have tried unsuccessfully to find other letter sequences that exhibit 
> the bug in a wildcard search or that give different result counts for 
> H2. So far it's only certain "rel-" examples.
> 
> My next step will be a forced reindex of the database on H2 to see if 
> that helps, but before I do that I wonder if anyone has a clue what 
> might account for this behavior.
> 
> Even odder, on two entirely different systems running an entirely 
> different MarkLogic software instance, "rel-" searches are also 
> showing discrepancies, though I haven't researched that one as 
> thoroughly. Some deep-level indexing bug, possibly?
> 
> David
> 
> --
> David Sewell, Editorial and Technical Manager ROTUNDA, The University 
> of Virginia Press PO Box 400314, Charlottesville, VA 22904-4314 USA
> Email: dsewell at virginia.edu   Tel: +1 434 924 9973
> Web: http://rotunda.upress.virginia.edu/
> _______________________________________________
> General mailing list
> General at developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
> 
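
One way to spot this kind of problem without a second host to compare against is a prefix sanity check: every word that matches "reli*" also matches "rel*", so the shorter pattern should never match fewer documents than the longer one. The sketch below is only an illustration in Python, using the counts quoted above; the function name and data layout are made up for the example and are not part of any MarkLogic API.

```python
# Counts copied from the tables in the quoted message: pattern -> (H1, H2).
COUNTS = {
    "democra*": (1579, 1579), "demo*": (4354, 4354), "dem*": (16866, 16866),
    "religions*": (138, 138), "religion*": (2448, 166618),
    "relig*": (3810, 166618), "reli*": (14608, 166618),
    "rel*": (39888, 39888), "re*": (150890, 166618),
    "relia*": (1084, 166618), "relie*": (8306, 166618),
    "relo*": (156, 166618), "relm*": (3, 3),
}

def monotonicity_violations(host_counts):
    """Return (short, long) pattern pairs where the stem of `short` is a
    prefix of the stem of `long`, yet `short` matches fewer documents."""
    violations = []
    for short, n_short in host_counts.items():
        for long_, n_long in host_counts.items():
            s, l = short.rstrip("*"), long_.rstrip("*")
            if s != l and l.startswith(s) and n_short < n_long:
                violations.append((short, long_))
    return violations

h1 = {p: c[0] for p, c in COUNTS.items()}
h2 = {p: c[1] for p, c in COUNTS.items()}

# Patterns whose counts differ between the two hosts.
mismatched = [p for p, (a, b) in COUNTS.items() if a != b]

print("differs between hosts:", mismatched)
print("H1 violations:", monotonicity_violations(h1))
print("H2 violations:", monotonicity_violations(h2))
```

On these numbers the H1 counts are internally consistent, while on H2 "rel*" reports fewer matches than several longer patterns that begin with "rel". That points at the inflated 166618 figures rather than at the "rel*" count itself.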
