[MarkLogic Dev General] "warming" indexes

Michael Blakeley michael.blakeley at marklogic.com
Fri Mar 28 07:20:56 PST 2008


Alan,

I can think of two possibilities that you might want to explore:

a) In between searches, documents are being inserted or updated. These
changes can invalidate existing cache entries, so that the next query
has to repopulate them. This would mean that the cluster isn't really
idle, of course.

b) Another process is causing significant I/O, and the OS is paging out
MarkLogic's index and cache pages to make room (probably for
buffer-cache pages). This is a well-known issue with Linux, for example
(see http://kerneltrap.org/node/3000 for some discussion). One could
determine which process is causing the paging and disable it (cron jobs
are likely candidates). One could also lower the kernel's VM swappiness
setting (vm.swappiness), per the kerneltrap link.
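
To check whether (b) applies, one rough approach is to sample the
kernel's swap counters across an idle period and see whether they move.
The following is only a sketch, assuming a Linux host that exposes
/proc/vmstat (with the pswpin and pswpout counters) and
/proc/sys/vm/swappiness. The function names and the 10-second interval
are my own choices for illustration, not anything MarkLogic provides.

```python
from pathlib import Path
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before, after):
    """Return pages swapped in and out between two counter snapshots."""
    return {
        key: after.get(key, 0) - before.get(key, 0)
        for key in ("pswpin", "pswpout")
    }


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current swappiness value, or None if the file is absent."""
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else None


if __name__ == "__main__":
    vmstat = Path("/proc/vmstat")
    print("vm.swappiness =", read_swappiness())
    if vmstat.exists():
        first = parse_vmstat(vmstat.read_text())
        time.sleep(10)  # assumed sampling interval; lengthen as needed
        second = parse_vmstat(vmstat.read_text())
        print("pages swapped during interval:", swap_activity(first, second))
```

If pswpout keeps climbing while the cluster is supposedly idle, paging
is a plausible culprit. In that case the swappiness setting can be
inspected with "sysctl vm.swappiness" and lowered with
"sysctl -w vm.swappiness=N". What value of N suits a given host is a
judgment call that I would test rather than assume.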

-- Mike

Alan Darnell wrote:
> We have recently moved from a single host Mark Logic server to a  
> cluster with 4 data nodes and 2 evaluator nodes.  We also increased  
> the number of documents in our primary database from 1 million to 13.5  
> million.   When we search this cluster (either via CQ or an XQuery  
> application we've built), we notice the following behaviour.  If the  
> cluster has been sitting idle for a few minutes, a first search will  
> take up to 20 seconds to respond.  Subsequent searches on the same  
> term or another term take a second or less to respond.   Leave the  
> system alone for a few minutes and then run the same searches --  
> again, the first search takes about 20 seconds and subsequent searches  
> are fast.
> 
> I'm not too worried about this behaviour because when we are in  
> production the system shouldn't be idle very often.  But it does make  
> me wonder why this is happening on an idle system.  I realize that the  
> subsequent searches are faster because data from the indexes has been  
> moved from disk to memory.  But why doesn't this data stay in memory  
> -- what flushes it out and is there any way to keep this data in  
> memory?  Do other sites see this same behaviour?  How do they deal  
> with it? Do we need to "warm" the indexes periodically by running  
> searches against them?
> 
> 
> Alan Darnell
> University of Toronto


