[MarkLogic Dev General] "warming" indexes

Alan Darnell alan.darnell at utoronto.ca
Fri Mar 28 07:33:47 PST 2008


Mike,

Thanks for this.

I can say for sure that no inserts or updates are happening during
our testing, but it is good to know that this is something to watch
for.

Your second point is more likely what's happening, but we have been
monitoring processes while running the search tests, and the only
things of any significance that seem to be happening on the system
while we are not searching it are related to GFS activity and maybe
Nagios monitoring.  It's really baffling.  We'll take a look at your
link re: Linux -- we are running on 64-bit Red Hat Linux.
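
One way to check the paging theory might be to sample the kernel's swap
counters across an idle period and see whether pages are moving while
nothing is searching.  A minimal sketch, assuming a Linux host that
exposes /proc/vmstat (with its pswpin/pswpout counters) and
/proc/sys/vm/swappiness; the function names are made up for
illustration and nothing here is MarkLogic-specific:

```python
import time
from pathlib import Path


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into {name: int}."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after, keys=("pswpin", "pswpout")):
    """Pages swapped in/out between two parsed snapshots."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Current vm.swappiness value, or None if the file is missing."""
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else None


def measure(interval_s=60, vmstat_path="/proc/vmstat"):
    """Snapshot the swap counters, wait, snapshot again, return the delta."""
    before = parse_vmstat(Path(vmstat_path).read_text())
    time.sleep(interval_s)
    after = parse_vmstat(Path(vmstat_path).read_text())
    return swap_delta(before, after)
```

Calling measure(300) while the cluster sits idle and getting non-zero
numbers back would suggest something is pushing pages out to swap during
the quiet period; read_swappiness() shows the setting the kerneltrap
discussion is about.  Note that these counters only cover swap traffic,
not file-backed pages dropped from the page cache.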

Thanks,

Alan


On 28-Mar-08, at 11:20 AM, Michael Blakeley wrote:

> Alan,
>
> I can think of two possibilities that you might want to explore:
>
> a) In between searches, documents are being inserted or updated.  
> These changes can invalidate existing cache entries, so that a new  
> query will have to update the cache entries. This would mean that  
> the cluster isn't really idle, of course.
>
> b) Another process is causing significant I/O, and the OS is paging  
> out MarkLogic's index and cache pages to make room (probably for  
> buffer-cache pages). This is a well-known issue with Linux, for  
> example (see http://kerneltrap.org/node/3000 for some discussion).  
> One could determine which process is causing the paging, and disable  
> it (cron jobs are likely candidates). One could also tune down the  
> VM swappiness, per the kerneltrap link.
>
> -- Mike
>
> Alan Darnell wrote:
>> We have recently moved from a single host Mark Logic server to a
>> cluster with 4 data nodes and 2 evaluator nodes. We also increased
>> the number of documents in our primary database from 1 million to
>> 13.5 million. When we search this cluster (either via CQ or an
>> XQuery application we've built), we notice the following behaviour.
>> If the cluster has been sitting idle for a few minutes, a first
>> search will take up to 20 seconds to respond. Subsequent searches on
>> the same term or another term take a second or less to respond.
>> Leave the system alone for a few minutes and then run the same
>> searches -- again, the first search takes about 20 seconds and
>> subsequent searches are fast.
>>
>> I'm not too worried about this behaviour because when we are in
>> production the system shouldn't be idle very often. But it does make
>> me wonder why this is happening on an idle system. I realize that
>> the subsequent searches are faster because data from the indexes has
>> been moved from disk to memory. But why doesn't this data stay in
>> memory -- what flushes it out and is there any way to keep this data
>> in memory? Do other sites see this same behaviour? How do they deal
>> with it? Do we need to "warm" the indexes periodically by running
>> searches against them?
>>
>> Alan Darnell
>> University of Toronto
>> _______________________________________________
>> General mailing list
>> General at developer.marklogic.com
>> http://xqzone.com/mailman/listinfo/general


