[MarkLogic Dev General] Datase size, Comunity edition, EC2 &
Restore and reindexing question - Restore Stalled
Danny.Sokolsky at marklogic.com
Mon Jan 18 12:03:02 PST 2010
XQsync is your friend for moving from one platform to another:
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Lee, David
Sent: Monday, January 18, 2010 11:51 AM
To: Michael Blakeley; General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Datase size, Comunity edition, EC2 & Restore and reindexing question - Restore Stalled
I suspect the platform issues are what I'm running into.
A suggested "Feature Request" would be to report some kind of error on this ...
Any suggestions for moving 25 GB of XML + binary data from one ML system to another (on a different platform)? I'm considering having to do it document-at-a-time using an XCC-based tool, then using the "Load" option in the control panel.
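A document-at-a-time copy only stays practical at this size if an aborted run can pick up where it stopped. The following is a minimal sketch of that idea, not a MarkLogic or XCC API: `copy_documents`, `fetch`, `store` and the checkpoint file are all hypothetical names, and `fetch`/`store` stand in for whatever client calls actually read from the source server and write to the target.

```python
import os
import time


def load_done(checkpoint_path):
    """Return the set of URIs recorded as copied by earlier runs."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f if line.strip()}


def copy_documents(uris, fetch, store, checkpoint_path, max_retries=3, delay=1.0):
    """Copy documents one at a time; return how many were copied in this run.

    fetch(uri) -> bytes and store(uri, data) are hypothetical callables
    supplied by the caller. Each finished URI is appended to the checkpoint
    file, so a rerun after an abort skips work that is already done.
    """
    done = load_done(checkpoint_path)
    copied = 0
    with open(checkpoint_path, "a", encoding="utf-8") as checkpoint:
        for uri in uris:
            if uri in done:
                continue
            for attempt in range(1, max_retries + 1):
                try:
                    store(uri, fetch(uri))
                    break
                except IOError:
                    # Assumed failure mode: a transient network error.
                    if attempt == max_retries:
                        raise
                    time.sleep(delay)
            checkpoint.write(uri + "\n")
            checkpoint.flush()
            done.add(uri)
            copied += 1
    return copied
```

Appending one line per document keeps the checkpoint cheap even for several million URIs. Note that the sketch assumes URIs contain no newline characters.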
It would be nice if there were the opposite of "Load" as a GUI or API, where I could "save an ML directory tree to the local filesystem". I understand that things like collections and document properties would be lost.
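The "opposite of Load" described above can be sketched as a mapping from document URIs to paths under a local root. This is only an illustration under stated assumptions: `export_tree`, `uri_to_path` and `fetch` are hypothetical names, `fetch(uri)` stands in for whatever call returns a document's bytes from the server, and, as noted above, collections and document properties are not preserved.

```python
import os


def uri_to_path(root, uri):
    """Map a document URI such as '/drugs/a/1.xml' to a path under root."""
    parts = [p for p in uri.split("/") if p not in ("", ".")]
    if not parts or ".." in parts:
        # Refuse URIs that are empty or would escape the export root.
        raise ValueError("cannot map URI to a local path: %r" % uri)
    return os.path.join(root, *parts)


def export_tree(uris, fetch, root):
    """Write each document to the local filesystem; return the paths written.

    fetch(uri) -> bytes is a hypothetical callable supplied by the caller.
    Only document content is saved.
    """
    written = []
    for uri in uris:
        path = uri_to_path(root, uri)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(fetch(uri))
        written.append(path)
    return written
```

Files are opened in binary mode so the same loop handles both the XML and the binary documents.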
From: Michael Blakeley [mailto:michael.blakeley at marklogic.com]
Sent: Monday, January 18, 2010 2:41 PM
To: General Mark Logic Developer Discussion
Cc: Lee, David
Subject: Re: [MarkLogic Dev General] Datase size, Comunity edition, EC2 & Restore and reindexing question - Restore Stalled
Backups are platform-specific. So a backup from a Windows 32-bit instance won't restore on a 64-bit Windows instance, or even a 32-bit Linux instance. It's not clear to me whether that is the problem or not.
I have never tried reindexing with an over-capacity license, but I wouldn't expect it to work. Every update should fail, because updates aren't allowed when the license is over capacity. Possibly that's what you are seeing? Have you looked at the error log?
For whatever it's worth, I have used 4.1-4 on EC2 with good results. I used the RightScale CentOS 5.2 64-bit Linux image.
On 2010-01-18 11:35, Lee, David wrote:
> I eventually gave up on this on both EC2 and a desktop.
> The restore had run for 3 solid days and was making no progress at all. CPU is pegged but there is absolutely no progress in the status, and IO writes and reads are 0!
> My hunch was that this has something to do with trying to restore into a Community edition ML server with too much data.
> So I wiped my target machine (desktop), applied a trial license, and am repeating the restore.
> Same thing. It's been running for hours now and has "stalled" ... no progress being made, but CPU is pegged, memory use goes up and down around 800MB, and there are absolutely 0 IO reads or writes.
> It seems "stuck" at this:
> Removing old fragment root/parent configuration
> My question ... is the concept even valid? Is it expected that a backup from one system can be restored onto another?
> If not, what is the recommended way to move data from one ML server to another?
> From: general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] On Behalf Of Lee,
> Sent: Sunday, January 17, 2010 8:54 AM
> To: General at developer.marklogic.com
> Subject: [MarkLogic Dev General] Datase size, Comunity edition,EC2&
> Restore and reindexing question
> I noticed recently that ML is supported on Amazon EC2; this is an exciting possibility.
> As an experiment to see if I can get my experimental database to run on EC2, I am trying to load it into a Community edition license EC2 instance (I have yet to get approval to purchase a "Standard License" for EC2).
> I have a few questions.
> 1) License size restrictions.
> In editions before 4.1.4 I noticed that license size was related to "Index Size", or at least that's how it seemed.
> The same size XML would use a different % of my license depending on what indexing options I selected.
> It doesn't appear to do that anymore. Is Community edition licensed based on content size, index size, or both?
> That is, is it possible to decrease the size for licensing purposes by turning off various search features?
> 2) Backup/Restore
> I first tried to load the data from XML directly using my program that uses XCC. I have about 26 GB of XML data across 10 different data sources (and several million documents). Loading the XML directly is cumbersome, and I ran into lots of problems trying to load it to EC2: network problems would cause a batch load to abort and I'd have to start over.
> So instead I tried doing a backup of my master DB, rsyncing the backup to EC2, and then doing a restore.
> I also tried the same thing to a desktop (local) ML Community edition license server just to test.
> This worked fine, except I know I have too much data (26G), so I turned off many of the search options, like 2-letter searches. On both my desktop and my EC2 instance, after restoring the 26G from disk, ML went into a reindexing and refragmenting phase (expected) and is showing 254% above license use (also expected).
> What was NOT expected is that 48 hours later it's still reindexing with no end in sight.
> ML is pegging the CPU and predicts it will be done in 5-10 minutes ... for 2 days now.
> This happens in both my EC2 and desktop instances, so I know this is not just an EC2 issue.
> I want to let this run to completion to see if I can get the data set under community edition size, so I can at least prove the concept to my manager and try to justify the concept of an EC2 ML server.
> But why is it taking so long? Is this expected? Does it have anything to do with the license?
> That is, because I'm over license size, is it just going to run forever? Or will it eventually complete?
> Any clues on how long this is going to take?
> If this has nothing to do with licensing, this is a huge problem if it can take days to recover from a restore.
> What would happen in production if I had to restore from backup or transfer data from one server to another? The server would be offline for days?
> David A. Lee
> Senior Principal Software Engineer
> Epocrates, Inc.
> dlee at epocrates.com