[MarkLogic Dev General] MarkLogic "rsync" command - RE: Mac Webdav Client setting xqy files as binary
Wayne Feick
wayne.feick at marklogic.com
Tue Jun 15 16:30:07 PDT 2010
No, Flexible Replication is about syncing documents from one database to
another, so it's not what you want.
What you're describing sounds very much like an incremental backup,
which I've typically seen solved either by comparing timestamps against
the time of the last backup or by using a platform- or filesystem-specific
archive flag that records whether a file has changed since it was last
backed up. Windows has an "archive" attribute, but Linux/Unix don't.
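A minimal sketch of the timestamp approach, in Python, assuming the time of the last successful push is kept somewhere as epoch seconds. The name `changed_since` is made up for illustration and is not part of any MarkLogic or xmlsh tool.

```python
import os
from pathlib import Path


def changed_since(root, last_run):
    """Return paths (relative to root, '/'-separated) modified after last_run.

    last_run is an epoch timestamp in seconds; how it is stored between
    runs is left open here (an assumption of this sketch).
    """
    root = Path(root)
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # st_mtime is the file's last-modification time on the local disk.
            if path.stat().st_mtime > last_run:
                changed.append(path.relative_to(root).as_posix())
    return sorted(changed)
```

This only finds new and modified files. Deletes would still need a comparison against a listing of what is on the server.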
I don't know if that helps you or not.
Wayne.
On 06/15/2010 04:09 PM, Lee, David wrote:
>
> What I'm looking for is to take a directory of files on a local
> filesystem and "optimally" push them to a MarkLogic server matching
> the directory structure.
>
> By "optimally" I mean not pushing thousands of files if they haven't
> changed since the last time.
>
> Useful for me in 2 cases:
>
> 1) Making a change to 1 or 2 module files out of a hundred and
> scripting a "push" process that takes a second instead of a minute.
>
> 2) Updating, say, 100 files of 1 million from the filesystem to ML when
> I don't know which 100 without comparing to what's on the server.
>
> Does "Flexible Replication" do any of this? From the title I'm
> guessing it's ML to ML, not filesystem to ML.
>
> But that's just guessing :)
>
> *From:* general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] *On Behalf Of *Wayne
> Feick
> *Sent:* Tuesday, June 15, 2010 6:41 PM
> *To:* general at developer.marklogic.com
> *Subject:* Re: [MarkLogic Dev General] MarkLogic "rsync" command - RE:
> Mac Webdav Client setting xqy files as binary
>
> I gave a talk at the User Conference that covered Flexible Replication
> in our upcoming 4.2 release. This may do a lot of what you want...
>
> Wayne.
>
>
> On 06/15/2010 09:05 AM, Lee, David wrote:
>
> I've been thinking of going ahead and prototyping this. That is, a
> MarkLogic "rsync" type command.
>
> From my experimentation the way I think would work best is as
> described below (included email thread)
>
> That is to set a property on all files which includes the md5 and
> length (file length in bytes prior to uploading to ML).
>
> Then, using client-side logic, compare the new list of files to what's
> on ML and generate a set of update/insert/delete commands.
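The comparison described above could be sketched roughly like this in Python. It assumes the server-side list has already been fetched into a dict of URI to (md5, length). How that list is obtained, and the names `local_manifest` and `plan_sync`, are hypothetical and not taken from xmlsh.

```python
import hashlib
import os
from pathlib import Path


def local_manifest(root, base_uri="/"):
    """Map each file under root to (md5 hex digest, length in bytes), keyed by target URI."""
    root = Path(root)
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            data = path.read_bytes()  # raw bytes, before any upload or reserialization
            uri = base_uri + path.relative_to(root).as_posix()
            manifest[uri] = (hashlib.md5(data).hexdigest(), len(data))
    return manifest


def plan_sync(local, remote):
    """Compare two manifests and return (inserts, updates, deletes) as sorted URI lists."""
    inserts = sorted(uri for uri in local if uri not in remote)
    deletes = sorted(uri for uri in remote if uri not in local)
    updates = sorted(
        uri for uri in local if uri in remote and local[uri] != remote[uri]
    )
    return inserts, updates, deletes
```

Only the URIs in the three lists need any work. Everything else is left alone on the server.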
>
> I've already done this for a special case and it worked well, so
> thinking of cleaning up the code and making it general purpose.
>
> Although my purposes are for updating ML ... there's no reason the
> reverse couldn't also be done, to update a local filesystem with
> minimal operations.
>
> The questions I have are :
>
> 1) Would anyone be interested in this ?
>
> 2) How 'offensive' is storing a property on documents ? Would this be
> a 'deal killer' ? Should it be in a private namespace ?
>
> 3) How efficient is storing properties? Does having to read, store,
> and update properties negate any time savings from avoiding the
> load?
>
> That is, I suspect for documents of some sizes it is actually faster
> just to push them unconditionally rather than have to look at
> properties and calculate MD5 sums to decide whether to push ...
>
> 4) I could avoid properties entirely by calculating the MD5 and length
> on the fly in ML ... however, I believe both require serializing the
> document in memory in ML. xdmp:md5() takes a string, not a
> document. And there is no actual size method; getting the size also
> requires serializing the document.
>
> The only way I can think of is to use xdmp:quote(doc(...)) then
> calculate the length and md5 on the server. My gut feeling is that
> doing this is a very heavyweight operation on large files and would
> be less efficient than just unconditionally pushing the document
> (except maybe on very, very slow networks).
>
> Also I'm not sure (and I highly suspect it's NOT true) that an MD5
> calculated on a file on local disk will match xdmp:md5(
> xdmp:quote(doc(...))) for the same file, due to serialization
> differences. Same with length. That would make this strategy pointless.
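To illustrate the serialization concern in general terms, here is a small Python sketch. It does not reproduce what xdmp:quote() actually emits, which is not shown in this thread. It only shows that two serializations of the same XML document can differ in both MD5 and length, and that hashing a canonical form (here via the standard library's C14N support, Python 3.8+) is one possible way to make them comparable. The sample documents are made up.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two serializations of the same document: different attribute quoting
# and different empty-element syntax.
on_disk = '<doc a="1"/>'
reserialized = "<doc a='1'></doc>"


def md5_hex(text):
    """MD5 hex digest of the UTF-8 bytes of text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def canonical_md5(text):
    """MD5 of the canonical (C14N) form, which normalizes such differences."""
    return md5_hex(ET.canonicalize(xml_data=text))


raw_match = md5_hex(on_disk) == md5_hex(reserialized)
length_match = len(on_disk) == len(reserialized)
canonical_match = canonical_md5(on_disk) == canonical_md5(reserialized)
```

Here the raw checksums and lengths disagree while the canonical checksums agree. Canonicalizing has its own cost on large files, so this does not settle whether it beats an unconditional push.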
>
> -David
>
> *From:* general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] *On Behalf Of *Lee, David
> *Sent:* Friday, June 11, 2010 10:00 AM
> *To:* General Mark Logic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] Mac Webdav Client setting
> xqy files as binary
>
> I would LOVE help with this project. (And yes I just checked in an
> update a half hour ago ... hate to point people at old code :)
>
> I've been thinking of exactly what you're saying. The only thing
> stopping me, besides time, is that I haven't figured out how to make
> sure the clocks are in sync and what the failure cases are if they
> are not.
>
> What I've done in another project is to use an MD5 checksum. There
> is an undocumented (it's experimental) flag to put which adds a
> property with an MD5 checksum. xmlsh has an MD5 sum command
> (http://www.xmlsh.org/CommandXmd5sum).
>
> I generate a list of all documents with the MD5 sum, compare against
> local disk then update only changed files, propagating deletes,
> inserts, and updates. It worked great for one project ... but I have
> not generalized this code yet ...
>
> I'm reluctant to blindly add properties to 'other people's files' so I
> haven't made this into a general utility yet.
>
> Discussion greatly welcome ! (and help too ... )
>
> -David
>
> ----------------------------------------
>
> David A. Lee
>
> Senior Principal Software Engineer
>
> Epocrates, Inc.
>
> dlee at epocrates.com
>
> 812-482-5224
>
> *From:* general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] *On Behalf Of *Mike
> Brevoort
> *Sent:* Friday, June 11, 2010 9:43 AM
> *To:* General Mark Logic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] Mac Webdav Client setting xqy
> files as binary
>
> Thanks David, That looks really cool.
>
> I was just looking at the code (which I see you are actively working
> on, with checkins in the last several minutes :) ) and it seems like
> it wouldn't be too hard to create a sync option for rsync-like
> behavior (simpler, obviously). Given a source (filesystem), a
> destination (MarkLogic DB directory), and a depth (how far to
> recurse), we should be able to grab a list of all of the files on the
> server with their content-length and last-updated dateTime. Then we
> could compare against the source filesystem for new/deleted files,
> and by size and date updated, to decide which files to get and put.
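A rough Python sketch of the size-and-date comparison proposed above, assuming both sides have already been listed into dicts of path to (size, last-modified epoch seconds). The name `plan_transfers` and the tie-breaking rule are assumptions of this sketch, and it takes for granted that both clocks agree.

```python
def plan_transfers(local, remote):
    """Decide per path whether to put, get, or skip.

    local and remote map a path to (size in bytes, mtime in epoch seconds).
    """
    plan = {"put": [], "get": [], "skip": []}
    for path in sorted(set(local) | set(remote)):
        if path not in remote:
            plan["put"].append(path)   # new on the filesystem
        elif path not in local:
            plan["get"].append(path)   # only on the server
        elif local[path] == remote[path]:
            plan["skip"].append(path)  # same size and date: assume unchanged
        elif local[path][1] >= remote[path][1]:
            # Local copy is newer, or same date but different size.
            # Assumption: the local copy wins ties.
            plan["put"].append(path)
        else:
            plan["get"].append(path)   # server copy is newer
    return plan
```

A one-way push would treat the "get" list as deletes or conflicts instead of downloads.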
>
> What do you think of that approach? I or someone on my team might be
> willing to take a crack at this.
>
> Also, what's required for others to run xmlsh on windows?
>
> Thanks!
>
> Mike
>
> On Fri, Jun 11, 2010 at 6:19 AM, Lee, David <dlee at epocrates.com>
> wrote:
>
> You might want to consider the MarkLogic extension to xmlsh
>
> http://www.xmlsh.org/ModuleMarkLogic
>
> This includes a "put" command which works similarly to rsync (not
> quite as good, as it doesn't handle minimal updates yet ... TBD)
>
> http://www.xmlsh.org/MarkLogicPut
>
> But I use it for scripting updates to modules. It uses XDBC (XCC), not
> WebDAV. You can set the file type explicitly (-t for text); otherwise
> it uses the server default logic.
>
> It's not as powerful as recordloader, but it's easier to use.
>
> Example: I use this command to recursively copy my source .xquery file
> tree to the modules DB
>
> ml:put -r -baseuri /App/ -maxfiles 10 -maxthreads 3 *
>
> *From:* general-bounces at developer.marklogic.com
> [mailto:general-bounces at developer.marklogic.com] *On Behalf Of *Mike
> Brevoort
> *Sent:* Friday, June 11, 2010 12:20 AM
> *To:* general at developer.marklogic.com
> *Subject:* [MarkLogic Dev General] Mac Webdav Client setting xqy files
> as binary
>
> Hi,
>
> So I know that webdav clients always seem to have quirks, and I've
> heard that the Mac webdav client has some problems when
> interfacing with MarkLogic, but....
>
> I have a modules database mounted via webdav on a Mac. When I copy in
> an XQuery file (test.xqy) via the native webdav client, the content
> type of the file is set to "binary", but if I use Cyberduck to move
> the file, it's set to "text". When the type is set to binary, it
> fails to execute:
>
> <h1>500 Internal Server Error</h1>
>
> <dl>
>
> <dt> [1.0-ml]</dt>
>
> <dd>XDMP-TEXTNODE: /ctd/article.xqy -- Server unable to build program
> from non-text document</dd>
>
> <dt>in /poc/article.xqy, on line 13 [1.0-ml]</dt>
>
> <dd>XDMP-UNDFUN: (err:XPST0017) Undefined function
> comoms-article:getFields()</dd>
>
> <dt>in /poc/article.xqy, on line 15 [1.0-ml]</dt>
>
> <dd>XDMP-UNDFUN: (err:XPST0017) Undefined function
> comoms-article:get()</dd>
>
> <dt>in /poc/article.xqy, on line 19 [1.0-ml]</dt>
>
> <dd>XDMP-UNDFUN: (err:XPST0017) Undefined function
> comoms-article:post()</dd>
>
> </dl>
>
> So two questions: is there anything I can do to affect how the Mac
> client and MarkLogic deal with document types? If not, how can I
> convert the document type via XQuery? I'd really like to have the
> modules database mountable so that I can use tools like rsync to move
> files (vs. a client like Cyberduck).
>
> Thanks!
>
> Mike
>
>
> --
> Mike Brevoort / Enterprise Web Practice Manager / Avalon Consulting
> LLC / 303-834-7509 / twitter:mbrevoort
>
>