[MarkLogic Dev General] Marklogic hosting options?

David Lee David.Lee at marklogic.com
Thu Jan 7 17:07:02 PST 2016

1) By 'storage' I presume you mean the main database storage? That would require a network filesystem of some sort (HDFS, NFS, GFS, etc.) and a remote server cluster hosting the FS.
ML does support S3 for native storage, but transaction journaling is not supported on S3 (due to limitations of S3).

Performance will vary depending on your network connectivity and the 'devices' between you and AWS, but you cannot beat the speed of light wrt latency. Without a local caching component I can't imagine a performant configuration.

Depending on the rationale and requirements for this, and your budget -- this is such a component:

It's an 'appliance', like a SAN, that caches to S3.
I don't know of any tests with this device, but it has the necessary features.

2) Best left to the networking experts (wrt SAN). My non-expert thought on this is that a SAN is designed to be accessible by multiple nodes;
if you only have 1 node per SAN, why not just connect the disks directly?

3) "Same DataCenter" is less important than latency and bandwidth.
Nodes in a cluster perform better if they are 'close' to each other wrt network latency.
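
One rough way to compare how 'close' two hosts are is to time TCP connections between them. The following is only a sketch, not a MarkLogic tool: the function name, the example host name, and the choice of port are assumptions made for illustration, and connect time is only a proxy for round-trip network latency.

```python
import socket
import statistics
import time


def tcp_connect_ms(host: str, port: int, samples: int = 5, timeout: float = 2.0) -> float:
    """Return the median time, in milliseconds, to open a TCP connection to host:port."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open the connection and close it again immediately; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Example usage (hypothetical host name; substitute a port that is open on your node):
# print(tcp_connect_ms("ml-node-2.example.internal", 8001))
```

Running this from each node against the others gives a quick picture of which hosts are near each other in network terms, regardless of which building they sit in.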

An example where this distinction matters is AWS: in a given 'Region' (say us-east-1) there are 5 "Zones" comprising more than 10 'datacenters'.
Network latency between nodes in different zones is less than the latency from the CPU to a local hard disk (approx. 2ms).
(general reference)

Our 'white paper' recommended architecture for AWS for a 3 node cluster is to have each node in a different zone in the same region.
This gives you good fault tolerance with minimal if any performance degradation.
Region-to-region latency, however, is much higher, limited by the speed of light (and number of hops, etc.): 10ms-100ms or more.

Here's one example (just google for 'aws region latency')
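
As a back-of-the-envelope check on the speed-of-light limit, here is a small sketch that computes the lowest possible round-trip time over a given distance. It assumes light in optical fiber travels at roughly two thirds of its vacuum speed (about 200,000 km/s); the distances are illustrative round numbers, not measurements of any real AWS path, and real latency will be higher because of hops and routing.

```python
# Assumption: signal speed in optical fiber is about 2/3 of c, i.e. ~200,000 km/s.
SPEED_IN_FIBER_KM_PER_S = 200_000.0


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber, ignoring hops, queuing and routing detours."""
    return 2.0 * distance_km / SPEED_IN_FIBER_KM_PER_S * 1000.0


# Illustrative distances only (hypothetical, not tied to specific regions).
for label, km in [("same metro area", 50), ("cross-country", 4000), ("intercontinental", 10000)]:
    print(f"{label:>16}: >= {min_round_trip_ms(km):.1f} ms")
```

Under these assumptions 50 km costs at least 0.5 ms round trip, 4,000 km at least 40 ms, and 10,000 km at least 100 ms, which is in line with the 10ms-100ms range above.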

Between geographic regions I would recommend separate ML clusters using foreign replication. The protocol is designed for higher-latency connectivity.

4) -- ( not on list )
"On Premise" and "In Cloud" aren't always orthogonal. You can provision your datacenters with high-speed 'Direct Connect' connections to AWS and logically join the networks. This allows you to view the whole system as one network and migrate workloads and services as you see fit.

-----Original Message-----
From: general-bounces at developer.marklogic.com [mailto:general-bounces at developer.marklogic.com] On Behalf Of Dennis Garlick
Sent: Thursday, January 07, 2016 4:58 PM
To: general at developer.marklogic.com
Subject: [MarkLogic Dev General] Marklogic hosting options?


Without going through all of the time and expense to test various options, I’m wondering what are the possible drawbacks (or even feasibility) of using the following to host a Marklogic environment:

• Is it feasible to use Amazon Web Services just for storage, while the server is on premises (as opposed to having the server in the cloud as well)? I’m guessing this is possible, but would it really hurt performance?
• If you have a 3-server Marklogic cluster, does it make sense for them to connect to a single SAN storage, or should they each have their own SAN storage?
• Is it feasible to have a cluster where nodes in the cluster are located in different locations such as different states (assuming that data on one node will not be replicated on the other nodes)? Or would performance demands mean that the servers of a cluster should ideally (or preferably) reside in the same data center?

