[MarkLogic Dev General] Reg: E-Node and D-Node configuration

Ron Hitchens ron at ronsoft.com
Tue Oct 29 12:39:38 PDT 2013


   You can think of a MarkLogic cluster as a single virtual
server.  A cluster is made up of nodes (E, D or E/D) but the
cluster should be thought of as an indivisible unit.

   D (data) nodes are MarkLogic processes that have forests
attached.  E (evaluator) nodes are those nodes which run
XQuery/XSLT requests on an appserver.  In a cluster, all nodes
share the same appserver configuration, so any node can be an
E node.  Typically, when configuring dedicated E and D nodes,
you configure things to send requests to only those nodes that
you want to act as E's, allowing the others to act only as D's.
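   To make the point that "E" and "D" describe how a node is used rather than a different kind of install, here is a minimal sketch. The Host fields, host names and forest names are hypothetical and are not MarkLogic configuration settings. A host counts as a D node if forests are attached to it, and as an E node if requests are routed to it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    """A hypothetical cluster member; the fields are illustrative, not MarkLogic config names."""
    name: str
    forests: List[str] = field(default_factory=list)  # forests attached to this host
    receives_requests: bool = False  # does the load balancer send app server traffic here?


def role(host: Host) -> str:
    """Classify a host by use: attached forests make it a D node, routed traffic an E node."""
    is_d = bool(host.forests)
    is_e = host.receives_requests
    if is_e and is_d:
        return "E/D"
    if is_e:
        return "E"
    if is_d:
        return "D"
    return "idle"  # in the cluster, but neither serving requests nor holding data


# Dedicated E and D nodes: requests go only to e1/e2, forests live only on d1/d2.
cluster = [
    Host("e1", receives_requests=True),
    Host("e2", receives_requests=True),
    Host("d1", forests=["content-1", "content-2"]),
    Host("d2", forests=["content-3", "content-4"]),
]
for h in cluster:
    print(h.name, role(h))
```

   Every host shares the same app server configuration. Only the load balancer and the forest placement decide which hosts end up acting as E's and which as D's.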

   Communication between nodes in a cluster is basically this:

   For queries (read-only) no locks are needed (read up on MVCC).
Each search operation is fired in parallel to every D node
in the cluster (this is the "map" phase).  When the last D node
has responded, the E node can then merge the results (the "reduce").

   So, the lower the latency in communication between nodes, the
better the overall throughput.  You really don't want any slow
links between nodes in the cluster because it can slow down all
the E nodes.
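   The scatter/gather pattern described above can be sketched like this. It is a toy model under stated assumptions, not MarkLogic's internals: the D nodes are in-memory dicts of (score, uri) pairs with a made-up latency, and the names D_NODES, search_d_node and search_cluster are hypothetical. What it does show is that the E node's wait is the maximum of the D-node latencies, not the average.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical D nodes: each holds its own slice of the documents as (score, uri)
# pairs, plus a simulated network latency to the E node in milliseconds.
D_NODES = {
    "d1": {"latency_ms": 2.0, "docs": [(0.9, "/a.xml"), (0.4, "/b.xml")]},
    "d2": {"latency_ms": 3.0, "docs": [(0.7, "/c.xml"), (0.2, "/d.xml")]},
    "d3": {"latency_ms": 2.5, "docs": [(0.8, "/e.xml")]},
}


def search_d_node(node, min_score):
    """'Map' phase on one D node: return its latency and local hits, best first."""
    hits = sorted((h for h in node["docs"] if h[0] >= min_score), reverse=True)
    return node["latency_ms"], hits


def search_cluster(d_nodes, min_score):
    """Fan the query out to every D node in parallel, then merge on the E node."""
    with ThreadPoolExecutor(max_workers=len(d_nodes)) as pool:
        responses = list(pool.map(lambda n: search_d_node(n, min_score), d_nodes.values()))
    # The E node cannot finish until the last (slowest) D node has answered.
    wait_ms = max(latency for latency, _ in responses)
    # 'Reduce' phase: merge the per-node ranked lists into one ranked list.
    merged = list(heapq.merge(*(hits for _, hits in responses), reverse=True))
    return wait_ms, merged


wait_ms, hits = search_cluster(D_NODES, 0.5)
print(wait_ms, [uri for _, uri in hits])  # 3.0 ['/a.xml', '/e.xml', '/c.xml']

# Put d2 behind a slow link and every query now waits on it.
slow = dict(D_NODES, d2=dict(D_NODES["d2"], latency_ms=40.0))
print(search_cluster(slow, 0.5)[0])  # 40.0
```

   In the second call, d1 and d3 are just as fast as before, yet the query as a whole takes as long as the slow link to d2.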

   For updates (writes), cluster-wide locks must be obtained for
documents that are, or might be, updated.  All nodes in the cluster
must acknowledge the lock(s) before the update(s) can proceed.  This
basically means that updates can't happen faster than the slowest
responding node in the cluster.  Oh, and the locks need to be
released as well, via inter-node communication.
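   Here is a rough model of the update path as described above: every node must acknowledge the lock, the update runs, and then another round of messages releases the lock. It is a simplification for illustration, not MarkLogic's actual lock manager. The Node class, run_update and the round-trip times are hypothetical. The cost of the locking overhead is two waits on the slowest node.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical cluster member that grants and releases locks on document URIs."""

    def __init__(self, name, rtt_ms):
        self.name = name
        self.rtt_ms = rtt_ms  # simulated round trip from the E node to this node
        self.locked = set()

    def lock(self, uris):
        self.locked.update(uris)
        return self.rtt_ms  # the acknowledgement arrives after one round trip

    def unlock(self, uris):
        self.locked.difference_update(uris)
        return self.rtt_ms


def run_update(nodes, uris, apply_update):
    """Return the locking overhead in ms for one update under this simplified model."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        # Phase 1: every node must acknowledge the lock(s); wait for the slowest.
        acquire_ms = max(pool.map(lambda n: n.lock(uris), nodes))
        try:
            apply_update(uris)
        finally:
            # Phase 2: the locks are released via another round of messages.
            release_ms = max(pool.map(lambda n: n.unlock(uris), nodes))
    return acquire_ms + release_ms


nodes = [Node("e1", 0.2), Node("d1", 0.3), Node("d2", 0.5)]
written = []
print(run_update(nodes, {"/doc1.xml"}, written.extend))  # 1.0
print(run_update(nodes + [Node("d3", 80.0)], {"/doc2.xml"}, written.extend))  # 160.0
```

   Adding one distant node (d3) turns a 1 ms locking overhead into 160 ms, however fast the other hardware is.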

   Again, this is bad for overall performance when communication links
between nodes slow down, even with super-fast, beefy hardware.

   As Mike pointed out, clusters are not database replication.
You cluster to improve performance by spreading the immediate
work across multiple CPUs and disks co-located together.  You
can add synchronous replication between nodes in a cluster to
provide for HA failover in the event a node fails.  This has a
latency cost, but makes the cluster more robust.  You replicate
databases asynchronously between clusters to provide for disaster
recovery if an entire cluster is lost or becomes unreachable.
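   The latency trade-off between the two kinds of replication can be illustrated with a small model. The commit waits for synchronous replicas (in-cluster, for HA). It only queues work for asynchronous replicas (another cluster, for DR). The Primary and Replica classes and the latencies are hypothetical. In MarkLogic terms these roughly correspond to forest replication and database replication, as Mike describes below.

```python
from collections import deque


class Replica:
    """Hypothetical replica with a simulated round-trip time from the primary."""

    def __init__(self, rtt_ms):
        self.rtt_ms = rtt_ms
        self.journal = []


class Primary:
    def __init__(self, local_write_ms, sync_replicas=(), async_replicas=()):
        self.local_write_ms = local_write_ms
        self.sync_replicas = list(sync_replicas)    # e.g. HA replicas in the same cluster
        self.async_replicas = list(async_replicas)  # e.g. a DR cluster over a WAN
        self.pending = deque()                      # frames not yet shipped to async replicas
        self.journal = []

    def commit(self, frame):
        """Apply a frame and return the commit latency the client sees, in ms."""
        self.journal.append(frame)
        # Synchronous: the commit is not done until every sync replica has it.
        for r in self.sync_replicas:
            r.journal.append(frame)
        wait_ms = max((r.rtt_ms for r in self.sync_replicas), default=0.0)
        # Asynchronous: just queue the frame; the client does not wait for it.
        self.pending.append(frame)
        return self.local_write_ms + wait_ms

    def ship_async(self):
        """Later, in the background, push queued frames to the async replicas."""
        while self.pending:
            frame = self.pending.popleft()
            for r in self.async_replicas:
                r.journal.append(frame)


ha, dr = Replica(0.5), Replica(120.0)
p = Primary(2.0, sync_replicas=[ha], async_replicas=[dr])
print(p.commit("txn-1"))  # 2.5: waits only for the local HA replica
p.ship_async()            # the DR copy catches up afterwards
```

   If the distant replica were made synchronous instead, every commit would pay its 120 ms round trip. That is why the slow, remote copy is replicated asynchronously.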

   Hope that helps.

---
Ron Hitchens {ron at overstory.co.uk}  +44 7879 358212

On Oct 28, 2013, at 10:03 PM, Arindam3 B <arindam3.b at tcs.com> wrote:

> 
> Thanks Mike for the great walkthrough. Just trying to understand more about the XQDP protocol. Can you throw some light on how it operates between E-nodes and D-nodes?
> 
> Thanks & Regards
> Arindam
> 
> -----Michael Blakeley <mike at blakeley.com> wrote: -----
> 
> =======================
> To: MarkLogic Developer Discussion <general at developer.marklogic.com>
> From: Michael Blakeley <mike at blakeley.com>
> Date: 10/28/2013 10:31PM 
> Subject: Re: [MarkLogic Dev General] Reg: E-Node and D-Node configuration
> =======================
> Hosts within a cluster should have low-latency communications: Gigabit Ethernet or better. Ideally they should all be on the same switch and/or VLAN, with no router hops between hosts. If you try to set up a cluster across a WAN link you are likely to see poor performance and poor reliability. You might be trying to handle high availability (HA) and disaster recovery (DR) with a single cluster: that would be a mistake.
> 
> For high availability, use a single cluster with low-latency communications. Configure forest replication and host failover to provide the desired degree of protection against host failures. The docs at http://docs.marklogic.com/guide/cluster/failover talk about this as "local-disk failover".
> 
> For disaster recovery - scenarios where an entire data center goes offline - use database replication to a different cluster. This can use higher-latency communications, such as a WAN link. The docs at http://docs.marklogic.com/guide/database-replication describe this. The DR replica cluster can also implement local-disk failover to provide its own HA.
> 
> -- Mike
> 
> On 28 Oct 2013, at 06:41 , Arindam3 B <arindam3.b at tcs.com> wrote:
> 
>> Hi, 
>> 
>> I had a query regarding the E-Node and D-Node setup in MarkLogic.
>> 
>> In a distributed environment, if I plan to keep the E-Nodes and D-Nodes separately in different physical locations over the LAN or WAN (across geographies), what are the potential risks?
>> How does failover work in that scenario?
>> I have read that E-Nodes and D-Nodes communicate through the XQDP protocol, so in this case will there be performance issues?
>> 
>> Does MarkLogic recommend having the E-Node and D-Node on the same physical box?
>> If so, then across the network if we have a set of E-D-Nodes, how is the network latency reduced while synching the data during replication? 
>> 
>> If you can provide me with some information about XQDP protocol it would be great!! 
>> 
>> Thanks & Regards
>> Arindam Bose 
>> 
>> 
>> _______________________________________________
>> General mailing list
>> General at developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/general
> 


