If you are new to Azure and Virtual Private Networks (VPN), we suggest you start with the Getting Started with MarkLogic Data Hub Service on Azure tutorial. This tutorial assumes you are comfortable working with Azure and its products like Virtual Networks (VNets), Virtual Machines (VMs), etc.

When hosting your application in an Azure Virtual Network, having a private MarkLogic VNet with peering configured is recommended. This tutorial covers configuring a private MarkLogic Data Hub Service VNet, along with configuring the peering required to allow your Customer VNet to communicate with the provisioned MarkLogic Data Hub Service VNet.

You should have already created your Azure and MarkLogic cloud service accounts, as described in the previous installment.

Data Hub Service Architectural Overview

Here are the server and network resources that we want to establish:

Data Hub Service Architecture Overview

Figure 1: Server and network resources to be established

The “Customer VNet” is managed by the customer, and the “Service VNet” is managed by MarkLogic. Systems that access the data hub, such as front-end GUIs, background applications, and LDAP servers, can be hosted in VMs in the Customer VNet. Because the “Service VNet” is private, “VNet Peering” is required to allow communication between the “Customer VNet” and the “Service VNet”.

The Azure documentation has more information about VNets and VNet Peering.

Refer to the Resource Checklist in the Appendix to help track the various pieces of information we will generate and use.

Create Customer VNet

The MarkLogic Data Hub Service requires the Customer VNet to be in the same region as the Service VNet. Accordingly, please take note of the supported regions for the MarkLogic Data Hub Service:

  • West US 2
  • Central US
  • East US
  • North Europe
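
When scripting against Azure, region names often appear in a compact form (for example "westus2") rather than the portal display name. The following is a minimal sketch, not part of any MarkLogic tooling, that checks a region against the list above. The function names are illustrative, and the list simply mirrors this tutorial and may not reflect the currently supported regions.

```python
# Supported regions as listed in this tutorial (portal display names).
SUPPORTED_REGIONS = ["West US 2", "Central US", "East US", "North Europe"]


def normalize_region(name: str) -> str:
    """Lowercase a region name and drop whitespace, e.g. 'West US 2' -> 'westus2'."""
    return "".join(name.split()).lower()


def is_supported_region(name: str) -> bool:
    """Return True if the name matches one of the regions listed above."""
    supported = {normalize_region(region) for region in SUPPORTED_REGIONS}
    return normalize_region(name) in supported


if __name__ == "__main__":
    for candidate in ["East US", "westus2", "West Europe"]:
        print(candidate, is_supported_region(candidate))
```

Normalizing both sides lets the same check accept either the display name or the compact name.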

If your company has its own policies for establishing VNets, take note of the following information:

  • Customer VNet Resource ID: The unique ID of your virtual network
  • Customer VNet Address Space: The network CIDR for your virtual network

Otherwise, we must create a Customer VNet. From your Azure home page, look for “Virtual Networks” (shown below) and click on “+ Add” to create a new VNet.

Finding Azure Virtual Network

The following information is required to create a VNet:

  • Subscription: A billing group associated with your account. You can create your own subscription by searching for “subscription” and clicking on “+ Add”.
  • Resource Group: A means of grouping resources together so you can easily identify them during billing.
  • Name: The name of your Virtual Network.
  • Region: The region your VNet will reside in. Note that your VNet must be in the same region as your Data Hub Service VNet.

As seen in this screenshot:

Add Virtual Network Details

Figure 2: Details provided to create a VNet

Then click on “Next: IP Address”; at this point, you can update the rest of the settings as needed, but for the purposes of this tutorial, we will use the default settings provided by Azure. Note down the “IPv4 Address Space”, as we will be using it later.

IPv4 Address Space

Figure 3: Note the “address space” during VNet creation

Continue through to configure Security and Tags (per your organization’s requirements) until you reach the “Review + Create” section. Once you see the “Validation passed” notification, click on “Create”. Creation may take a while.

You can also find the “address space” on the overview page of your VNet:

VNet Overview Page

Figure 4: VNet overview also contains “address space” information

In addition, note your VNet’s “Resource ID” from the “properties” section:

VNet Properties

Figure 5: VNet Properties, which contains the Resource ID
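
A VNet Resource ID is a slash-separated path that names the subscription, resource group, resource provider, and VNet. Before pasting it into another form, it can help to confirm that the copied value is complete. The sketch below assumes the usual layout of a virtual network Resource ID; the function name and the sample ID (all-zero subscription, "my-resource-group") are placeholders, not values from this tutorial.

```python
def parse_vnet_resource_id(resource_id: str) -> dict:
    """Split a VNet Resource ID into its named parts.

    Raises ValueError if the ID does not look like a virtual network ID.
    """
    parts = resource_id.strip().strip("/").split("/")
    # Assumed layout: subscriptions/<id>/resourceGroups/<name>/providers/
    #                 Microsoft.Network/virtualNetworks/<name>
    if len(parts) != 8:
        raise ValueError(f"unexpected number of segments: {len(parts)}")
    keys = [part.lower() for part in parts[0::2]]
    if keys != ["subscriptions", "resourcegroups", "providers", "virtualnetworks"]:
        raise ValueError("not a virtual network Resource ID")
    if parts[5].lower() != "microsoft.network":
        raise ValueError("unexpected resource provider: " + parts[5])
    return {
        "subscription_id": parts[1],
        "resource_group": parts[3],
        "vnet_name": parts[7],
    }


if __name__ == "__main__":
    # Placeholder ID for illustration only.
    example_id = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/my-resource-group"
        "/providers/Microsoft.Network/virtualNetworks/Customer-VNet"
    )
    print(parse_vnet_resource_id(example_id))
```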

Create your Jumpbox

At this point you have a VNet without a Virtual Machine. To secure the VNet, we allow only one VM to be reachable over the internet; this VM is called the jumpbox or jump server. Through this VM, you can connect to other Azure VMs using dynamic IP while preventing all Azure VMs from being exposed to the public. All VMs other than the jumpbox should have port 22 (for SSH) blocked off. Let’s create this jumpbox.

From your Azure home page, look for “Virtual Machines” and click on “+ Add” to start the process. As you proceed through the steps, note the following:

  • “Region” has to be the same region as your Data Hub Service VNet
  • “Authentication” should use public key
  • “Public key” should match a prepared private key. While we do not cover key generation in this tutorial, you may want to start out with certificates, then convert those certificates to public and private key pairs.
  • “Public inbound ports” should “allow selected ports”
  • “Select inbound ports” should include SSH (port 22)

After clicking “Next: Disks”, we choose “Standard HDD” for this tutorial. After clicking “Next: Networking”, note the following:

  • Virtual Network should use Customer-VNet, which we created above.
  • “Public inbound ports” should “allow selected ports”
  • “Select inbound ports” should include SSH (port 22)

Proceed through the steps and update settings as necessary. Click on “Create” after you see the “Validation passed” notification. Once created, load the resource page and take note of the Public IP address as shown below:

VM Resource Page

Figure 6: VM resource page, which contains the Public IP address

Create Peered Data Hub Service Network

We now have our customer VNet configured along with the entry point (the jumpbox). You can populate the VNet per your application’s needs following company guidelines on access and security. Before we proceed with creating the peered Data Hub Service network, gather the information we have noted thus far.

In addition, locate your Tenant ID by looking for “Azure Active Directory” from your Azure home page and loading your current default directory.

Default Active Directory with Tenant ID

Figure 7: Tenant ID is found in the default active directory display

  1. Go to the Data Hub Service for Azure portal.
  2. Click on “Network” in the upper set of tabs
  3. Click on “Add Network”
  4. Provide a name for the network
  5. Be sure to choose the same region as your Customer VNet.
  6. Supply the “Tenant ID” and click on “Add to Active Directory” as shown below. This will open a new tab that will require you to log in.
    Add to Active Directory using Tenant ID
  7. Go back to the Data Hub Service for Azure portal after signing in to Azure.
  8. Click on “Verify”
  9. Go back to your Azure portal and click on “Access Control”
  10. Click on “+ Add”
  11. Choose “Network Contributor” as role and “mlDataHubService” as “Select” (the “Add” button is at the bottom of the screen):
    Add Role Assignment
  12. Go back to the Data Hub Service for Azure portal and check the “I have completed the role assignment” box
  13. Input the Customer VNet Resource ID as VNet ID
  14. Provide the Customer VNet Address Space as VNet CIDR

Your form should look something like this:

Configure Network Example

Figure 8: Example form for configuring Data Hub Service Network for private VNet
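
Before submitting the form, it can be worth checking that the values you collected at least parse correctly. The following is a small sketch under two assumptions: that the Tenant ID is a GUID, and that the VNet CIDR is a network address with no host bits set. The function name is illustrative, and the check is deliberately loose (Python's `uuid.UUID` accepts a few alternative GUID spellings).

```python
import ipaddress
import uuid


def check_network_form(tenant_id: str, vnet_cidr: str) -> list:
    """Return a list of problems found; an empty list means both values parse."""
    problems = []
    try:
        uuid.UUID(tenant_id)
    except ValueError:
        problems.append("Tenant ID is not a GUID: " + tenant_id)
    try:
        # strict=True rejects values such as 10.0.0.1/16 that have host bits set.
        ipaddress.ip_network(vnet_cidr, strict=True)
    except ValueError:
        problems.append("VNet CIDR is not a valid network: " + vnet_cidr)
    return problems


if __name__ == "__main__":
    # The all-zero GUID is a placeholder; substitute your own Tenant ID.
    print(check_network_form("00000000-0000-0000-0000-000000000000", "10.0.0.0/16"))
```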

Click on “Configure” to start the process of network creation. This can take a while to provision, but you will eventually end up with an entry like the following:

Network Peered

At this point we have successfully created our Service VNet. Take note of the “Peering Connection ID” and the “Network CIDR” above as DHS Network CIDR. We will be using them later.
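
Azure VNet peering requires that the two address spaces do not overlap, so it can be useful to compare the Customer VNet address space with the DHS Network CIDR. Here is a minimal sketch using Python's standard `ipaddress` module; the function name is illustrative, and the values are the example values from the Resource Checklist in the Appendix.

```python
import ipaddress


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share any addresses."""
    net_a = ipaddress.ip_network(cidr_a)
    net_b = ipaddress.ip_network(cidr_b)
    return net_a.overlaps(net_b)


if __name__ == "__main__":
    customer_cidr = "10.0.0.0/16"   # example Customer VNet IPv4 Address Space
    dhs_cidr = "10.100.16.0/22"     # example Data Hub Service Network CIDR
    if cidrs_overlap(customer_cidr, dhs_cidr):
        print("Address spaces overlap; peering would not be possible.")
    else:
        print("Address spaces are disjoint.")
```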

Create the Data Hub Service Instance

On the MarkLogic Cloud Services Azure homepage, click on the “+ Data Hub Service” tab to create an instance. Make sure to select “Private Access” and supply the following information:

Create Data Hub Service

Figure 9: Create Data Hub Service

Clicking on “Create” will spawn the MarkLogic VNet as described in Data Hub Service Architectural Overview. This can take around ten minutes or so. You can hit the “refresh” icon on the upper left to get updates periodically.

Data Hub Service Running

Figure 10: Successful DHS Provisioning

Now you can configure your users similar to how we did it for the public Data Hub Service instance.

Set Up Tunneling

The provisioned MarkLogic cluster can only be accessed via the load balancers, which can only be accessed via the “Customer VNet”. In addition, our “Customer VNet” can only be accessed via the jumpbox or via the Azure portal. If you prefer to work directly from your local environment, using your own browser to access the Data Hub Service endpoints, you will need to set up tunneling. To load our modules via Gradle or push data via MLCP or DMSDK, we can run these tools either on the jumpbox or locally through a tunnel. Here, we walk through the tunneling setup.

Windows using PuTTY
  1. For Host Name, supply your jumpbox Public IP address.
  2. Under Connection >> Data, specify the username you used to create your jumpbox as “Auto-login username”
  3. Under Connection >> SSH >> Auth, click on browse to pick up the corresponding private key of the public key you used to configure your jumpbox.
  4. In your Data Hub Service portal, click on “Action” >> “Tunneling script” and the “copy” icon to copy the script. Paste the contents into an editor like Notepad.
    Tunneling Script
  5. Under Connection >> SSH >> Tunnels, add the entries that correspond to the above list. Note that the servers would be different in a “standard” Data Hub Service instance; here, we chose “Low Priority”.
  6. Under session, supply the name under “Saved Session” and click “Save” for future reuse.
  7. Click “Open” to start tunneling.
Mac / Linux using SSH
  1. Click on the “SSH Tunneling Script” link in your Data Hub Service page.
  2. Click on the “copy” icon and paste the contents into a text editor.
  3. Replace $SSH_PEM_KEY with /path/to/my-cert.pem
  4. Replace VM_USER with the username you used to create your jumpbox.
  5. Replace BASTION_VM with your jumpbox Public IP address

You could run the text directly or save it to a file and run using sh /path/to/file.txt.
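
To make the idea concrete, here is a sketch of how an SSH tunnel command through the jumpbox could be assembled. It is not the script generated by the portal, which may differ. It only illustrates standard OpenSSH local port forwarding (`-L`) with an identity file (`-i`) and no remote command (`-N`). The endpoint hostname `dhs-endpoint.example.com` and the user `azureuser` are hypothetical placeholders; the IP address is the example jumpbox address from the Appendix.

```python
import shlex


def build_tunnel_command(key_path, vm_user, bastion_host, forwards):
    """Build an ssh argument list that forwards local ports through the jumpbox.

    forwards is a list of (local_port, remote_host, remote_port) tuples.
    """
    cmd = ["ssh", "-i", key_path, "-N"]
    for local_port, remote_host, remote_port in forwards:
        cmd += ["-L", f"{local_port}:{remote_host}:{remote_port}"]
    cmd.append(f"{vm_user}@{bastion_host}")
    return cmd


if __name__ == "__main__":
    command = build_tunnel_command(
        "/path/to/my-cert.pem",
        "azureuser",        # hypothetical jumpbox username
        "52.170.214.210",   # example jumpbox Public IP from the Appendix
        [(8003, "dhs-endpoint.example.com", 8003)],  # hypothetical endpoint host
    )
    # Print the command for review instead of running it.
    print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues once you have reviewed it.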

Developers with an existing MarkLogic installation will notice a potential conflict with the default Data Hub Service ports, such as 8010. To avoid the conflict, use a different local port number for the tunnel or change the port values in your local MarkLogic installation by updating gradle-local.properties.
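
One way to spot such a conflict before opening the tunnel is to test whether something is already listening on the local port. This is a minimal sketch using Python's standard `socket` module; the function name is illustrative, and the ports checked are the two mentioned in this tutorial.

```python
import socket


def local_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (8003, 8010):
        state = "in use" if local_port_in_use(port) else "free"
        print(f"local port {port}: {state}")
```

If a port is reported as in use, pick a different local port for that tunnel entry.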

You should now be able to access the manage page using localhost:8003. Please refer to the previous installment regarding deployment instructions of your data hub application.

If you run into issues using MarkLogic Data Hub Service, contact Support. MarkLogic engineers and enthusiasts are also active on Stack Overflow; just tag your questions ‘marklogic’.

Appendix: Resource Checklist

This table can be used to keep track of what we need for each stage of configuration.

Field | Requires | Example Value | Your Value
Region | | East US |
Azure Tenant ID | | cf66b2f-xxxxx |
Customer VNet Resource ID | | /subscriptions/xxxx |
Customer VNet IPv4 Address Space | | 10.0.0.0/16 |
Jumpbox Public IP Address | Customer VNet, Jumpbox VM | 52.170.214.210 |
Peering Connection ID | Customer VNet Resource ID, Customer VNet IPv4 Address Space, Region, Azure Tenant ID | mlaas-b5a1c3ca… |
Data Hub Service Network CIDR | Customer VNet Resource ID, Customer VNet IPv4 Address Space, Region, Azure Tenant ID | 10.100.16.0/22 |

