If you are new to AWS and Virtual Private Clouds (VPC), we suggest you start with the Getting Started with MarkLogic Data Hub Service on AWS tutorial. This tutorial assumes you are comfortable working with AWS and VPCs.
If you host your application in an AWS VPC, we recommend a private MarkLogic VPC with peering configured. This tutorial covers configuring a private MarkLogic Data Hub Service VPC and the peering your VPC needs to communicate with the provisioned MarkLogic VPC.
You should have already created your Amazon and MarkLogic cloud service accounts, as described in the previous installment.
Here are the server and network resources that we want to establish:
Figure 1: Server and network resources to be established
The “Customer VPC” is managed by the customer, and the “Service VPC” is managed by MarkLogic. Systems that access the data hub, such as front-end GUIs, background applications, and LDAP servers, can be hosted on servers in the Customer VPC. Because the “Service VPC” is private, “VPC Peering” is required to allow communication between the “Customer VPC” and the “Service VPC”.
Refer to the Resource Checklist in the Appendix to help track the various pieces of information we will generate and use.
The MarkLogic Data Hub Service requires the Customer VPC to be in the same region as the Service VPC. Accordingly, please take note of the supported regions for the MarkLogic Data Hub Service:
|Region||# of EC2 Availability Zones||Identifier|
|US West (Oregon)||3||us-west-2|
|US East (N. Virginia)||6||us-east-1|
|EU West (Ireland)||3||eu-west-1|
|EU West (London)||3||eu-west-2|
|EU Central (Frankfurt)||3||eu-central-1|
|Asia Pacific (Sydney)||3||ap-southeast-2|
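Because the Customer VPC must sit in one of these regions, a small pre-flight check can catch a mismatch before you create any resources. The Python sketch below only encodes the table above; `check_region` is a hypothetical helper, not part of any MarkLogic or AWS tool, and the region list should be re-checked against MarkLogic's current documentation.

```python
# Supported Data Hub Service regions, copied from the table above (may change over time).
DHS_REGIONS = {
    "us-west-2": "US West (Oregon)",
    "us-east-1": "US East (N. Virginia)",
    "eu-west-1": "EU West (Ireland)",
    "eu-west-2": "EU West (London)",
    "eu-central-1": "EU Central (Frankfurt)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
}


def check_region(region: str) -> str:
    """Return the display name of a supported region, or raise ValueError."""
    try:
        return DHS_REGIONS[region]
    except KeyError:
        supported = ", ".join(sorted(DHS_REGIONS))
        raise ValueError(
            f"{region!r} is not a supported Data Hub Service region; "
            f"choose one of: {supported}"
        ) from None
```

For example, `check_region("us-west-2")` returns `"US West (Oregon)"`, while a region missing from the table raises an error naming the supported identifiers.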
If your company has its own policies for establishing VPCs, take note of the following information:
|VPC ID||If you used a template to launch a VPC, use the VPC ID that gets generated.|
|AWS Account ID||https://docs.aws.amazon.com/IAM/latest/UserGuide/console_account-alias.html|
|User Subnet CIDRs||You can provide from 1 to 6 user subnet CIDRs. If you are running your clients or application servers across three AWS Availability Zones, you must provide all of their subnet CIDRs. If your service runs in a single zone, provide just one CIDR.|
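If you supply your own subnets, it can help to validate the list before entering it. The sketch below uses Python's standard `ipaddress` module to enforce the 1 to 6 limit described above and to reject malformed, host-bit-set, or mutually overlapping CIDRs; `validate_user_subnet_cidrs` is a hypothetical helper, and the overlap rule is a sanity check of our own rather than a documented MarkLogic requirement.

```python
import ipaddress


def validate_user_subnet_cidrs(cidrs):
    """Parse 1 to 6 IPv4 subnet CIDRs; raise ValueError on bad input or overlaps."""
    if not 1 <= len(cidrs) <= 6:
        raise ValueError(f"expected 1 to 6 user subnet CIDRs, got {len(cidrs)}")
    # strict=True rejects entries such as 10.0.0.1/24 that have host bits set.
    networks = [ipaddress.IPv4Network(c, strict=True) for c in cidrs]
    # Compare every pair once; overlapping subnets usually indicate a typo.
    for i, a in enumerate(networks):
        for b in networks[i + 1:]:
            if a.overlaps(b):
                raise ValueError(f"subnets {a} and {b} overlap")
    return networks
```

A three-zone deployment might pass something like `["10.0.0.0/24", "10.0.2.0/24", "10.0.4.0/24"]`; these values are illustrative only.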
Otherwise, we must create a private VPC with a bastion host as the single access point.
There are a few values you should note now for later use; we list them below. First, note the BastionHostIP by clicking the “Physical ID” link in the screenshot above. The IP is circled in red below:
Figure 4: BastionHostIP
Note the IDs of your public and private subnet route tables:
Figure 5: Public and private subnet route table
Click “VPCs” to display the VPC IDs, and note them for later use:
Figure 6: VPC IDs
Note the public and private CIDRs used to create this stack:
Figure 7: Public and private CIDRs
Alternatively, you can find the CIDRs on the “Subnets” page of the Amazon VPC console.
To obtain your MarkLogic Service ID, which we will use later, go to the MarkLogic Cloud Service homepage and click your name in the upper-right corner of the page:
Figure 8: MarkLogic Service ID
As shown in the diagram in Figure 1, we need to allow the “Customer VPC” to communicate with the MarkLogic VPC by creating a peer role. More information about VPC peering is available in the AWS VPC Peering Documentation.
This can take a while to complete, but you should eventually see something like the following. Note the RoleARN in the “Outputs” tab; we will use it later.
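If you would rather read the RoleARN programmatically than copy it from the console, the CloudFormation `DescribeStacks` response lists each stack output as an `OutputKey`/`OutputValue` pair. The sketch below pulls one output out of such a response. It assumes the output key is literally `RoleARN`, as shown in the Outputs tab, and `get_stack_output` is a hypothetical helper.

```python
def get_stack_output(describe_stacks_response, output_key):
    """Return the OutputValue for output_key from a DescribeStacks response dict."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        raise LookupError("no stacks in response")
    # Only the first stack is inspected, which matches a query by a single StackName.
    for output in stacks[0].get("Outputs", []):
        if output.get("OutputKey") == output_key:
            return output["OutputValue"]
    raise LookupError(f"output {output_key!r} not found")
```

With boto3, the response would come from `boto3.client("cloudformation").describe_stacks(StackName=...)`, passing the name you gave the peer role stack; the helper itself needs no AWS access, which keeps it easy to test.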
Figure 9: CloudFormation stack details with RoleARN in the Outputs tab
This may take a while to complete, so click the refresh button on the right every now and then. Eventually, you will see that the network configuration has completed. Note the Peering Connection ID (for example, pcx-079d5f1a12c607814), as well as the public and private CIDRs that were generated; we will use these later.
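AWS does not allow VPC peering between VPCs with matching or overlapping IPv4 CIDR blocks, so it is worth double-checking the values you just noted. The sketch below checks that the peering ID has the `pcx-` shape of the example above and that no Customer VPC CIDR overlaps a Service VPC CIDR. `check_peering_inputs` is a hypothetical helper, and the ID pattern is inferred from the example ID rather than from an AWS specification.

```python
import ipaddress
import re

# Matches IDs like pcx-079d5f1a12c607814: "pcx-" followed by lowercase hex digits.
PCX_ID = re.compile(r"pcx-[0-9a-f]+")


def check_peering_inputs(peering_id, customer_cidrs, service_cidrs):
    """Raise ValueError if the peering ID looks malformed or any CIDRs overlap."""
    if not PCX_ID.fullmatch(peering_id):
        raise ValueError(f"{peering_id!r} does not look like a peering connection ID")
    customer = [ipaddress.IPv4Network(c) for c in customer_cidrs]
    service = [ipaddress.IPv4Network(c) for c in service_cidrs]
    # Collect every clashing pair so the error message shows all of them at once.
    clashes = [(str(c), str(s)) for c in customer for s in service if c.overlaps(s)]
    if clashes:
        raise ValueError(f"overlapping CIDRs cannot be peered: {clashes}")
```

For instance, a Customer VPC at `10.0.0.0/23` and an illustrative Service VPC range of `172.31.0.0/16` pass, whereas `10.0.0.0/16` against `10.0.4.0/24` raises.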
Figure 11: Network configuration completion status with Peering Connection ID and public and private CIDRs generated
On the MarkLogic Cloud Services homepage, click the “+ Data Hub Service” tab to create an instance, make sure to select “Private Access,” and supply the following information:
Figure 12: Create Data Hub Service
Clicking “Create” spawns the MarkLogic VPC as described in Data Hub Service Architectural Overview. This takes about ten minutes. Click the “refresh” icon on the upper left periodically to get updates until you see something like the following:
Figure 13: Users and roles for the Data Hub Service
At this point, our “Customer VPC” and “Service VPC” are up and running, and the peering role is set up to allow communication between the two VPCs. However, the VPCs do not yet know how to reach each other. We need to configure route tables so that traffic from the “Customer VPC” to the IP addresses of the “Service VPC” is sent over the peering connection. Learn more about VPC routing in these resources: Updating Your Route Tables for a VPC Peering Connection and Route Tables.
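Each route needs three values from the Resource Checklist: a route table ID, a destination CIDR from the Service VPC, and the Peering Connection ID. The sketch below builds one route request per combination of the public and private route tables and the Service VPC CIDRs, using the parameter names of the EC2 `CreateRoute` API. `build_peering_routes` is a hypothetical helper, and the Service VPC CIDRs in the usage note are illustrative.

```python
def build_peering_routes(route_table_ids, service_cidrs, peering_connection_id):
    """Return one CreateRoute request per (route table, Service VPC CIDR) pair."""
    return [
        {
            "RouteTableId": rtb,
            "DestinationCidrBlock": cidr,
            "VpcPeeringConnectionId": peering_connection_id,
        }
        for rtb in route_table_ids  # e.g. the public and private route tables
        for cidr in service_cidrs  # every CIDR generated for the Service VPC
    ]
```

To apply the routes with boto3, you could create `ec2 = boto3.client("ec2", region_name="us-west-2")` and call `ec2.create_route(**route)` for each entry; adding the same routes by hand in the VPC console is equivalent.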
You can now access your DHS instance from, or through, your bastion server. Each developer who connects to or tunnels through the bastion server to deploy modules and other MarkLogic configuration will need their own ssh user account. Note that these ssh user accounts are separate from the Cloud Service, AWS, and DHS accounts. Do not share the ec2-user certificate with your peers.
Recall how we created the “Customer VPC” and bastion server in the Create Customer VPC section. If we stop here, the only way to access the data hub we just set up is via the bastion server. This is because the provisioned MarkLogic cluster can only be accessed via the load balancers, which can only be accessed via the “Customer VPC”, which can only be accessed via the bastion server.
However, if you prefer to work directly from your local environment, using your own browser to access the Data Hub Service endpoints, you will need to set up tunneling. To deploy modules with Gradle or load data with MLCP or DMSDK, you can run these tools either on the bastion server or locally through a tunnel. Here, we walk through the tunnel setup.
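A tunnel of this kind is an ssh local port forward through the bastion host. As a sketch of the idea (not the exact command from this guide), the Python below assembles an `ssh` argument list using the standard OpenSSH options `-i` (identity file), `-N` (no remote command, forwarding only), and `-L local_port:remote_host:remote_port`. The DHS endpoint hostname is a hypothetical placeholder; use the private endpoint shown for your Data Hub Service, and note that `build_tunnel_command` is our own helper.

```python
def build_tunnel_command(key_file, bastion_ip, remote_host, ports, user="ec2-user"):
    """Build an ssh argv that forwards each (local, remote) port pair via the bastion."""
    cmd = ["ssh", "-i", key_file, "-N"]  # -N: forward ports only, run no remote command
    for local_port, remote_port in ports:
        # Connections to localhost:local_port are relayed to remote_host:remote_port.
        cmd += ["-L", f"{local_port}:{remote_host}:{remote_port}"]
    cmd.append(f"{user}@{bastion_ip}")
    return cmd
```

Calling `build_tunnel_command("/path/to/my-us-west-2-key-pair.pem", "126.96.36.199", "<your DHS endpoint>", [(8002, 8002)])` gives an argument list you can pass to `subprocess.run` or print with `shlex.join`.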
/path/to/my-us-west-2-key-pair.pem (the certificate file generated in step #4 of the Create Customer VPC section).
You could run the text directly or save it to a file and run using
Developers with an existing local installation of MarkLogic will notice a potential conflict on port 8002, which a local MarkLogic server uses for its Manage app server. To avoid the conflict, either use another local port number for the tunnel or change the “Manage” port in your local MarkLogic installation. The rest of this guide assumes that the tunnel uses port 8002.
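To tell in advance whether port 8002 is already taken, you can try to bind it: if the bind fails, something such as a local MarkLogic server is using it. The sketch below does that with the standard `socket` module and falls back to alternative ports; the fallback port numbers are arbitrary choices for illustration, and both helpers are hypothetical.

```python
import socket


def local_port_free(port, host="127.0.0.1"):
    """Return True if host:port can be bound, i.e. an ssh -L tunnel could use it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False  # already in use (or not permitted)
        return True


def pick_tunnel_port(preferred=8002, fallbacks=(18002, 28002)):
    """Return the first free port among preferred and fallbacks."""
    for port in (preferred, *fallbacks):
        if local_port_free(port):
            return port
    raise RuntimeError("no free local port for the tunnel")
```

If `pick_tunnel_port()` returns something other than 8002, use that number as the local side of the `-L` option and substitute it wherever this guide uses localhost:8002.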
You should now be able to access the Configuration Manager at localhost:8002, as shown below:
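If the page does not load, a quick first check is whether anything is listening on the local tunnel port. The sketch below attempts a plain TCP connection with `socket.create_connection`. Note that this only shows that a local listener, normally the ssh tunnel, accepted the connection; ssh accepts locally forwarded connections before it contacts the remote side, so a success here does not prove that the Service VPC is reachable. `tunnel_listening` is a hypothetical helper.

```python
import socket


def tunnel_listening(port=8002, host="127.0.0.1", timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

If this returns `False`, the ssh tunnel is not running or uses a different local port. If it returns `True` but the browser still fails, check the route tables and the remote host in the `-L` option.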
Please refer to the previous installment for instructions on deploying your data hub application.
This table can be used to keep track of what we need for each stage of configuration. This is particularly useful for the System Administrator given the number of actions to be undertaken.
|Field||Requires||Example Value||Your Value|
|AWS Account||Email address|
|AWS Account ID||893017339836|
|AWS Certificate||AWS Account||my-dhs-key-pair.pem|
|VPC Public and Private Subnet CIDRs||10.0.0.0/23
|VPC ID||VPC CIDR, VPC Public and Private Subnet CIDRs|
|Bastion Host IP||126.96.36.199|
|Public Route Table ID||rtb-034d205a3c9a8fcc7|
|Private Route Table ID||rtb-0ff979af79d8d874b|
|MarkLogic Service ID||MarkLogic Cloud Services subscription in AWS Marketplace
MarkLogic Cloud Service Page signup
MarkLogic Service ID
|Peering Connection ID||VPC ID, AWS Account ID, Peer Role ARN, VPC Public and Private Subnet CIDRs|