Imagine: you are a novice MarkLogic user taking a look at MarkLogic Server for the first time. You plan to build a sample application to show your co-workers what MarkLogic can do, and you've got a folder containing a bunch of XML or JSON files that you need to load into MarkLogic Server as quickly as possible.
Here's the 5-minute challenge: Can you go from a folder of files on the machine where your server is running to searchable content in 5 minutes? With MarkLogic Server, yes, you can. This guide walks you through the steps needed to load your content using Information Studio. Let’s start by opening your web browser and visiting http://localhost:8000/appservices/ (or http://localhost:8002/ in MarkLogic Server 4.2).
You will see the main Application Services page, where you can create or configure a database, create an application using Application Builder or create a "flow" to configure a load and transformation process using Information Studio. Within Information Studio, documents flow through three processing phases: collection, transformation, and loading. Let's create a new flow to load our content.
You will now see the Information Studio Flow Editor. A “flow” is made of three components: collect, transform, and load.
Start by giving your flow a descriptive name, which you can do by editing the default name. I have changed the default flow name from “Untitled-1” to “Load sample content”.
By default, the “Filesystem Directory” collector will be used. To specify the source directory on the filesystem where your content is located, click on the “Configure…” button. Enter the full path of the directory in the “Configure settings” window, then click on “Done”.
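Before you point the collector at a directory, it can help to check what is actually in it. The sketch below is not part of Information Studio; it is a small standalone Python helper, with hypothetical names (`is_loadable`, `preview_directory`), that lists the XML and JSON files under a directory. It walks subdirectories as an assumption; whether the collector itself does so is not covered here.

```python
from pathlib import Path
from typing import List

# Assumption for this sketch: we only care about XML and JSON files.
LOADABLE_SUFFIXES = {".xml", ".json"}


def is_loadable(name: str) -> bool:
    """Return True if the file name ends in .xml or .json (case-insensitive)."""
    return Path(name).suffix.lower() in LOADABLE_SUFFIXES


def preview_directory(source_dir: str) -> List[str]:
    """Return the sorted paths of all loadable files under source_dir."""
    root = Path(source_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{source_dir} is not a directory")
    return sorted(
        str(path)
        for path in root.rglob("*")  # rglob descends into subdirectories
        if path.is_file() and is_loadable(path.name)
    )
```

Calling `preview_directory` with the same full path you typed into the “Configure settings” window gives you a quick list to compare against the document count reported after the load.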
Clicking on the “Ingestion …” button brings up the Ingestion Settings window, which allows you to update several aspects of the flow, for example, the number of documents processed per transaction or the filter applied to the documents being loaded. For now, let’s use the default ingestion settings.
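To get a feel for what those two settings mean, here is a minimal Python sketch of the general idea: filter a list of file names, then split what remains into fixed-size groups, one group per transaction. This is an illustration only, not how Information Studio is implemented. The function names are hypothetical, and the shell-style pattern is an assumption; the filter syntax that Information Studio actually accepts may differ.

```python
from fnmatch import fnmatchcase
from typing import Iterator, List


def apply_filter(names: List[str], pattern: str) -> List[str]:
    """Keep only the names that match a shell-style pattern (assumed syntax)."""
    return [name for name in names if fnmatchcase(name, pattern)]


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items, one per transaction."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    names = ["a.xml", "b.json", "c.txt", "d.xml", "e.xml"]
    for number, group in enumerate(batches(apply_filter(names, "*.xml"), 2), 1):
        print(f"transaction {number}: {group}")
```

A larger group size means fewer transactions for the same set of documents, at the cost of more work per transaction.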
Transformers are plugins that modify your documents as they are loaded into the database. The “transform” step is optional when creating a flow. You can select a transformer by clicking on “+ Add Transformation Step”.
MarkLogic ships with eight transformers. The XQuery transformer allows you to add custom XQuery to transform documents as they are being loaded. Similarly, the XSLT transformer lets you define a custom XSLT style sheet.
If you're loading JSON documents, you'll want to pick the "JSON" transformer (giving it an optional name). If you're loading XML, you can skip the transformation step and move on to the load step.
The load step allows you to specify the database into which you want your content loaded. Use the drop-down box to select a database. We will use the “Documents” database to load content.
In addition to selecting a database, you can optionally configure the URIs and metadata that will be assigned to the documents that are loaded. For example, you can set the URI structure under which the documents are loaded or the collections into which they are grouped. Click on the “Document settings” button to configure this information.
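If the idea of a URI structure is new to you, the following Python sketch shows one plausible mapping from a file's location on disk to a document URI, along with the collections to attach. Everything here is hypothetical and for illustration only: the function names, the `/sample/` prefix, and the `sample-content` collection are assumptions, and the URIs that Information Studio generates depend on what you enter under “Document settings”.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Union


def build_uri(source_dir: str, file_path: str, uri_prefix: str = "/sample/") -> str:
    """Map a file under source_dir to a URI under uri_prefix.

    The path relative to source_dir is preserved, so subdirectories
    on disk become path segments in the URI.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(source_dir))
    return uri_prefix.rstrip("/") + "/" + relative.as_posix()


def document_settings(
    source_dir: str,
    file_path: str,
    uri_prefix: str,
    collections: List[str],
) -> Dict[str, Union[str, List[str]]]:
    """Bundle the URI and collections that one loaded document would receive."""
    return {
        "uri": build_uri(source_dir, file_path, uri_prefix),
        "collections": list(collections),
    }


if __name__ == "__main__":
    print(document_settings(
        "/data/docs", "/data/docs/orders/1.xml", "/sample/", ["sample-content"]
    ))
```

Grouping every document from one load into a shared collection makes it easy to find or remove that batch later.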
You are now ready to load your documents from the source directory to the target “Documents” database. Just click the “Start Loading” button to execute your “Load sample content” flow.
The Status section on the Flow Editor page displays information about all the runs of a flow. When the “Start Loading” button is clicked, a ticket number gets assigned to the run. You can monitor the progress of a ticket and manage all tickets that belong to a flow in the Status section as well.
The “Status” will change from “Loading” to “completed” once the run is complete. The number of documents loaded will also be displayed along with the number of errors if any errors were encountered. You should now have loaded all your documents into the “Documents” database. Wasn’t that easy? You are now ready to build your application using the loaded content.
1. In case you selected the wrong folder or missed a setting, the “Unload” button provides a quick and easy way to delete the documents that were loaded by the corresponding ticket. Once the unload is complete, the “Status” will change to “unloaded”.
2. If you now re-visit the Application Services page at http://localhost:8000/appservices/ (or http://localhost:8002/ in MarkLogic Server 4.2), you will see the “Load sample content” flow we just created listed in the flows table along with all the details of the last run.
Continue to watch the Learn page for additional tutorials to help you build powerful applications for unstructured information using MarkLogic.
For more details on Information Studio, see the guide.