As referenced in “A Tale of Two Facets,” this write-up demonstrates how simple it is to develop and deploy a non-trivial faceted search and discovery application using MarkLogic's Application Builder. In less than 30 minutes, you can ingest data into an ACID-compliant NoSQL database with government-grade security. No up-front schema design or application-limiting assumptions are required. Readers can compare this approach to what’s described in “Faceted Search with MongoDB.”
We’ll build our faceted application using a database of “top-songs” instead of books. Our songs are in XML format; however, MarkLogic can just as easily identify facets in JSON, delimited text, and a variety of other formats, including RDBMS exports. MarkLogic is schema-agnostic and performs indexing on ingestion.
A quick scan of one of the XML data files allows us to pick facets such as “artist,” “week (released),” and “genre.” Developers can easily add facets for “writers,” “producers,” “song length,” or any facet-able business object, even those that become known at a later date.
Let’s begin by setting up the target database in MarkLogic. This step wasn’t covered in the MongoDB article and is illustrated here for completeness. After navigating to localhost:8000, the developer will:
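To make the idea of a facet concrete, here is a minimal Python sketch that tallies facet values across a handful of song records. The element names (`song`, `title`, `artist`, `genre`) and the sample records are assumptions made up for illustration; they are not the actual top-songs schema, and this is not how MarkLogic computes facets.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample records; element names are assumed, not the real schema.
SAMPLE_SONGS = [
    "<song><title>Song A</title><artist>Artist One</artist><genre>pop</genre></song>",
    "<song><title>Song B</title><artist>Artist One</artist><genre>rock</genre></song>",
    "<song><title>Song C</title><artist>Artist Two</artist>"
    "<genre>pop</genre><genre>soul</genre></song>",
]


def facet_counts(xml_docs, facet_names):
    """Count how often each value occurs for each requested facet element."""
    counts = {name: Counter() for name in facet_names}
    for doc in xml_docs:
        root = ET.fromstring(doc)
        for name in facet_names:
            # A document may carry several values for one facet (e.g. two genres).
            for elem in root.iter(name):
                if elem.text and elem.text.strip():
                    counts[name][elem.text.strip()] += 1
    return counts


if __name__ == "__main__":
    for facet, values in facet_counts(SAMPLE_SONGS, ["artist", "genre"]).items():
        print(facet, dict(values))
```

Adding a facet later, such as a hypothetical `writer` element, only means passing one more name to `facet_counts`; nothing about the records has to be declared in advance.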
- Select the “Information Studio” tab
- Click the “+New Database” button
- Enter a database name such as mysongs in the text field
- Click the “Create Database” button
- Click the “+New Application” button
After clicking the “+New Application” button, the developer is taken to the screen below. After ensuring that mysongs appears in the “Target database” dropdown, the developer provides an “Application Name,” say MyFavoriteSongs, and clicks the “Create Application” button.
At this point the developer has access to a series of tabs for customizing and deploying the faceted search application. For now, we’ll select the “Deploy” tab, accept the defaults, and deploy an application shell to port 7778 on localhost. The “Deploy” screen and the resulting application shell are shown below.
As you can see, with just a few clicks, we’ve deployed the key elements of a faceted search and discovery application. A facet container appears on the left, a search bar on top, widget containers below the search bar, and a results panel below the widget containers.
Ingesting the song catalog data using the open source MarkLogic Content Pump (mlcp) command line tool is a simple process. The command can be issued from any Linux, Mac OS X, or Windows prompt. Linux will be used in this example:
$ mlcp.sh -options_file options.txt
Following are the contents of the options.txt file:
IMPORT
-host
localhost
-port
8041
-username
admin
-password
admin
-input_file_path
/Users/mmalgeri/Documents/workspace/demos/top-songs/songs
-input_file_type
XML
-output_uri_prefix
"/topsongs/"
-output_uri_suffix
".xml"
-output_collections
"songs"
-filename_as_collection
true
Note that the port specified is 8041. This is the port of an instance-wide MarkLogic service that listens for data-loading requests from programs like mlcp and can load into ANY database hosted on the MarkLogic server.
After running the mlcp.sh command and refreshing the browser, we see that 1,155 songs have been loaded into our application, with basic URI links appearing in the results pane. It’s important to note that mlcp supports ingestion of billions of documents using its distributed loading features.
A quick scan of headers in any of the data files allows a developer or business analyst to select objects on which to facet. In this example, using the admin interface for our mysongs database, we set up indexes for the week, genre, title, and artist facets. Clicking the “ok” button allows us to return to the MarkLogic application builder to complete our work.
In the “Assemble” screen, we select a pie chart widget to view artists and a horizontal bar graph to display genres.
In the “Results” screen, we accept elements picked up by Application Builder to configure the “Title,” “Snippet,” and “Metadata” sections.
Finally, in the “Appearance” screen, we select a text logo and change the “Skin” to “Dawn,” keeping things simple, although extensive customization is relatively easy.
When the “Deploy” button is clicked, our faceted search and discovery application is displayed.
If a user clicks on “Mariah Carey” in the pie chart widget, the facets, widgets, and result set are all updated, as the next screenshot shows.
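Conceptually, clicking a facet value narrows the result set and then recomputes the remaining facet counts over what is left. The following Python sketch illustrates that drill-down behavior on a few in-memory records. The records, field names, and function names are hypothetical, and the code only mimics the visible behavior; it is not the code Application Builder generates.

```python
from collections import Counter

# Hypothetical records standing in for song documents.
SONGS = [
    {"title": "Song A", "artist": "Artist One", "genre": "pop"},
    {"title": "Song B", "artist": "Artist One", "genre": "rock"},
    {"title": "Song C", "artist": "Artist Two", "genre": "pop"},
]


def drill_down(records, selections):
    """Keep only the records that match every selected facet value."""
    return [
        record
        for record in records
        if all(record.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(records, facet_names):
    """Recompute value counts for each facet over the given records."""
    return {
        facet: Counter(record[facet] for record in records if facet in record)
        for facet in facet_names
    }


if __name__ == "__main__":
    # Simulate a click on one artist in the pie chart widget.
    selected = drill_down(SONGS, {"artist": "Artist One"})
    print([song["title"] for song in selected])
    print(facet_counts(selected, ["artist", "genre"]))
```

Because the counts are recomputed from the filtered records, every widget driven by them stays consistent with the current selection.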
Also note in the next screenshot how facet values such as artist: "The Beatles" and week: 1964-02-08 can be typed into the search box. Not shown is how the facets and their respective values are displayed as search suggestions as the user types.
The My Favorite Songs application took less than 30 minutes to build, which included iterations to correct typos in index creation and time to play with widgets.
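As a rough illustration of how a search string that mixes facet constraints and plain words can be interpreted, the Python sketch below splits such a string into `name: value` constraints and leftover free-text terms. The regular expression and the `parse_query` function are assumptions written for this example; they are a simplification and not MarkLogic's actual search grammar.

```python
import re

# name, a colon, optional whitespace, then a quoted phrase or a single bare token.
CONSTRAINT = re.compile(r'(\w+):\s*(?:"([^"]*)"|(\S+))')


def parse_query(text):
    """Split a search string into facet constraints and free-text terms."""
    constraints = {}
    for match in CONSTRAINT.finditer(text):
        name = match.group(1)
        # Group 2 holds a quoted phrase, group 3 a bare token.
        value = match.group(2) if match.group(2) is not None else match.group(3)
        constraints[name] = value
    # Whatever is left after removing the constraints is treated as free text.
    free_text = CONSTRAINT.sub(" ", text).split()
    return constraints, free_text


if __name__ == "__main__":
    print(parse_query('artist: "The Beatles" week: 1964-02-08 love'))
```

A later constraint with the same name overwrites an earlier one in this sketch; a fuller parser would have to decide whether repeated constraints combine or conflict.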