Imagine having a BI dashboard that is not only interactive, with support for full-text and complex searches on unstructured data, but also intelligent and customized with data from the semantic web. A hybrid, if you like, of BI, search and semantics on both structured and unstructured data. The combination of MarkLogic and Tableau provides this today, and we have an example for you here on Tableau Public.
The Data Set
A little background. MarkLogic provides a free public service to the technical community called MarkMail.org, a collection of almost 9,000 mailing list archives spanning more than 12 years. It currently holds over 65 million messages that you can search, including email attachments. These listserv posts, or emails, have both structured components and unstructured ones (the email body), which makes them a perfect use case for showing how we can integrate with a BI tool such as Tableau.
We used structured components such as the “to” and “from” fields as values for some of our BI “dimensions” (categories to group data by, similar to range indexes and facets in MarkLogic). In addition, we used entity extraction on the message body to pull out more terms that we might also want to use as dimensions.

While MarkLogic is an enterprise NoSQL database, we can create a “view” in MarkLogic that looks like a table to a BI tool like Tableau. Through our ODBC driver, we can convert Tableau’s SQL queries to queries MarkLogic can understand.

And, with MarkLogic 7, we can go even further. Tableau allows users to type in custom SQL, and MarkLogic has enhanced the SQL MATCH operator to support our complex enterprise search features such as Boolean operators, word proximity and fielded search. This is in addition to full-text search on entire documents with stemming, tokenization, and all that good stuff (NOT just a grep on a relational database column)!

To demonstrate our use case, we created a sample Tableau dashboard using a subset of MarkMail data.
(A note about MarkLogic’s ODBC connection: in order to provide this demo on Tableau Public, we have extracted the data from MarkLogic into a TDE (Tableau Data Extract) file, but you could just as easily run live data from MarkLogic into Tableau for real-time analysis on your own server.)
If you want to know more about how all of this came together, you can refer to the “Analytics, NoSQL, and Visualization” webinar that MarkLogic and Tableau presented through Data Science Central.
What Exactly Am I Looking At?
What you see on the Tableau Public site is a Tableau dashboard loaded with MarkMail data. We’ve supplied some charts and graphs for you to play with, and we’ve built a dashboard with some of them. We’ve included a search bar built on a Tableau parameter, using a new Tableau 8 feature that supports parameters in custom SQL. That parameter is just the right-hand side of a SQL MATCH query, and it can be anything you’d expect from a powerful search engine like MarkLogic: Booleans, proximity search, fielded search and so on. The entire email is searched, not just a column like you’d get with a relational database!

Then, because Tableau Server and MarkLogic are both HTTP servers, you can easily embed one application or widget into another. You could have a Tableau dashboard inside a MarkLogic application or, as in the Tableau Public case, a MarkLogic application inside Tableau. That way, you can drill right into your textual results. Try it! Click on an email snippet and a new window pops up showing the entire document (in this case an email).
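To make the search-bar idea concrete, here is a minimal sketch of how a search string could be dropped into the right-hand side of a MATCH clause. The view name `emails`, the column names, and the exact shape of the MATCH clause are assumptions made for illustration; they are not taken from the demo, so check the MarkLogic SQL documentation for the syntax your version supports.

```python
def build_match_query(search_text, view="emails",
                      columns=("subject", "sender", "sent_date")):
    """Return a SQL string that filters a view with a MATCH search expression.

    The view and column names are hypothetical placeholders, and the
    MATCH clause shape is an assumption for illustration.
    """
    # Double any single quotes so the search text stays inside the SQL literal.
    escaped = search_text.replace("'", "''")
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {view} WHERE {view} MATCH '{escaped}'"


if __name__ == "__main__":
    # A Boolean search expression, passed through as the MATCH right-hand side.
    print(build_match_query("hadoop AND (hive OR pig)"))
```

In Tableau itself, the custom SQL would reference the parameter rather than a literal string, for example `MATCH <Parameters.Search Term>`, where `Search Term` is a hypothetical parameter name.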
But Wait, There’s More: Semantics Enriched Dashboards with SPARQL and RDF Triples
We’ve really just scratched the surface of what can be done, but if you type in “Hadoop” with a capital H (as SPARQL is case sensitive), you will see the right-hand side of the dashboard fill with new data. This isn’t just any data. It’s not in a database per se; it’s coming from the Web. We’re using your search term to also search MarkLogic 7’s triple store indexes to see “what else do we know?”, like the infobox you sometimes see with Google searches. The Hadoop logo isn’t stored in our database, but facts about Hadoop (including a link to the image) are stored using our new RDF triple store. When you search, Tableau converts your search to SQL and sends it to MarkLogic, and at the same time we also send the search term to MarkLogic through REST and convert the search to SPARQL.

You could have an infobox on your dashboards, or customized pages for every user, and have results change in real time because someone on the other side of the world updated a wiki page. This is business intelligence with the power of the semantic web, enhanced by billions or trillions of facts. Think about dashboards that show data from your organization enhanced by interconnected pages and facts from millions of other people, and how this might help your organization.
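As a rough illustration of the REST-to-SPARQL step described above, the sketch below turns a search term into a SPARQL query and wraps it in an HTTP request without sending it. The host, port, and query shape are assumptions for illustration, not the demo's actual code; `/v1/graphs/sparql` is assumed to be the SPARQL endpoint of the MarkLogic REST API, and a real call would also need authentication.

```python
import urllib.request


def build_infobox_sparql(term):
    """Build a SPARQL query that looks up facts about a resource by its label."""
    # Escape backslashes and double quotes so the term stays inside the literal.
    safe = term.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?predicate ?object\n"
        "WHERE {\n"
        f'  ?subject rdfs:label "{safe}" .\n'
        "  ?subject ?predicate ?object .\n"
        "}\n"
        "LIMIT 25"
    )


def build_sparql_request(term,
                         endpoint="http://localhost:8000/v1/graphs/sparql"):
    """Wrap the query in a POST request (hypothetical host and port)."""
    query = build_infobox_sparql(term)
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Print the query only; nothing is sent over the network here.
    print(build_infobox_sparql("Hadoop"))
```

Because the label is matched as an exact literal, “Hadoop” and “hadoop” produce different queries, which is one way the case sensitivity mentioned above can show up.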