Exploring Eleven Years of Boing Boing

by Justin Makeig

As part of their bloggaversary, the creators behind the seminal blog Boing Boing have released 11 years’ worth of content for remixing and tinkering (non-commercially, of course). I put together a quick search application, using Information Studio to load and process the content and Application Builder to generate the actual app. You can see the results at boingboing.demo.marklogic.com.

Loading the Data

The XML comes zipped as a single file containing many row elements, most likely exported from a relational database. I created a custom collector for Information Studio to split the aggregate document into individual row documents, one for each entry. I then configured my flow in Information Studio to process each row document with two transformations. The first uses XSLT to prune elements with NULL values (another artifact of the relational world), format the timestamps in the created_on element as proper xs:dateTime values, and parse the HTML in each entry’s body into well-formed XHTML. The second is an XQuery step that takes the result of the XSLT and parses the category metadata from a comma-delimited list into more usable XML. To learn more about how to use or customize Information Studio to load content, please consult the documentation.

Information Studio flow
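
For readers who want to see the shape of these steps outside of MarkLogic, here is a minimal sketch in Python. It is not the collector, XSLT, or XQuery from the actual flow, only an illustration of the same ideas: split the aggregate into row elements, drop NULL-valued children, normalize created_on, and expand the category list. The row and created_on names come from the export as described above. Everything else is an assumption made for illustration: the categories and category element names, the literal text "NULL" as the marker for empty values, the "YYYY-MM-DD HH:MM:SS" timestamp layout, and the made-up sample data. The XHTML parsing of the body is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def split_rows(aggregate_xml):
    """Return every <row> element in the aggregate export as a list."""
    root = ET.fromstring(aggregate_xml)
    # Materialize the list up front so rows can be modified safely afterwards.
    return list(root.iter("row"))


def clean_row(row):
    """Prune NULL children, normalize created_on, and split the category list."""
    # Assumption: empty relational values arrive as the literal text "NULL".
    for child in list(row):
        if (child.text or "").strip() == "NULL":
            row.remove(child)

    # Assumption: timestamps look like "2010-01-21 09:30:00".
    created = row.find("created_on")
    if created is not None and created.text:
        parsed = datetime.strptime(created.text.strip(), "%Y-%m-%d %H:%M:%S")
        created.text = parsed.isoformat()  # e.g. 2010-01-21T09:30:00

    # Assumption: categories live in a <categories> element as "a, b, c".
    cats = row.find("categories")
    if cats is not None:
        names = [n.strip() for n in (cats.text or "").split(",") if n.strip()]
        cats.text = None
        for name in names:
            ET.SubElement(cats, "category").text = name
    return row


# Made-up sample data, only to exercise the functions.
sample = """<export>
  <row><id>1</id><created_on>2010-01-21 09:30:00</created_on>
       <basename>NULL</basename><categories>Tech, Art</categories></row>
  <row><id>2</id><created_on>2009-06-02 18:05:10</created_on>
       <categories>NULL</categories></row>
</export>"""

docs = [clean_row(row) for row in split_rows(sample)]
for doc in docs:
    print(ET.tostring(doc, encoding="unicode"))
```

Each cleaned element would then be stored as its own document, which is the role the custom collector plays in the real flow.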

Building the Application

The application itself was generated with Application Builder. Like Information Studio, Application Builder comes as part of your MarkLogic installation. It allows you to build search applications without having to write any code. It’s great for prototyping or as the foundation of a “real” application.

In the Application Builder UI, I’ve configured faceted navigation and search constraints for authors, categories, and dates, and I’ve bucketed the date facet by year. None of this required writing any code.

Application Builder is designed for extensibility. I built the histogram in the results page with 40 lines of custom XQuery and 25 lines of CSS. 

Application Builder histogram customization
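
The idea behind a year-bucketed date facet and a histogram drawn from it can be sketched in a few lines of Python. This is not the XQuery or CSS used in the application, just a rough illustration: it assumes the timestamps are already xs:dateTime-style strings without a time zone, and the function names and sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime


def bucket_by_year(timestamps):
    """Count entries per calendar year from ISO 8601 timestamp strings."""
    counts = Counter(datetime.fromisoformat(ts).year for ts in timestamps)
    return dict(sorted(counts.items()))


def render_histogram(counts, width=30):
    """Return one text bar per year, scaled so the busiest year fills `width`."""
    peak = max(counts.values(), default=0)
    lines = []
    for year, n in counts.items():
        bar = "#" * (round(n / peak * width) if peak else 0)
        lines.append(f"{year} {bar} {n}")
    return lines


# Made-up timestamps, only to exercise the functions.
stamps = [
    "2000-03-01T12:00:00",
    "2001-07-15T08:30:00",
    "2001-11-02T22:10:00",
]
for line in render_histogram(bucket_by_year(stamps), width=4):
    print(line)
```

In the real application the per-year counts come from the date facet that Application Builder configures, so only the rendering needs custom code.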

I’ve made all of the code available under the Apache 2.0 license on GitHub. Feel free to explore and reuse.
