
MarkLogic World is Next Week! Check out the Lightning Talk Room

by Eric Bloch

We've been busy organizing festivities for MarkLogic World next week in Las Vegas. There's all sorts of lightning talk goodness headed your way, as you can see below. And we have some tasty treats planned as well (can you smell them?).

You can find the full schedule and stay up to date with our mobile (web) app. You can bookmark it at mlw13.marklogic.com (built on MarkLogic, with code at GitHub).

Tuesday
  • Client Connectors via the REST API - Tuesday 1:30 
    • Chris Cieslinski - Experiences with Node.JS and the MarkLogic REST API 
    • Adam Fowler - A Node.JS connector for MarkLogic
    • Mike Wooldridge - Easy MarkLogic Applications with PHP
  • Showing off your data - Tuesday 2:30 
    • David Lee - Advanced JSON Transformations
    • David Lee - Getting data from MarkLogic into Excel
    • Scott Brooks - Using HighCharts and MarkLogic
  • Semantics - Tues 3:50PM 
    • Micah Dubinko - Building Semantic Applications 
    • John Snelson - Introduction to SPARQL in MarkLogic
    • John Snelson - Inside the RDF Triple Index
Wednesday
  • XQuery Hour - Wed 10:30 
    • Jason Hunter - XQuery Gotchas 
    • John Snelson - Modelling Application Data in XQuery 

  • Search with MarkLogic - Wed 1PM
    • Andrew Wanczowski - An Introduction to Search Visualization Using NYC Open Data
    • Beverly Jamison - Correct and Consumable Answers to Complex Questions 
    • William Thompson - Search Intelligence and MarkLogic Search API 
  • Operational Infrastructure - Wed 2PM 
    • Aaron Rosenbaum - Storage for MarkLogic 101
    • Aaron Rosenbaum - Running MarkLogic on VMWare
    • Haitao Wu - Rebalancing Clusters
  • Operational Tools - Wed 3:10PM 
    • Clark Richey - Point in time recovery
    • Norm Walsh - Config Management Update
    • Geert Josten - Automate Your Deployments
Thursday
  • Practical Advice from MarkLogic Developers - Thurs 8:30AM 

    • Damon Feldman - MarkLogic in Message-oriented and Service-oriented Architectures
    • David Erickson - Rapid Prototyping Patterns with MarkLogic Server
    • James Clippinger - Data Provenance in MarkLogic

  • Inside MarkLogic: Search - Thurs 9:30AM 
    • Mary Holstege - Field and Path indexing in MarkLogic: your path to Big Data Search
    • Fei Xue - Understanding and Optimizing Search Queries in MarkLogic
    • Mary Holstege - Customizing Tokenization 

My Winter Break at MarkLogic

by Dylan Daniels

MarkMail is a widely used app that allows technology professionals to easily find content across a huge variety of mailing lists. Its backend runs on MarkLogic Server, and contains a searchable collection of over 60 million archived email messages from public mailing lists around the world. MarkLogic comes with lots of cool geolocation features. Using MarkLogic, one can easily write up a query which searches over a series of geographic boundaries, including circles, rectangles, and even arbitrary polygons.

I'm currently a junior at Brown University, where I am majoring in Economics and Computer Science. Over my winter break, I took on an internship at MarkLogic. My main task during the six weeks was to expand MarkMail to include geolocation features and to design a prototype for a new homepage that exposes the geographic data ingested into MarkMail's servers in real time.

So how did I accomplish my task?

We first needed to extract geographic information from emails. The Received headers of an email trace the route of the message as it is sent from one server to another. One can read through these headers to follow the path from an email's origin at the sender's client to our MarkMail SMTP servers. Each Received header contains IP addresses and DNS hostnames that identify the servers the message passed through. Using MaxMind's GeoIP database, we were able to map IP addresses and hostnames to geographic locations. However, not every IP can be mapped to a location, since some are private addresses. In that case, we simply used the location of the next server along the route that had a public IP address. Finally, to accomplish this over a huge dataset, we used Hadoop's MapReduce with the MarkLogic Connector for Hadoop to run a batch processing job that enriched the existing emails in the database.
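To make the header walk concrete, here is a minimal Python sketch of the idea, not the code we ran in production. It assumes IPv4 addresses only, uses a simple regular expression that could also match dotted version strings, and stops at the point where a GeoIP lookup would happen. The function name `first_public_ip` is made up for this illustration.

```python
import ipaddress
import re
from email import message_from_string

# Naive IPv4 matcher (an assumption for this sketch; IPv6 is ignored).
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def first_public_ip(raw_message):
    """Return the public IP nearest the sender, or None if there is none."""
    message = message_from_string(raw_message)
    received = message.get_all("Received") or []
    # Each server prepends its own Received header, so the last header
    # in the message is the one closest to the sender.
    for header in reversed(received):
        for candidate in IPV4_PATTERN.findall(header):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # looked like an IP but was not a valid one
            if address.is_global:
                return str(address)
    return None


raw = (
    "Received: from relay.example.net (relay.example.net [1.1.1.1])\n"
    "\tby mx.example.org; Mon, 1 Apr 2013 10:00:02 -0700\n"
    "Received: from gateway.example.com (gateway.example.com [8.8.8.8])\n"
    "\tby relay.example.net; Mon, 1 Apr 2013 10:00:01 -0700\n"
    "Received: from laptop.local ([192.168.1.20])\n"
    "\tby gateway.example.com; Mon, 1 Apr 2013 10:00:00 -0700\n"
    "From: someone@example.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
origin_ip = first_public_ip(raw)
```

In this example the hop nearest the sender has a private address (192.168.1.20), so the sketch falls through to the next hop with a public one. The returned address is what would then be handed to the GeoIP lookup.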

Now that we had geographic information tied to each message in the database, we needed to extend the ways one can search on MarkMail to enable geo-queries. Searching geographic information is most natural when the user can interact with the results on a map. Using the Google Maps JavaScript API, we were able to show the location of email messages on the map and enable the user to draw circles and polygons to search over specific regions.
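For readers who want to see what "search within a circle or polygon" means, here is a small, self-contained Python illustration of the two containment tests. It is only a sketch of the geometry: in MarkMail the matching is done by MarkLogic's geospatial queries on the server, not by code like this. The function names are made up, the polygon test treats latitude and longitude as flat coordinates (so it ignores the poles and the antimeridian), and the coordinates are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_circle(point, center, radius_km):
    """True if point lies within radius_km of center."""
    return haversine_km(point[0], point[1], center[0], center[1]) <= radius_km


def in_polygon(point, vertices):
    """Ray-casting test with (lat, lon) treated as planar coordinates."""
    lat, lon = point
    inside = False
    count = len(vertices)
    for i in range(count):
        lat1, lon1 = vertices[i]
        lat2, lon2 = vertices[(i + 1) % count]
        # Does this edge straddle the point's longitude?
        if (lon1 > lon) != (lon2 > lon):
            crossing_lat = lat1 + (lon - lon1) * (lat2 - lat1) / (lon2 - lon1)
            if lat < crossing_lat:
                inside = not inside
    return inside


providence = (41.82, -71.41)
boston = (42.36, -71.06)
las_vegas = (36.17, -115.14)
new_england_box = [(40.0, -73.0), (40.0, -70.0), (43.0, -70.0), (43.0, -73.0)]
```

With these values, Providence falls inside both a 100 km circle around Boston and the box, while Las Vegas falls outside both.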

Some of the toughest challenges came with this part of the project. Trying to display millions of emails on a map was obviously impossible. A heat map seemed like a good solution to our problem, but this would prevent users from being able to click on individual messages. We also considered a hybrid solution, where a heat map would turn into specific messages once the map reached a certain zoom level, but we thought this might be too jarring a transition for the user.

We decided to take a subset of the messages and show them. A tight cluster would implicitly create a heat-map-type effect. At first we tried to take a random sampling of a couple hundred emails to display on the map. But this led to more problems. It turns out that certain locations send a lot more emails than others (e.g., Google's servers), and random selection thus gives unfair weight to these clusters of locations. Towards the end of my break, I realized there was a better solution: drawing a weighted random sample produces a decent distribution of messages over the map.
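Here is a minimal Python sketch of one way to draw such a weighted sample. The weighting is an assumption made for illustration: each message gets weight 1 / (number of messages at its location), so a location that sends a thousand emails has about the same total pull as a location that sends one. The selection uses the well-known Efraimidis-Spirakis keying trick for weighted sampling without replacement. The `location` and `id` fields and the function name are hypothetical.

```python
import heapq
import random
from collections import Counter


def weighted_sample(messages, k, rng=None):
    """Pick up to k distinct messages, down-weighting crowded locations."""
    if rng is None:
        rng = random.Random()
    per_location = Counter(m["location"] for m in messages)
    keyed = []
    for m in messages:
        # Assumed weighting: inverse of how busy the message's location is.
        weight = 1.0 / per_location[m["location"]]
        # Efraimidis-Spirakis: key = u ** (1 / weight); largest keys win.
        key = rng.random() ** (1.0 / weight)
        keyed.append((key, m))
    top = heapq.nlargest(k, keyed, key=lambda pair: pair[0])
    return [m for _, m in top]


# 1000 messages from one busy location, plus 5 from distinct quiet ones.
messages = [{"id": i, "location": "busy-datacenter"} for i in range(1000)]
messages += [{"id": 1000 + i, "location": "quiet-%d" % i} for i in range(5)]
sample = weighted_sample(messages, 3, rng=random.Random(2013))
```

A plain uniform sample of three from this list would almost always come entirely from the busy location; with the inverse-count weights, the quiet locations get a fair chance of showing up on the map.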

Yet the project was not complete without a prototype of an improved homepage to show off the new geo capabilities of MarkMail. I designed a homepage that shows the most recent emails to reach our server on a map and animates the transitions between them. MarkMail's new homepage will send the user flying around the globe in real time.

I learned a ton during my internship at MarkLogic. I taught myself XQuery, utilized the powers of MarkLogic Server, improved my skills in JavaScript and CSS, and learned the importance of scalability. I can't wait to see the changes that will be pushed to production in MarkMail over the coming months. It's going to be exciting stuff.

MarkLogic 2013 - Call for Participation!

by Eric Bloch

MarkLogic World 2013

Preparations for MarkLogic World 2013 are underway and we have a call for speakers (see the Agenda tab). As part of the conference, we are organizing a series of lightning talks covering hands-on topics. These talks are an ideal venue for sharing and getting feedback on your latest work as well as a great place to hear from fellow developers actively engaged with MarkLogic technology. If you're interested, details are available at the signup.

As we've done in the past, we're again creating some poster displays for MarkLogic developer community projects. This is a collaborative effort that helps us learn more about each other's work. Here, you provide the content and we'll make an awesome poster. If you have a project you'd like to share at the conference and have limited cycles yourself (and who doesn't!), just drop me a line directly (eric.bloch AT marklogic dot com) and we can chat about making a poster.

For those of us who like to wax nostalgic, below are a couple of posters from last year's conference:


xray


Carrot.
