Using Fields for Data Integration

by Dave Cassel

In many cases, your application depends on data coming from different sources. Today I'm going to show you one trick you can use to integrate different data sources without having to transform them.

Let's take a look at some data.

Here's a recipe document. Suppose that in my application, I want to build a facet on ingredients. In this document, ingredient names are stored in "item" elements, so I can build a range element index on item. That's a pretty generic name though, so I'm probably better off using a path: ing/item. Either way, I can use an index and build a facet.

Let's take a look at another recipe document. In this one, the ingredient names are stored in another generic element name: "name". Again, I can be more specific using a path: "ingredient/name".

On to a third document. This one is JSON, just to show that it works the same.

My goal is to build a facet that draws from all three of these. The way to do it is to create a field. A field lets you specify multiple elements, JSON properties, or paths, and refer to them with a single name. In our case, we want three paths:

  • ing/item (XML)
  • ingredient/name (XML)
  • ingredients/name (JSON)
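To make the three paths concrete, here is a rough Python sketch of what a field does conceptually: it gathers the values found at each path under one name. The three sample documents are hypothetical stand-ins (only the element and property names come from the paths above), and the plain-Python lookup merely imitates the idea; it is not how MarkLogic evaluates a field.

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical documents; only the path names match the article.
XML_A = """<recipe>
  <ing><item>avocado</item><item>lime</item></ing>
</recipe>"""

XML_B = """<recipe>
  <ingredient><name>tomato</name></ingredient>
  <ingredient><name>lime</name></ingredient>
</recipe>"""

JSON_C = '{"ingredients": [{"name": "tomato"}, {"name": "onion"}]}'


def ingredient_values():
    """Collect the values at all three paths into one list."""
    values = []
    # ing/item (XML)
    values += [e.text for e in ET.fromstring(XML_A).findall(".//ing/item")]
    # ingredient/name (XML)
    values += [e.text for e in ET.fromstring(XML_B).findall(".//ingredient/name")]
    # ingredients/name (JSON)
    values += [entry["name"] for entry in json.loads(JSON_C)["ingredients"]]
    return values


# Counting the combined values is roughly what a facet over the field reports.
facet = Counter(ingredient_values())
print(facet)
```

Without a field, every consumer of ingredient names would need all three lookups; the field moves that knowledge into the database configuration.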

I'll use the Management API to set up the field. I'll also create a field range index on that field. I've prepared a config.json file that holds the configuration of my database. This is a good practice in general to ensure your configuration is repeatable across environments. Here's the new configuration for my field and for my field-range-index. (For a full application, my configuration would have a lot more information, but this will do for now.)

Notice the "field-name" property. This is the name by which I can refer to the collection of values that appear in any of the three paths configured as part of the field. I've also turned off a couple of the "fast-*-searches" options, since I won't need them for my facet.
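A minimal sketch of what such a configuration might look like, built as a Python dictionary and printed as JSON. The field name "ingredient" is a hypothetical choice, and the property names ("field", "field-path", "range-field-index", and so on) reflect my understanding of the Management API's JSON format; the "fast-*-searches" settings are left out because the exact ones aren't named here. Check the result against the properties your own server reports before relying on it.

```python
import json

FIELD_NAME = "ingredient"  # hypothetical name, chosen for illustration
PATHS = ["ing/item", "ingredient/name", "ingredients/name"]


def build_field_config(field_name, paths):
    """Build a field plus a string range index on that field."""
    return {
        "field": [
            {
                "field-name": field_name,
                # One entry per path that feeds the field.
                "field-path": [{"path": p, "weight": 1} for p in paths],
            }
        ],
        "range-field-index": [
            {
                "scalar-type": "string",
                "field-name": field_name,  # must match the field above
                "collation": "http://marklogic.com/collation/",
                "range-value-positions": False,
                "invalid-values": "reject",
            }
        ],
    }


config = build_field_config(FIELD_NAME, PATHS)
print(json.dumps(config, indent=2))
```

A real properties file captured from a server will usually carry more settings per field than this sketch shows.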

How did I know what to put in here? The documentation has great information, but you can also cheat: use the Admin UI to create what you need, then look at http://localhost:8000/manage/v2/databases/Documents/properties in JSON format (see the links at the bottom of that page). That's where I got the contents of this file. Recording your properties in a file lets you check your options into Git (or SVN, etc.) and ensure that all your environments use the same configuration.
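If you take the "cheat" route, the properties document contains far more than the field settings. A small sketch of trimming it down, assuming the field-related keys are named "field" and "range-field-index" and that the JSON view is reachable with a format=json parameter; the sample dictionary is invented for illustration.

```python
# Assumed URL for the JSON view of the database properties.
PROPERTIES_URL = (
    "http://localhost:8000/manage/v2/databases/Documents/properties?format=json"
)

# Keys assumed to hold the field and field range index settings.
FIELD_KEYS = ("field", "range-field-index")


def pick_field_settings(properties):
    """Keep only the field-related keys from a full properties dictionary."""
    return {key: properties[key] for key in FIELD_KEYS if key in properties}


# Hypothetical, heavily shortened properties document.
sample = {
    "database-name": "Documents",
    "field": [{"field-name": "ingredient"}],
    "range-field-index": [],
}
print(pick_field_settings(sample))
```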

Time to apply that configuration:
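One way to send the configuration, sketched with Python's standard library rather than a command-line tool. It assumes the properties endpoint accepts a PUT with a JSON body and that the server uses digest authentication; the URL is the one mentioned earlier, and the credentials you pass are your own.

```python
import json
import urllib.request

PROPERTIES_URL = "http://localhost:8000/manage/v2/databases/Documents/properties"


def build_put_request(config, url=PROPERTIES_URL):
    """Wrap a configuration dictionary in a PUT request (nothing is sent yet)."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request, user, password):
    """Send the request using HTTP digest auth and return the status code."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, request.full_url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(manager)
    )
    with opener.open(request) as response:
        return response.status
```

With the file from above, usage would be along the lines of `send(build_put_request(json.load(open("config.json"))), user, password)`.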

With this command done, I now have my field and a field range index. My next step is to create a facet. In the XML below, I define a set of REST API search options. (I used XML, but could have used JSON instead.)
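As a sketch, search options for a facet over a field could look like the XML produced below. The constraint name and field name ("ingredient") are hypothetical, and the collation is an assumption that has to match whatever the field range index uses. The Python wrapper only fills in the names and confirms the result is well-formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

SEARCH_NS = "http://marklogic.com/appservices/search"

# A range constraint with facet="true", pointing at a field instead of an element.
OPTIONS_TEMPLATE = """<options xmlns="{ns}">
  <constraint name={constraint}>
    <range type="xs:string" facet="true" collation="http://marklogic.com/collation/">
      <field name={field}/>
    </range>
  </constraint>
  <return-facets>true</return-facets>
</options>"""


def build_facet_options(constraint_name, field_name):
    """Return search options XML with one facet drawn from the given field."""
    xml_text = OPTIONS_TEMPLATE.format(
        ns=SEARCH_NS,
        constraint=quoteattr(constraint_name),  # quoteattr adds the quotes
        field=quoteattr(field_name),
    )
    ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    return xml_text


options_xml = build_facet_options("ingredient", "ingredient")
print(options_xml)
```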

Now to deploy those search options:
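A sketch of the deployment step, assuming the options are stored under the name "field-options" (the name used in the search URL) by sending a PUT to the REST API's /v1/config/query endpoint. The request is only built here; sending it needs credentials for your server.

```python
import urllib.request


def build_options_request(options_xml, name="field-options",
                          base="http://localhost:8000"):
    """Build a PUT request that stores search options under the given name."""
    return urllib.request.Request(
        f"{base}/v1/config/query/{name}",
        data=options_xml.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )


# Hypothetical, minimal options body just to show the call.
request = build_options_request(
    '<options xmlns="http://marklogic.com/appservices/search"/>'
)
print(request.get_method(), request.full_url)
```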

All set. When I hit the search endpoint and specify the new options, I get a facet that draws values from all three sets of data: http://localhost:8000/v1/search?options=field-options. I can use these options with direct calls to the REST API, or through the Java or Node Client APIs.
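To show what reading that facet might look like from client code, here is a sketch that builds the search URL and pulls value counts out of a JSON response. The response shape (a "facets" object whose entries hold a "facetValues" list) is my understanding of the REST API's JSON output, and the sample response, including the facet name "ingredient", is invented.

```python
from urllib.parse import urlencode


def search_url(options_name, base="http://localhost:8000/v1/search"):
    """Build a search URL that applies the named options and asks for JSON."""
    return base + "?" + urlencode({"options": options_name, "format": "json"})


def facet_counts(response, facet_name):
    """Map each facet value to its count; empty if the facet is absent."""
    values = response.get("facets", {}).get(facet_name, {}).get("facetValues", [])
    return {value["name"]: value["count"] for value in values}


# Hypothetical response, trimmed to the part that matters here.
sample_response = {
    "total": 3,
    "facets": {
        "ingredient": {
            "type": "xs:string",
            "facetValues": [
                {"name": "lime", "count": 2, "value": "lime"},
                {"name": "tomato", "count": 2, "value": "tomato"},
            ],
        }
    },
}

print(search_url("field-options"))
print(facet_counts(sample_response, "ingredient"))
```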

What we've done is build a facet that represents content from three different data sources without transforming any of them. This is pretty handy, but because the documents keep their original structures, you'll still need to account for those differences elsewhere in your application. If you're okay with that, this method can improve your searches pretty easily.

Comments

  • I have created a field configuration on top-songs sample database and here is the configuration:

    field name: writer-producer
    field path
      Path name: writers/writer weight : 1.0
      Path name: producers/producer weight : 1.0

    I am running the query as below:

    cts:search(fn:doc(), cts:field-value-query("writer-producer","Glenn"))

    This search is not yielding any result. What could have gone wrong?