UML Modeling with MarkLogic's Entity Services

by Mike Havey

In my previous blog I explained why upfront high-level modeling is essential. I recommend doing so with Unified Modeling Language (UML), as it helps to visually depict your model for greater clarity. UML can feed into MarkLogic’s Entity Services, which is a shockingly low-effort means to model-driven data management in MarkLogic. When I first played with it, I was surprised how little input I had to provide to reap a treasure chest of outputs.

My toolkit provides the ability to transform a UML data model to a MarkLogic Entity Services model. To use it, you’ll need MarkLogic 9.0-3 or later plus your preferred third-party UML modeling tool. The UML tool you select must support UML 2.x, must be able to export UML models to XMI 2.1, must be able to import UML profiles, and must support stereotypes and tagged values. To test the workflow, I used two such tools: MagicDraw 18.5 and Eclipse Modeling Framework 2.x. The toolkit includes several UML examples that demonstrate the model-driven workflow process.

Let’s walk through this process.

To begin, open the UML tool and create a new package containing a class diagram. In the tool, import the MarkLogic Entity Services UML profile (available in the toolkit as umlProfile/profile/MLProfile.profile.uml) and apply it to your package. Build your model by adding classes to the diagram, adding attributes to classes, and relating classes using UML’s relationship types: generalization, association, aggregation, and composition. Apply stereotypes defined in the profile to classes, attributes, or to the overall package to provide configuration (e.g., the package’s version, the class namespace, the attribute’s collation) to be used later in the process by Entity Services.

For this example, we will use the movie-role model (from the file IMDBMoviePhysical.xml); the next figure shows it open in MagicDraw. Before delving into this model’s particulars, let’s observe that at a high level the model describes two main types of data, movies and contributors.

Contributors are of two types: persons (actors, directors, writers, etc.) and companies (production companies, special effects companies, etc.). There is a many-to-many relationship between movie and contributor, and we express that relationship as role. A contributor performs a role (or perhaps several roles) in a movie; the set of roles for a contributor is that contributor’s filmography. A movie’s cast is the set of roles -- director roles, actor roles, writer roles, production company roles, and others -- in that movie. A movie also has a set of parental certificates, the parental ratings per country for the movie. A movie and a person contributor can have user documents. These are user-contributed posts, such as actor biographies and movie plot summaries.

The model has three levels of structure. At the highest level is the package, which maps to an Entity Services model. We name it MovieModelPhysical and tag it with two properties that are needed by Entity Services: baseUri and version. These tags belong to the mlModel stereotype from the custom profile.

At the next level are classes. Our model has seven classes -- Movie, MovieContributor, PersonContributor, CompanyContributor, Role, UserDocument, ParentalCertificate. These map to Entity Services entities.

Each class contains one or more attributes, which map to Entity Services properties. An attribute has a name, a type, and a multiplicity, and can be stereotyped with Entity Services configuration. Here are a few examples from the class Movie:

  • movieId is a String of multiplicity [1], indicating that it is a required attribute, with exactly one value expected. We stereotype it as PK to indicate it is the primary key of the class.
  • seriesId is a String of multiplicity [0..1], indicating that it is an optional attribute.
  • countries is a String of multiplicity [0..*], indicating that it is an array of Strings.
  • imdbUserRating is a Real of multiplicity [1], indicating that it is a required floating point value. We stereotype it as rangeIndex; Entity Services will generate an element range index for it, enabling us to run range queries against it.
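To make the attribute mapping concrete, here is a minimal Python sketch (not the toolkit's code) of how multiplicity and stereotypes might translate into an Entity Services entity definition. The keys properties, required, primaryKey, and rangeIndex are Entity Services descriptor settings; the UmlAttribute class, the to_entity_definition function, and the UML-to-datatype table (including Real mapping to float) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed mapping from UML primitive types to Entity Services datatypes.
UML_TO_ES = {"String": "string", "Integer": "int", "Boolean": "boolean", "Real": "float"}


@dataclass
class UmlAttribute:
    """Hypothetical in-memory view of one UML attribute."""
    name: str
    uml_type: str
    lower: int               # multiplicity lower bound
    upper: Optional[int]     # multiplicity upper bound; None stands for "*"
    stereotypes: set = field(default_factory=set)


def to_entity_definition(attributes):
    """Build one entity definition from a class's primitive attributes."""
    entity = {"properties": {}, "required": []}
    for attr in attributes:
        scalar = {"datatype": UML_TO_ES[attr.uml_type]}
        is_array = attr.upper is None or attr.upper > 1
        entity["properties"][attr.name] = (
            {"datatype": "array", "items": scalar} if is_array else scalar
        )
        if attr.lower >= 1:
            # A lower bound of 1 marks the attribute as required.
            entity["required"].append(attr.name)
        if "PK" in attr.stereotypes:
            entity["primaryKey"] = attr.name
        if "rangeIndex" in attr.stereotypes:
            entity.setdefault("rangeIndex", []).append(attr.name)
    return entity


movie = to_entity_definition([
    UmlAttribute("movieId", "String", 1, 1, {"PK"}),
    UmlAttribute("seriesId", "String", 0, 1),
    UmlAttribute("countries", "String", 0, None),
    UmlAttribute("imdbUserRating", "Real", 1, 1, {"rangeIndex"}),
])
```

In this sketch only movieId and imdbUserRating end up in the required list, and countries becomes an array property whose items are strings.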

Especially interesting in this model are the class relationships:

  • Movie is associated with Role. Additionally, MovieContributor is associated with Role. By this we mean that there is a structural relationship between Movie and MovieContributor which is defined by Role. Role defines the many-to-many relationship. From the contributor’s perspective, the association is filmography; from the movie’s perspective, the association is cast.
  • MovieContributor is a generalization of PersonContributor and CompanyContributor. Put differently, PersonContributor and CompanyContributor inherit the attributes of MovieContributor. Interestingly, this means each inherits the association with Role; a person has roles, as does a company.
  • PersonContributor and Movie aggregate UserDocument. This means that a user document is part of a person contributor’s record and part of a movie’s record. Although we consider this relationship part/whole, we regard it as aggregation rather than full-fledged composition. Thinking ahead, we expect to maintain UserDocument as its own document in MarkLogic, no more closely related to the movie or contributor than to its author.
  • Movie composes ParentalCertificate. This means that a parental certificate is part of the movie record and could not exist without the movie. Thinking ahead, we foresee ParentalCertificate residing in the MarkLogic database not as its own document but as a subdocument of Movie.

Association, aggregation, and composition relationships are shown in the diagram as lines between class boxes, but under the covers they are just attributes of classes. For example, Movie has an attribute named parentalCerts of type ParentalCertificate; this attribute is an array, indicated by its multiplicity of [0..*]. The Movie entity has another array reference attribute called cast, which is a reference to Role. In Entity Services, these attributes are mapped to properties whose type is reference. The Entity Services model does not distinguish association from composition. As we discuss in the next section, to ensure instance data conforms to the intent of our UML model, we must carefully modify the conversion module and TDE template generated from the Entity Services model.
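The point that every relationship kind collapses to the same reference property can be sketched in a few lines of Python. This is an illustration under assumptions, not the toolkit's transformation: the reference_property helper and the tuple layout are hypothetical, while the "$ref": "#/definitions/..." form is how an Entity Services descriptor expresses a local reference.

```python
def reference_property(target_class, upper):
    """Return a descriptor property that refers to another entity in the model."""
    ref = {"$ref": "#/definitions/" + target_class}
    # An upper bound of None stands for "*", so the property becomes an array.
    if upper is None or upper > 1:
        return {"datatype": "array", "items": ref}
    return ref


# Relationship ends: (owner class, attribute name, target class, upper bound, UML kind)
relationships = [
    ("Movie", "parentalCerts", "ParentalCertificate", None, "composition"),
    ("Movie", "cast", "Role", None, "association"),
]

definitions = {}
for owner, name, target, upper, kind in relationships:
    entity = definitions.setdefault(owner, {"properties": {}})
    # The UML kind is deliberately unused: the descriptor has no place to record it.
    entity["properties"][name] = reference_property(target, upper)

movie_props = definitions["Movie"]["properties"]
```

Both parentalCerts and cast come out as arrays of references with identical shape, which is why the composition-versus-association intent has to be restored by hand in the generated code.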

Transforming XMI to Entity Services

From the UML tool, export the class diagram to an XMI file. It is now time to transform the XMI to an Entity Services model descriptor. The toolkit provides a gradle-based utility to do this. The basic steps are the following:

  • Add your XMI to the gradle/data/xmi folder.
  • Transform the XMI to Entity Services using the gradle buildESModelDescriptors task. Review the Entity Services model descriptor this creates and check whether it is valid.
  • Deploy the model descriptor using the gradle mlgen task. This task generates several artifacts, notably a database index configuration file, an XQuery conversion script, and a TDE template. Examine these artifacts and modify them if necessary. (We’ll discuss modifications more below.)
  • Deploy the artifacts using the gradle mlDeploy task.

(The README file in the toolkit explains these steps in detail.)

Let’s review the mapping for our movie model. The following code listing is an excerpt of the model descriptor produced by the transformation. (If you compare it to the UML diagram in the previous section, you see how the mapping worked. Refer to the next section for a general reference guide to the mapping.)

Two important artifacts that the Entity Services library generates are the converter module and the TDE template. It is expected that the developer will modify this generated code. We modify the movie conversion module as follows:

  1. We embed ParentalCertificate documents as sub-documents of Movie.
  2. In UserDocument, we link to PersonContributor and Movie by populating fields contribId and movieId, respectively.
  3. In Role, we link to MovieContributor and Movie by populating fields contribId and movieId, respectively.
  4. Since PersonContributor and CompanyContributor inherit from MovieContributor, we factor out to a common function the conversion of the inherited attributes.
  5. We leave empty Movie’s reference to Role.
  6. We leave empty Movie’s reference to UserDocument.
  7. We leave empty MovieContributor’s reference to Role.
  8. We leave empty PersonContributor’s reference to UserDocument.

These changes are made to honor the class relationships we designed into our UML model. Change 1 implements the composition relationship between Movie and ParentalCertificate. Change 4 implements the generalization relationships among MovieContributor, PersonContributor, and CompanyContributor. Changes 3, 5, and 7 implement the association relationship among Role, Movie, and MovieContributor. Changes 2, 6, and 8 implement the aggregations among UserDocument, Movie, and PersonContributor.
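The actual conversion module is XQuery, but the intent of changes 1, 3, and 5 can be sketched in Python. Treat this as a rough illustration only: the convert_movie function and the source field names certs, roles, and roleType are hypothetical; movieId, contribId, parentalCerts, and cast come from the model described above.

```python
def convert_movie(source):
    """Split one raw movie record into a Movie instance plus separate Role instances."""
    movie = {
        "movieId": source["movieId"],
        # Change 1: composition, so certificates are embedded in the movie.
        "parentalCerts": [dict(cert) for cert in source.get("certs", [])],
        # Change 5: association, so the movie's own reference to Role stays empty.
        "cast": [],
    }
    # Change 3: each Role is its own instance, linked back by identifiers.
    roles = [
        {
            "movieId": source["movieId"],
            "contribId": role["contribId"],
            "roleType": role["roleType"],
        }
        for role in source.get("roles", [])
    ]
    return movie, roles


raw = {
    "movieId": "m1",
    "certs": [{"country": "US", "rating": "PG"}],
    "roles": [{"contribId": "c1", "roleType": "director"}],
}
movie_doc, role_docs = convert_movie(raw)
```

The movie instance carries its certificates with it, while each role stands alone and can be found from either side by movieId or contribId.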

Our modifications to the TDE template are to adjust the relational lens in accordance with the relationships. The most significant changes concern UserDocument and Role.

  • UserDocument has its own view. We find user documents using this view. Neither Movie nor PersonContributor contains references to user document. Rather, from UserDocument, we join on movieId or contribId to see the associated Movie or PersonContributor.
  • Role has its own view. Neither Movie nor MovieContributor contains references to roles. Rather, from Role, we join on movieId or contribId to see the associated Movie or MovieContributor.

If you’re interested in reviewing this code, it’s in the toolkit. The conversion module is gradle/src/main/ml-modules/ext/entity-services/MovieModelPhysical-0.0.1.xqy. The TDE template is src/main/ml-schemas/MovieModelPhysical-0.0.1.tdex.

With these changes in place, we proceed to ingest data. The gradle toolkit provides sample movie data. It shows how to use the gradle MarkLogic Content Pump (MLCP) plugin to ingest data from JSON files to MarkLogic. The MLCP job is configured to use the Entity Services converter script to transform the documents to the canonical form defined by the model; it also packages them in envelopes.

We conclude by running a few queries to verify that the ingested data meets the design goals of our UML model. Since we set up a TDE template, we run SQL queries.
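The shape of such a query, joining from Role out to Movie and MovieContributor, can be tried with Python's built-in sqlite3 module. This is a stand-in under assumptions, not the MarkLogic SQL used in the post: the tables here imitate the TDE views, and the title, name, and roleType columns and all sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (movieId TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE MovieContributor (contribId TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE Role (movieId TEXT, contribId TEXT, roleType TEXT);
    INSERT INTO Movie VALUES ('m1', 'Example Movie');
    INSERT INTO MovieContributor VALUES ('c1', 'Example Person');
    INSERT INTO MovieContributor VALUES ('c2', 'Example Company');
    INSERT INTO Role VALUES ('m1', 'c1', 'director');
    INSERT INTO Role VALUES ('m1', 'c2', 'production');
""")

# Movie cast: start from Role and join on movieId and contribId.
cast = conn.execute(
    """
    SELECT m.title, c.name, r.roleType
    FROM Role r
    JOIN Movie m ON m.movieId = r.movieId
    JOIN MovieContributor c ON c.contribId = r.contribId
    WHERE m.movieId = ?
    ORDER BY c.contribId
    """,
    ("m1",),
).fetchall()

for row in cast:
    print(row)
```

A filmography query has the same structure with the filter moved to contribId.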

The following shows movie cast:

The next figure shows a person contributor's filmography:

The next figure shows movie user documents:

The last figure shows parental certificates:


Appendix: XMI-to-Entity Services

Entity Services Component | Entity Services Setting | UML Mapping
--- | --- | ---
Model | Title | Package name.
 | Version | On package, use mlModel stereotype. Set version tag.
 | Base URI | On package, use mlModel stereotype. Set baseUri tag.
 | Description | Package documentation comment.
Entity | Name | Class name.
 | Description | Class documentation comment.
 | Namespace URI and prefix | Namespace stereotype on the class. Or define common namespace at package level.
 | Primary key property | One attribute in the class is given the PK stereotype.
 | Required properties array | Each required attribute in the class is given a multiplicity of 1.
 | Word lexicon list, element range index list, path range index list | Attributes needing an index are given the rangeIndex stereotype with the indexType tag set to “lexicon”, “element”, or “path”.
Property - primitive, array of primitives, external reference, or array of external references | Name | Attribute name.
 | Data type | If the datatype is a normal primitive like string, int, or float, set the attribute datatype to a UML primitive type. If the datatype is ML-specific (e.g. IRI, ref), apply the mlProperty stereotype to the attribute and set the mlType tag to a valid Entity Services property type. If the datatype is an array, set the datatype as above and set the attribute’s multiplicity to 0..*.
 | Description | Attribute documentation comment.
 | Collation | If the data type is a string or string array, you may optionally specify collation. To do this, set the collation tag of the mlProperty stereotype.
 | External ref IRI | If the property is an external reference or an array of external references, apply the mlProperty stereotype to the attribute and set the externalRef tag.
Property - internal reference or array of internal references | Name | Model as an association, aggregation, or composition. The name is the name of the attribute on this side of that relationship.
 | Data type | Class name on the other side of the relationship. Multiplicity is treated the same as for non-referential attributes.
 | Description | Attribute documentation comment.
