Element Level Security Performance

by Silvano Ravotto

MarkLogic 9 provides the ability to select certain paths within documents and protect them from viewing or updating, unless the user has the proper credentials. In this post we cover the performance implications of using this functionality and provide some recommendations for designing and implementing a solution that uses element level security.

Let’s start with a simple example that will help during our discussion. We have a system that stores employee information. The information consists of:

  • Employee name
  • SSN
  • Salary
  • Department
  • Telephone Numbers

For the sake of simplicity, let’s assume that this information is in one XML document for each employee.

We also have some rules that define the access to this data:

  • Anybody with access to the system can see name, department, and telephone numbers.
  • HR can see and update all the information.
  • Managers in each department can see and update salary information. In our example, we will be using only 2 departments: Engineering and Marketing.

Security Design

Based on the rules above, we identify the following protected paths:

  • /employee/ssn
  • /employee[dept=".."]/salary (for each possible department)

We define the following roles:

  • public = role granted to anybody that has access to the system
  • eng_manager = role granted to managers in the engineering department (includes the public role)
  • marketing_manager = role granted to managers in the marketing department (includes the public role)
  • hr = role granted to HR personnel (it includes both manager roles above + public role)
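
The role inheritance above can be sketched in a few lines of Python. This is a toy model for illustration only, not MarkLogic code: the INCLUDES table and the effective_roles helper are hypothetical names, and the sketch simply expands a role into itself plus everything it includes.

```python
# Toy model of the role inheritance described above (not a MarkLogic API).
# INCLUDES maps each role to the roles it directly includes.
INCLUDES = {
    "public": set(),
    "eng_manager": {"public"},
    "marketing_manager": {"public"},
    "hr": {"eng_manager", "marketing_manager", "public"},
}


def effective_roles(role):
    """Return the role plus every role it includes, directly or transitively."""
    seen = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(INCLUDES[current])
    return seen
```

In this model an engineering manager ends up with two effective roles, while an HR user ends up with all four, which matters later when queries are expanded per roleset.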

Document level permission

  • (public, "read"). Give anybody with a public role access to the documents. Note that access to single elements will be granted based on the permissions defined below.
  • (hr, "update"). Users with the hr role can update any element in the document.

Element level permission

In addition to the document level permissions, we set these protected paths:

  • /employee/ssn → (hr, "read"). HR role can see the Social Security Number.
  • /employee[dept="Engineering"]/salary → (eng_manager, "read"), (eng_manager, "update"). Engineering managers can see and update the salary for employees in their department.
  • /employee[dept="Marketing"]/salary → (marketing_manager, "read"), (marketing_manager, "update"). Marketing managers can see and update the salary for employees in their department.

Tip: Protect multiple paths under a common parent/ancestor to reduce the number of paths. For instance, if there is additional information that can be seen only by managers, you can create a comp element that contains the salary, bonus, and options information, and protect that single element instead of each child.

Query Rolesets

As we will discuss below, query rolesets provide "expansion" rules that allow a user with specific roles to query terms that are inside protected elements. In our example each of the paths has only one role associated with it, so the rolesets we need to define are straightforward:

  • ((hr))
  • ((eng_manager))
  • ((marketing_manager))

The code below creates all the security objects discussed so far. The code is written in XQuery, but the same can be accomplished with JavaScript or the REST API.

Ingestion

Element level security impacts how ingestion works. During ingestion the system has to check whether any of the nodes in the document are protected. If a node is protected, the system creates a special entry in the universal index for each term within the protected paths. The special entry encodes all the paths that protect the term and the roles that can access each path.

Using the sample document and the security setup above, the word "123" (part of the SSN) will be indexed by combining the hash of "123" with the roleset hash (the roleset being the list of all the roles in the protected path, in this case hr).

Similarly, the word "100000" in the salary element will be indexed combining the hash of "100000" and the hash of all the roles in the protected path (in this case eng_manager).
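
To make the idea concrete, here is a minimal Python sketch of an index whose keys combine a term with the roleset protecting it. It is a toy model under stated assumptions, not MarkLogic's actual index format: the term_key and index_document helpers, the SHA-256 hashing, the document URI, the employee name, and the full SSN value are all made up for illustration. Only the terms "123" and "100000" and the roles come from the example above.

```python
import hashlib
import re


def term_key(term, roleset=None):
    """Hash a term alone, or a term combined with the roles that protect it."""
    payload = term if roleset is None else term + "|" + ",".join(sorted(roleset))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def index_document(uri, fields, protected):
    """Build {key: {uri}} entries; `protected` maps a field name to its roleset."""
    index = {}
    for name, text in fields.items():
        roleset = protected.get(name)  # None when the field is not protected
        for term in re.findall(r"\w+", text):
            index.setdefault(term_key(term, roleset), set()).add(uri)
    return index


# Hypothetical sample data; only "123" and "100000" appear in the post.
index = index_document(
    "/employees/1.xml",
    {"name": "Jane Doe", "ssn": "123-45-6789", "salary": "100000"},
    {"ssn": {"hr"}, "salary": {"eng_manager"}},
)
```

In this model the word "123" is only reachable through the key built with the hr roleset; a lookup using the plain hash of "123" finds nothing, which is why a search needs to know the user's rolesets.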

For special indexes (like range indexes), the system partitions the indexes based on the roleset associated to the path.

A new built-in (xdmp:node-query-rolesets) returns the sequence of query rolesets that are required for querying a document with Element Level Security. A typical workflow calls this function and adds each query roleset through the sec:add-query-rolesets function before inserting the document into the database, so that the document can be correctly queried with Element Level Security as soon as it is inserted.

Calling the function for each document during insertion has a performance impact (in our tests we saw a degradation of up to 100%), and adding new query rolesets to the Security database invalidates the security caches, which may briefly impact the overall performance of the system. This is a concern for environments where the content is highly dynamic and the query rolesets cannot be determined a priori (for instance, if the theoretical number of query rolesets is large, but only a small subset is used in practice and that subset may change over time depending on business requirements).

For details, see xdmp:node-query-rolesets.

Performance Considerations

One advantage of the approach above is that there isn’t an impact on the size of the universal index or other indexes, since each term protected by one or more paths is added to the indexes only once (with additional role encoding). Therefore, the performance impact is restricted to the CPU time spent determining the list of protected paths that apply to each element of the document. We have implemented a heuristic to improve the performance. For instance, if a protected path ends in a named leaf element (e.g. "ssn" or "salary" in our examples), the test is fast, since we keep a map of (leaf => protected paths) for quick lookup. In some cases, the heuristic doesn’t help in determining whether an element matches a protected path (such as when the path ends with "*"); in that case we need to validate each node against the path, and that can be expensive. As a rule of thumb, ingestion performance depends on how many protected paths match element names in a document, and on the frequency of such element names.
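
The leaf-name lookup can be illustrated with a short Python sketch. This is an assumption-laden toy, not the server's implementation: build_leaf_map and candidate_paths are hypothetical helpers, the leaf is extracted with a naive split on "/", and the wildcard path used in the usage note is invented for the example.

```python
from collections import defaultdict


def build_leaf_map(paths):
    """Group protected paths by leaf element name; wildcard leaves go to a slow list."""
    by_leaf = defaultdict(list)
    wildcard = []
    for path in paths:
        leaf = path.rsplit("/", 1)[-1]  # naive: text after the last "/"
        if leaf == "*":
            wildcard.append(path)
        else:
            by_leaf[leaf].append(path)
    return by_leaf, wildcard


def candidate_paths(element_name, by_leaf, wildcard):
    """Protected paths that must be fully evaluated for an element with this name."""
    return by_leaf.get(element_name, []) + wildcard


PATHS = [
    "/employee/ssn",
    '/employee[dept="Engineering"]/salary',
    '/employee[dept="Marketing"]/salary',
]
by_leaf, wildcard = build_leaf_map(PATHS)
```

With only the three paths from the example, an element such as name has no candidates and can be skipped immediately, while salary has two. Adding a hypothetical path such as /employee/notes/* puts one entry in the wildcard list, and from then on every element in every document has at least one path to evaluate.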

For instance, in our example, the "salary" element occurs only once in a document and is protected by a few paths (one for each department), and the ssn element occurs once and is protected by only one path. The performance degradation will be negligible (less than 5%). In our tests, protecting an element that occurs multiple times in a document with 10 protected paths has roughly a 10% penalty in ingestion. With 100 protected paths on the same element, the degradation is 100-120%.

Tip: Avoid paths ending with a wildcard element.

Search

Search is also impacted by Element Level Security. To access terms protected by paths, we need to provide, at search time, the list of roles that protect the term. This is done using query rolesets. In the example above, we have simple query rolesets, consisting of only one role.

Given that the hr role inherits from all the manager roles, a user with the hr role will be able to search terms in the SSN and in the salary. To verify this, we can use xdmp:plan to show all the rolesets that are taken into consideration to resolve the search. For instance:

This will produce the following output:

In addition, documents that match the query are subject to concealment when they are retrieved using fn:doc(), cts:search() or similar APIs. All the elements in a document are analyzed against protected paths, and if the user does not have permission to read an element, the element will be concealed before it is returned (note: the admin role provides access to all elements, even if the admin role does not have explicit access to a protected path).
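
Concealment can be modeled with Python's standard xml.etree.ElementTree module. The sketch below is a toy under stated assumptions, not how the server does it: the conceal function, the PROTECTED table, and the sample document values are hypothetical, the paths are written relative to a wrapper element so that ElementTree's limited XPath subset can evaluate the predicate, and the special handling of the admin role is not modeled.

```python
import xml.etree.ElementTree as ET

# Protected path (relative to a wrapper element) -> roles with "read" on it.
PROTECTED = {
    "employee/ssn": {"hr"},
    'employee[dept="Engineering"]/salary': {"eng_manager"},
    'employee[dept="Marketing"]/salary': {"marketing_manager"},
}

# Hypothetical sample document; the name and SSN are made up.
DOC = (
    "<employee><name>Jane Doe</name><dept>Engineering</dept>"
    "<ssn>123-45-6789</ssn><salary>100000</salary></employee>"
)


def conceal(doc_xml, user_roles):
    """Return the document without the protected elements the user cannot read."""
    wrapper = ET.Element("wrapper")
    wrapper.append(ET.fromstring(doc_xml))
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in wrapper.iter() for child in parent}
    for path, readers in PROTECTED.items():
        if readers & user_roles:
            continue  # the user holds a role that can read this path
        for node in wrapper.findall(path):
            parent_of[node].remove(node)
    return ET.tostring(wrapper[0], encoding="unicode")
```

Here user_roles is the user's full set of roles after inheritance, so an HR user passes all four roles and gets the document back unchanged, while a marketing manager loses both the ssn and the salary of an Engineering employee.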

Performance Considerations

Performance is affected by the number of query rolesets that are associated with a specific user (based on the user's roles). For instance, in our example an engineering manager will only have one query roleset. However, somebody in HR, due to role inheritance, has multiple rolesets (one for hr and one for each of the department roles). A query is expanded by adding terms for each query roleset. Each term requires an additional lookup in the universal index (or in a specialized index, if the term belongs to a range index, for instance). This may incur read operations from disk if the term is not cached. The read operations are done in parallel for each stand in the database, even if the term does not exist in the stand.
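
The expansion can be sketched as follows. This is a simplified Python model, not the server's query planner: SINGLE_ROLE_ROLESETS, usable_rolesets, and expand_term are hypothetical names, and the sketch assumes, as in our example, that every query roleset consists of a single role, so a user can use a roleset exactly when they hold that role.

```python
# One single-role query roleset per protected path in the running example.
SINGLE_ROLE_ROLESETS = ["hr", "eng_manager", "marketing_manager"]


def usable_rolesets(effective_roles):
    """Rolesets the user can apply, given their roles after inheritance."""
    return [role for role in SINGLE_ROLE_ROLESETS if role in effective_roles]


def expand_term(term, effective_roles):
    """Lookups needed for one term: the plain term, then one per usable roleset."""
    lookups = [(term, None)]  # None marks the unprotected form of the term
    lookups += [(term, role) for role in usable_rolesets(effective_roles)]
    return lookups
```

For the term "100000", an engineering manager (roles eng_manager and public) needs two lookups, while an HR user (all four roles) needs four. The number of lookups per term grows with the number of rolesets the user can apply, which is the cost discussed in this section.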

The degradation on the search side is more difficult to characterize in general terms, since it depends on what kind of searches are executed. In our testing, search degradation is roughly linear in the number of rolesets that a user has: for instance, we saw 70-80% with 100 query rolesets and up to 800% with 1000 query rolesets. However, in this particular use case we performed simple word queries and element range queries. More complex queries (such as near queries, wildcard searches with lexicon expansion, or geospatial region queries) may result in even more degradation. We suggest that you run a performance test in your environment to measure the impact of the query rolesets more precisely.

To retrieve the content of a document, we use a similar approach to indexing: each element is evaluated against protected paths, and only elements that are not protected or that the user can see are returned.

Re-indexing

Re-indexing is required when paths are protected or unprotected. Note that before removing a protected path, we need to unprotect the path to allow the re-indexer to process all the affected fragments.

The scope of re-indexing is determined by applying heuristics that estimate the fragments that are affected by protecting (or unprotecting) a path. Since, in general, a protected path may not be fully resolved against the indexes, the re-indexer may process fragments that do not contain the path (this is similar to the difference between an unfiltered and a filtered search). The heuristics applied are subject to change, since we are always improving the performance of the re-indexer. Once the re-indexer starts, it is possible to determine the number of fragments affected by looking at the admin UI, the REST API, or xdmp:forest-counts().
