Geospatial Double Precision FAQ

Anthony Roach
Last updated August 24, 2017

MarkLogic 9 introduced double precision geospatial indexes. Here are answers to some common questions.

Why are double precision indexes important?

Geospatial search queries are resolved through the geospatial index. MarkLogic stores the geospatial data itself as is, but to save space and improve performance, geospatial indexes are stored with finite precision. This loss of precision can produce false-positive results.

Before MarkLogic 9, geospatial indexes could only be stored with float precision (23 bits of significand, 8 bits of exponent, and 1 sign bit, for a total of 32 bits). MarkLogic 9 adds the option to create indexes with double precision (52 bits of significand, 11 bits of exponent, and 1 sign bit, for a total of 64 bits).
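The effect of the two bit layouts can be seen by rounding a coordinate to each width. The following is a minimal sketch in Python using only the standard library; the longitude value is hypothetical, and the code illustrates IEEE 754 rounding in general rather than MarkLogic's internal index format.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def to_float64(value: float) -> float:
    """Round-trip a value through the 64-bit layout (lossless for Python floats)."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


# Hypothetical longitude with seven decimal places (roughly centimeter level).
longitude = -122.4194155

stored_as_float = to_float32(longitude)
stored_as_double = to_float64(longitude)

print(f"original : {longitude!r}")
print(f"float32  : {stored_as_float!r}")
print(f"float64  : {stored_as_double!r}")
print(f"float32 error in degrees: {abs(stored_as_float - longitude):.3e}")
```

The 64-bit round trip returns the value unchanged, while the 32-bit round trip moves it to the nearest representable float.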

Should I use double precision?

To take advantage of double precision indexes, two conditions must hold:

  1. Your geospatial data has more than 32 bits of precision. If the high degree of precision does not exist in the source data, the new index will be of no benefit.
  2. The application needs to perform geospatial operations with greater than 32-bit accuracy.

Here are some use cases for double-precision:

  • Hospitals track equipment moving around health facilities.
  • Public safety and law enforcement operations require knowledge of room-to-room movements in a building.
  • Geological survey agencies need to monitor slow-moving objects that move in sub-meter increments, like fault lines and tectonic plate movements.

How precise are these indexes in real-world terms?

Take care when talking about precision, because it depends on the coordinate system and the location on the earth.

Using the Raw coordinate system, which assumes a perfect Euclidean plane, float precision is accurate to micrometers (1×10⁻⁶ meters). Double precision is accurate to femtometers (1×10⁻¹⁵ meters).

When considering geodetic coordinates (WGS84, ETRS89), things are much more complex. The earth is not a perfect sphere but an ellipsoid that bulges at the equator, so the required calculations are significantly more involved. At the equator, float precision can provide an accuracy of approximately 2 meters. Double precision is accurate to nanometers.
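The equatorial figures can be approximated with a short back-of-the-envelope calculation. This sketch assumes that coordinates are held in degrees as IEEE 754 values and that one degree of longitude at the equator spans 1/360 of a circle with the WGS84 semi-major axis as its radius. It ignores the ellipsoidal corrections mentioned above, so treat the results as orders of magnitude only.

```python
import math

WGS84_EQUATORIAL_RADIUS_M = 6_378_137.0  # WGS84 semi-major axis, in meters


def spacing(value: float, fraction_bits: int) -> float:
    """Gap between adjacent representable numbers near `value`.

    fraction_bits is the number of stored significand bits:
    23 for float precision, 52 for double precision.
    """
    # frexp returns (m, e) with abs(value) == m * 2**e and 0.5 <= m < 1.
    _, exponent = math.frexp(abs(value))
    return 2.0 ** (exponent - 1 - fraction_bits)


# Length of one degree of longitude along the equator.
meters_per_degree = 2 * math.pi * WGS84_EQUATORIAL_RADIUS_M / 360

# Longitudes between 128 and 180 degrees have the coarsest spacing.
float_step_m = spacing(180.0, 23) * meters_per_degree
double_step_m = spacing(180.0, 52) * meters_per_degree

print(f"float step near 180 degrees : {float_step_m:.3f} m")
print(f"double step near 180 degrees: {double_step_m:.3e} m")
```

Under these assumptions the float step comes out a little under 2 meters and the double step at a few nanometers, consistent with the figures quoted above.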

What is the difference between precision and accuracy?

In common language, the terms precision and accuracy are often used interchangeably. It is important to understand the difference: a measurement can be precise but not accurate, and vice versa.

Precision is a measure of exactness. Think of precision as a measure of how small a pin you can put on a map. Accuracy refers to the closeness of a measurement to reality: does that pin actually fall over the target? Double precision indexes allow for unprecedented exactness, but accuracy is solely a function of the data. When accuracy is in doubt, use the Tolerance functions to account for the noise.

When do I use Tolerance?

When there are errors and "noise" in the source data, features that are effectively equal may be represented as different. Double precision indexes compound this issue, because they preserve the noise along with the signal.

Regardless of index type, when the accuracy of the data is in doubt, use the tolerance functionality to remove the noise as appropriate. It will be necessary in all but the most extreme cases to use tolerance when using double precision indexes.
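The idea behind tolerance can be illustrated with a generic comparison. The sketch below is plain Python, not the MarkLogic tolerance API; the function name, the planar (Raw-style) coordinates, and the tolerance values are all hypothetical.

```python
import math


def within_tolerance(p, q, tolerance):
    """Treat two (x, y) points as the same feature if they lie within `tolerance` units."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= tolerance


# The same survey marker measured twice; the second reading carries noise.
surveyed = (1250.000000, 730.000000)
resurveyed = (1250.000004, 729.999997)

# Exact comparison keeps the noise and reports two different features.
print("exact match        :", surveyed == resurveyed)

# A tolerance larger than the noise treats the readings as one feature.
print("match within 1e-5  :", within_tolerance(surveyed, resurveyed, 1e-5))

# A tolerance smaller than the noise still separates them.
print("match within 1e-6  :", within_tolerance(surveyed, resurveyed, 1e-6))
```

Choosing the tolerance is a data question: it should reflect the known measurement error of the source, not the precision of the index.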

How much bigger will my double precision indexes be on-disk?

As expected, double precision indexes are twice as big as float precision indexes: they take up twice the space in memory and on disk.

What is the performance impact of using double-precision indexes?

Performance testing has found a 3% degradation in ingest and a 15% degradation in query performance with double precision indexes. This is expected, given the increased size of the indexes and the additional complexity of 64-bit operations.

As described above, double precision indexes are meant for specific use cases where the application’s precision requirements and the quality and precision of the data warrant their use.