MarkLogic 9 introduced double precision geospatial indexes. Here are answers to some common questions.
Geospatial search queries are resolved through the geospatial index. MarkLogic stores the geospatial data itself as is, but to save space and improve performance, geospatial indexes are stored with finite precision. This loss of precision can produce false-positive results.
Before MarkLogic 9, geospatial indexes could only be stored with float precision (23 bits of significance, 8 bits of exponent, and 1 sign bit = total of 32 bits). MarkLogic 9 supports the option to create indexes with double precision (52 bits of significance, 11 bits of exponent, and 1 sign bit = total of 64 bits).
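The effect of the narrower format can be illustrated outside MarkLogic. The following is a minimal Python sketch, not MarkLogic code: it rounds a coordinate to a 32-bit float and back using the standard `struct` module, then reports how far the stored value drifts from the original. The sample longitude and the helper name `to_float32` are hypothetical, chosen only for illustration.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# Hypothetical sample coordinate, in degrees.
longitude = -122.4194155

stored_as_float = to_float32(longitude)
rounding_error = abs(stored_as_float - longitude)

print(f"double value: {longitude!r}")
print(f"float value:  {stored_as_float!r}")
print(f"difference:   {rounding_error:.3e} degrees")
```

The difference printed is the information a float-precision representation cannot retain for this value; a double keeps it.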
To take advantage of double precision indexes, two factors must be true:
Here are some use cases for double-precision:
Take care when discussing precision, because it depends on the coordinate system and on the location on the earth.
Using the Raw coordinate system, which assumes a perfect Euclidean plane, float precision is accurate to micrometers (1×10⁻⁶ meters). Double precision is accurate to femtometers (1×10⁻¹⁵ meters).
With geodetic coordinates (WGS84, ETRS89), things are much more complex. The earth is not a perfect sphere but an ellipsoid that bulges at the equator, so the required calculations are significantly more involved. At the equator, float precision can provide an accuracy of approximately 2 meters. Double precision is accurate to nanometers.
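These orders of magnitude can be checked with a rough back-of-the-envelope calculation. The Python sketch below is a simplification, not a description of how MarkLogic encodes its indexes: it assumes longitude is held directly as an IEEE 754 value in degrees, takes the gap between adjacent representable values near 180 degrees, and converts that gap to meters along the equator using the WGS84 semi-major axis. The helper name `float32_spacing` is hypothetical, and `math.ulp` requires Python 3.9 or later.

```python
import math
import struct

WGS84_EQUATORIAL_RADIUS_M = 6378137.0  # WGS84 semi-major axis, in meters
METERS_PER_DEGREE = 2 * math.pi * WGS84_EQUATORIAL_RADIUS_M / 360

def float32_spacing(value: float) -> float:
    """Gap between the float32 nearest to a positive value and the next float32 above it."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    nearest = struct.unpack("<f", struct.pack("<I", bits))[0]
    following = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return following - nearest

# Longitudes near 180 degrees have the coarsest spacing within the valid range.
longitude = 180.0

float_step_m = float32_spacing(longitude) * METERS_PER_DEGREE
double_step_m = math.ulp(longitude) * METERS_PER_DEGREE

print(f"float step at the equator:  {float_step_m:.2f} m")
print(f"double step at the equator: {double_step_m:.2e} m")
```

Under these assumptions, the float step is on the order of meters and the double step on the order of nanometers, consistent with the figures above.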
In common language, the terms precision and accuracy are often used interchangeably. It is important to understand the difference, and to understand that it is possible to be precise but not accurate and vice versa.
Precision is a measure of exactness. Think of precision as a measure of how small a pin you can put on a map. Accuracy refers to the closeness of a measurement to reality: does that pin actually fall on the target? Double precision indexes allow for unprecedented exactness, but accuracy is solely a function of the data. When accuracy is in doubt, use the tolerance functions to account for the noise.
When there are errors and “noise” in the source data, features that are effectively equal may be represented as different. Double precision indexes compound this issue, because they preserve small differences that float precision would have rounded away.
Regardless of index type, when the accuracy of the data is in doubt, use the tolerance functionality to remove the noise as appropriate. It will be necessary in all but the most extreme cases to use tolerance when using double precision indexes.
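The idea behind tolerance can be sketched in a few lines of Python. This is a conceptual illustration only, not MarkLogic's tolerance implementation or API: it assumes planar (x, y) coordinates, and the function name `same_point`, the sample points, and the tolerance values are all hypothetical.

```python
import math

def same_point(a, b, tolerance):
    """Treat two planar (x, y) points as equal when they lie within `tolerance` of each other."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) <= tolerance

# Two hypothetical measurements of the same feature that differ only by noise.
surveyed = (100.0000001, 50.0000002)
digitized = (100.0000003, 49.9999999)

print("exactly equal:         ", surveyed == digitized)
print("equal within 1e-6:     ", same_point(surveyed, digitized, 1e-6))
print("equal within 1e-8:     ", same_point(surveyed, digitized, 1e-8))
```

Choosing the tolerance is a judgment about the data: it should be at least as large as the expected noise, yet smaller than the smallest distance that the application must treat as meaningful.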
Performance testing has found a 3% degradation in ingest performance and a 15% degradation in query performance with double precision indexes. This is expected, given the increased size of the indexes and the additional complexity of 64-bit operations.
As described above, double precision indexes are meant for specific use cases where the application’s precision requirements and the quality and precision of the data warrant their use.