If you are new to working with geospatial data using Optic, check out the Query Geospatial Data using Optic tutorial for an introduction to Optic Geo. This tutorial is a follow-up that dives into some of the more advanced topics.
In Optic Geo, an indexed region is stored with its coordinate system. Thus, it is important to know which coordinate system your application should use and how Optic Geo determines the governing coordinate system for insertion and query.
MarkLogic supports the wgs84, wgs84/double, wgs84/radians, wgs84/radians/double, etrs89, etrs89/double, raw, and raw/double coordinate systems. The default and most commonly used is wgs84. This is a geographic coordinate system, which means it takes into account the curvature of the Earth. The double affix on a coordinate system name indicates the precision at which the latitude and longitude of points are stored; if the double affix is not present in the governing coordinate system, float precision is used.
Use a double precision coordinate system if your application requires very precise points; being able to tell which side of the street a feature is on is an example use case. Double-precision points use twice the disk space of float points, so consider this when choosing a coordinate system for your application.
As of MarkLogic 11, the radian angular unit is supported for wgs84 and wgs84/double. Use a radians coordinate system if your data's latitude and longitude are expressed in radians.
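If your source data is in degrees, the conversion is plain arithmetic before serializing the WKT. A minimal sketch (the coordinates are illustrative):

'use strict';

// Convert degrees to radians before serializing WKT for a
// wgs84/radians region. WKT orders coordinates (longitude latitude).
const toRadians = (deg) => deg * Math.PI / 180;

const lonDeg = -86.68220369547988;
const latDeg = 32.820833815855;
const wkt = `POINT(${toRadians(lonDeg)} ${toRadians(latDeg)})`;
wkt;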
etrs89 continues to be supported in MarkLogic 11, and models the Eurasian tectonic plate. See here for more information.
The raw coordinate system represents a Cartesian coordinate system and does not reflect any curvature of the Earth. See here for more information.
As mentioned above, sem:triple and TDE triples have a somewhat unconventional way of specifying a coordinate system.
You can specify a coordinate system IRI as a prefix to assign a region to a coordinate system.
<http://marklogic.com/cs/wgs84>POINT(50 50)
<http://marklogic.com/cs/wgs84/radians>POINT(50 50)
<http://marklogic.com/cs/wgs84/radians/double>POLYGON((-67.34709780707836 -2.734423830932904,-62.33733218207835 -2.734423830932904,-62.33733218207835 -5.714247209411817,-67.34709780707836 -5.714247209411817,-67.34709780707836 -2.734423830932904))
<http://marklogic.com/cs/raw>POLYGON((-67.34709780707836 -2.734423830932904,-62.33733218207835 -2.734423830932904,-62.33733218207835 -5.714247209411817,-67.34709780707836 -5.714247209411817,-67.34709780707836 -2.734423830932904))
<http://marklogic.com/cs/raw/double>@10 80,120
<http://marklogic.com/cs/etrs89>LINESTRING(-64.35881655707836 0.6922551010266869,-56.71233218207835 -6.849945334022303)
<http://marklogic.com/cs/etrs89/double>LINESTRING(-64.35881655707836 0.6922551010266869,-56.71233218207835 -6.849945334022303)
See below for how to specify a coordinate system for TDE region triples.
'use strict';
declareUpdate();
const tde = require("/MarkLogic/tde.xqy");

let node = {
  "template": {
    "description": "triple geom extraction",
    "context": "/Placemark",
    "triples": [
      {
        "subject": {
          "val": "sem:iri(fn:concat('http://example.org/ApplicationSchema#',fn:replace(name,' ','')))"
        },
        "predicate": {
          "val": "sem:iri('http://example.org/ApplicationSchema#hasExactGeometry')"
        },
        "object": {
          "val": "cts:polygon(fn:concat('<http://www.marklogic.com/cs/raw/double>',region))",
          "invalidValues": "ignore"
        }
      }
    ]
  }
}
tde.templateInsert('townsTriples.tdej', node)
The val of the object in the TDE above dictates that our regions be inserted in the raw/double coordinate system. This means that all regions extracted by the TDE will be in this coordinate system, and queries must specify it in the options argument of the geof:sf* functions in SPARQL.
The following demonstrates how to insert a sem:triple region into a non-default coordinate system.
declareUpdate();
const sem = require("/MarkLogic/semantics.xqy");

const triple = sem.triple(
  {
    "triple": {
      "subject": "http://example.org/ApplicationSchema#SugarloafVillage",
      "predicate": "http://example.org/ApplicationSchema#hasExactGeometry",
      "object": {
        "value": "<http://www.marklogic.com/cs/etrs89>POLYGON((-118.63779 35.829243,-118.63585 35.829242,-118.6356 35.828962,-118.63539 35.828729,-118.63494 35.828367,-118.63462 35.828122,-118.63445 35.827563,-118.63409 35.827341,-118.63341 35.827339,-118.63313 35.827396,-118.633 35.82771,-118.6327 35.827848,-118.63248 35.827782,-118.6325 35.826541,-118.63244 35.82516,-118.63621 35.825073,-118.63766 35.825015,-118.63779 35.829243))",
        "datatype": "http://www.opengis.net/ont/geosparql#wktLiteral"
      }
    }
  }
)
sem.rdfInsert(triple, null, null, "geograph")
The polygon above was inserted into the etrs89 coordinate system, and can only be discovered by queries issued against this coordinate system.
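A sketch of such a query, reusing the predicate from the insertion above (the point literal is illustrative, chosen to fall inside the polygon):

'use strict';
const sem = require("/MarkLogic/semantics.xqy");

let query = `
PREFIX my: <http://example.org/ApplicationSchema#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT *
WHERE { ?s my:hasExactGeometry ?o
FILTER geof:sfIntersects(?o,'POINT(-118.635 35.827)','coordinate-system=etrs89')}
`
sem.sparql(query);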
The governing coordinate system is the coordinate system being used for a given operation. MarkLogic determines the governing coordinate system for a region during insertion and query.
It is important to configure the Appserver's default coordinate system appropriately for your application.
During insert, if an indexed region has a coordinate system specified in the coordinateSystem element of a TDE column or in the data itself via an IRI, MarkLogic chooses this as the governing coordinate system. If no coordinateSystem is specified in the TDE or in the data, MarkLogic indexes the region into the Appserver's default coordinate system.
During query, the coordinate system of the relation's first argument is used. If it cannot be deduced, the DE-9IM relate function uses the 'coordinate-system=' option provided in its third argument. If this is also not present, the Appserver's coordinate system becomes the governing coordinate system for the query.
'use strict';
const sem = require("/MarkLogic/semantics.xqy");
let query =
`
PREFIX my: <http://example.org/ApplicationSchema#>
PREFIX geoml: <http://marklogic.com/geospatial#>
PREFIX cts: <http://marklogic.com/cts#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT *
WHERE { ?s my:hasExactGeometry ?o
FILTER geof:sfDisjoint(?o,'POINT(50 50)','coordinate-system=raw/double')}
`
sem.sparql(query);
If we did not specify 'coordinate-system=raw/double' above and raw/double is not our Appserver coordinate system, we would not get the triples generated by the TDE as results. Two regions must be in the same coordinate system for DE-9IM relationships to be evaluated against them.
As a best practice (if possible for your specific use case), decide on your application's coordinate system, set it at the Appserver level, and do not change it. Then, avoid specifying a coordinate system in your TDEs or as an IRI in your region data. This way, the Appserver's coordinate system is used everywhere, and coordinate systems can stay at the back of your mind during application development.
If you are building an application that uses Geo in Optic, it is advised to increase the in-memory triple index size and in-memory geospatial region index size of your database. This is important to avoid XDMP-FRAGTOOLARGE errors during data insertion.
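These settings can be changed in the Admin UI, or programmatically. Below is a sketch that assumes the Admin API setters admin.databaseSetInMemoryTripleIndexSize and admin.databaseSetInMemoryGeospatialRegionIndexSize are available in your version; the database name and sizes (in megabytes) are illustrative only:

'use strict';
const admin = require("/MarkLogic/admin.xqy");

let config = admin.getConfiguration();
const dbId = xdmp.database("Documents");  // your content database

// Illustrative values; tune for your data and available memory.
config = admin.databaseSetInMemoryTripleIndexSize(config, dbId, 256);
config = admin.databaseSetInMemoryGeospatialRegionIndexSize(config, dbId, 128);
admin.saveConfiguration(config);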
MarkLogic 11 comes with a new setting at the database level: triple index geohash precision. It is similar to the geohash precision setting on a geospatial region index’s configuration. This setting dictates the geohash precision of any triple region that is inserted into the database. The set of valid values for this setting is the range of integers from 1 to 12. It is not recommended to set this higher than 6, unless you are only storing points. The default is 5.
You will want to determine the best triple geohash precision for your application before inserting any triple regions, as modifying it requires a reindex.
There are various factors that determine the best triple geohash precision for your dataset. In general, the higher your triple geohash precision is, the more performant your queries will likely be, at the cost of your geospatial region indexes taking up more disk space.
If you are only storing regions and/or NOT using any of the DE-9IM functions for search (e.g. geo:within()), a triple index geohash precision of 1 would be best, as it will save the most disk space.
If your average region is a very large polygon (e.g. a region the size of Australia) it would be best to use a geohash precision of 2 or 3.
If your average region is a polygon the size of a small country, use a geohash precision of 4.
We’ve seen the best trade-off between disk space, memory usage, and performance for a dataset with buildings at a geohash precision of 5.
If you are not too highly constrained on disk space and require better performance for the calculation of DE-9IM relations, use a geohash precision of 6.
MarkLogic cannot optimize queries against two region variables as of version 11.0.0. Queries issued with a new geo builtin (e.g. geo:contains()) will be optimized if one region argument is a variable (e.g. a TDE column) and the other is a region literal.
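For example, a query shaped like the following can be optimized, because one region argument is a column and the other is a polygon literal. This is a sketch reusing the 'regions'/'towns' view from the examples later in this tutorial; the polygon coordinates are illustrative:

'use strict';
const op = require('/MarkLogic/optic');

// One argument is a column (variable), the other a literal region,
// so the relation can be resolved against the indexes.
op.fromView('regions', 'towns')
  .where(op.geo.coveredBy(op.col('interiorPoint'),
    cts.polygon('POLYGON((-87 32,-86 32,-86 33,-87 33,-87 32))')))
  .result();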
The lowest-level component of the geospatial search feature is the Relate Engine. Given two regions and a DE-9IM operation, it returns true if the first region satisfies the operation against the second, and false otherwise. This is powerful, but it is also expensive and slow.
When resolving a query with a geospatial constraint, we want to avoid using the Relate Engine at all costs because we care about performance. We can avoid the Relate Engine by using the Geohash Index in combination with Slice matching. If you are running into a slow query, it is likely falling into one of the unoptimizable cases described below.
To debug a slow query with a geospatial constraint, turn on the Optic Region Relate trace event. This will output log messages showing the status of an executing query. The most useful information is logged on the D-Nodes (the nodes hosting the forests). If the query can be optimized, the logs will show information about the number of "maybe", "definite", and "brute-force" matches.
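The trace event can be turned on in the Admin UI under Groups > Diagnostics, or programmatically. The following is a sketch that assumes the Admin API functions admin.groupTraceEvent and admin.groupSetTraceEvents are available in your version:

'use strict';
const admin = require("/MarkLogic/admin.xqy");

let config = admin.getConfiguration();
const groupId = admin.groupGetId(config, "Default");

// Register the trace event; trace events must also be activated
// for the group (Diagnostics > trace events activated).
config = admin.groupSetTraceEvents(config, groupId,
  [admin.groupTraceEvent("Optic Region Relate")]);
admin.saveConfiguration(config);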
There are four cases that definitely cannot use the indexes efficiently as of MarkLogic 11.0.0; the fourth of these is a cts:circle() or cts:box() literal in a geographic coordinate system.
After avoiding the four cases above, if you are still seeing many "brute-force" matches in the output of the Optic Region Relate trace event, try increasing the triple index geohash precision setting on your database.
The number of points in your polygons affects performance as well. The more points in your query polygon, the more likely it is to be slow. If you are passing in a polygon with a large number of points and are facing performance issues, see if you can represent the same region in fewer points.
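A quick way to gauge this is to count the coordinate pairs in your query polygon's WKT before issuing the query. The helper below is hypothetical, plain string handling rather than a MarkLogic API, and only handles simple single-ring polygons:

'use strict';

// Hypothetical helper: count coordinate pairs in a single-ring
// WKT polygon by splitting the outer ring on commas.
function countWktPoints(wkt) {
  const ring = wkt.substring(wkt.indexOf('((') + 2, wkt.indexOf('))'));
  return ring.split(',').length;
}

countWktPoints('POLYGON((-67.3 -2.7,-62.3 -2.7,-62.3 -5.7,-67.3 -5.7,-67.3 -2.7))');
// => 5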
This section relates to all coordinate systems except for raw and raw/double.
A very popular question to ask is: "Give me all of my stored regions in column Z that are within R kilometers of point P." At first thought, we would gravitate toward calling geo:within() with one region argument as column Z and the other as a cts:circle literal C with radius R around point P. Unfortunately, this is exactly the fourth unoptimizable case from the section above. But there is a way to work around this and turn our inefficient query into one that MarkLogic can resolve without too much pain.
There exists a built-in function, geo.circlePolygon(), that satisfies our requirement. Pass a cts:circle to this function, and we get back a cts:polygon that is a rough estimate of the circle. Polygon matching can be optimized in geographic coordinate systems, while circle matching cannot.
The first argument of geo.circlePolygon() takes a cts.circle representing the circle to convert into a polygon. The second argument is an xs:double representing arc tolerance. The closer this value is to zero, the more precise the output polygon will be; this affects the number of points in the output polygon. Keep in mind that the more points your polygon has, the more expensive the query will likely be, so if you prioritize performance, use an arc tolerance value closer to 1.
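As a quick illustration, both calls below approximate the same circle at different arc tolerances; compare the vertex counts of the two output polygons (a sketch, using the circle from the example that follows):

'use strict';

const circle = cts.circle(100, 'POINT(-86.68220369547988 32.820833815855)');

// Coarser arc tolerance: fewer vertices, cheaper to match ...
const coarse = geo.circlePolygon(circle, 0.5);
// ... finer arc tolerance: more vertices, more precise, more expensive.
const precise = geo.circlePolygon(circle, 0.01);

[coarse, precise];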
See below for an example of an optimized radius query.
'use strict';
const op = require('/MarkLogic/optic');

const result = op.fromView('regions', 'towns')
  .where(op.geo.coveredBy(op.col('interiorPoint'),
    geo.circlePolygon(cts.circle(100, 'POINT(-86.68220369547988 32.820833815855)'),
      0.01, ['tolerance=0.001', 'units=miles'])))
  .orderBy('geoid')
  .result()
result;
Aliceville, McMullen, and Petrey are within a 100-mile radius of this point, and they are returned as results. The more regions in our database, the greater the performance difference between circle matching and polygon matching.
Geospatial constraints on cts:box literals also cannot be optimized. But MarkLogic 11 includes a new built-in function similar to geo.circlePolygon(): geo.boxPolygon(). Pass a box to this function and it returns a polygon that is a rough estimate of the cts:box. Polygon matching can be optimized in geographic coordinate systems, while box matching cannot.
'use strict';
const op = require('/MarkLogic/optic');

const result = op.fromView('regions', 'towns')
  .where(op.geo.coveredBy(op.col('interiorPoint'),
    geo.boxPolygon(cts.box(30.293547827697072, -88.41804353922988,
      35.04469652065227, -85.01228182047988))))
  .orderBy('geoid')
  .result()
result;
Aliceville, McMullen, and Petrey are within the box polygon specified. The more regions in our database, the greater the performance difference between box matching and polygon matching.
There is a collection of functions that Geo in Optic does NOT optimize. Avoid calling the following in SQL, SPARQL, and Optic at all costs:
geo.boxIntersects()
geo.circleIntersects()
geo.complexPolygonContains()
geo.complexPolygonIntersects()
geo.polygonContains()
geo.polygonIntersects()
geo.regionContains()
geo.regionIntersects()
geo.regionRelate()
You can specify region as the scalarType for a TDE column. The region type encompasses all geometries: points, boxes, circles, linestrings, polygons, and complex polygons can all be stored in a region column. To use this facility, do not use a region constructor in the val of your TDE column. This implies that, within the document, the val to be extracted must be a string that is internally parseable as a region.
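You can sanity-check a candidate string with geo.parseWkt, assuming your region strings are serialized as WKT as in the documents below; if the string parses, it can be extracted into a region column:

'use strict';

// Parses into a cts region; throws an error if the string
// is not valid WKT.
geo.parseWkt('POINT(-118.6330952882809 35.80627543880975)');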
Assume we have this TDE, where the exactGeometry column has a region scalarType.
'use strict';
declareUpdate();
const tde = require("/MarkLogic/tde.xqy");

let node = {
  "template": {
    "description": "region table",
    "context": "/Placemark",
    "rows": [
      {
        "schemaName": "Regions",
        "viewName": "Places",
        "columns": [
          {
            "name": "geoid",
            "scalarType": "int",
            "val": "geoid",
            "nullable": false
          },
          {
            "name": "name",
            "scalarType": "string",
            "val": "name"
          },
          {
            "name": "exactGeometry",
            "scalarType": "region",
            "val": "exactGeometry",
            "invalidValues": "ignore",
            "coordinateSystem": "wgs84"
          }
        ]
      }
    ]
  }
}
tde.templateInsert('Places.tdej', node)
And the following two documents are in the database:
'use strict';
declareUpdate();

let node = xdmp.unquote(
`{
  "Placemark": {
    "name": "Panorama Heights",
    "geoid": 655506,
    "exactGeometry": "POLYGON((-118.63555 35.809731,-118.62843 35.809727,-118.62845 35.810692,-118.6284 35.810898,-118.61921 35.811218,-118.61917 35.808052,-118.61921 35.803162,-118.62955 35.802883,-118.6355 35.803001,-118.63566 35.80822,-118.63555 35.809731))"
  }
}`)
xdmp.documentInsert('PanoramaHeights.json', node)
'use strict';
declareUpdate();

let node = xdmp.unquote(
`{
  "Placemark": {
    "name": "Poso Fire Station",
    "geoid": 655507,
    "exactGeometry": "POINT(-118.6330952882809 35.80627543880975)"
  }
}`)
xdmp.documentInsert('PosoFireStation.json', node)
We should get a Polygon and a Point in the same column of the Places TDE view, which is observable by running the query below.
'use strict';
xdmp.sql('select * from places')
And indeed, we see Poso Fire Station's and Panorama Heights' point and polygon data, respectively, in the same column.
When used effectively, a query-based view (QBV) can help focus on the data that matters most. QBVs are at their best when created against already-indexed data, and when the query that creates them is drilled down to the exact business need. Along with MarkLogic 11 and Geo in Optic comes the flexibility to define any geospatial QBV column as a region, akin to the TDE method above.
Consider the example in the OpenGIS support section:
const view2ColDescription = [
  {
    "name": "tenKmRadiusPolygon",
    "type": "region",
    "invalid-values": "reject",
    "coordinate-system": "wgs84"
  }
]

const view2 = op.fromView('regions', 'towns')
  .bind(op.as('tenKmRadiusPolygon',
    op.geo.circlePolygon(op.col('tenKmRadius'), 0.01, 'tolerance=0.001')))
  .select([op.col('geoid'), op.col('tenKmRadiusPolygon')])
  .generateView('Towns', 'CirclePolygonView', view2ColDescription)

xdmp.eval('declareUpdate(); \
xdmp.documentInsert("circlePolygonQBV.xml", view, \
{collections: "http://marklogic.com/xdmp/qbv"})',
  {view: view2},
  {database: xdmp.database('Schemas')});
We defined view2 as a view that has a geoid and a polygon column that is actually a circle. We can take this a step further and set the type in the column description to a generic region. If we do this, any geometry or geography will be accepted in this column of the query-based view. With this capability, we can render just about any geometry in a single layer in our OpenGIS tool.
XDMP-GEOHASH-TOLERANCE
This error is thrown if a geometry cannot be geohashed, either during indexing or at query time. See here for more details about geohashing.
During ingest testing, we've found that this error is most likely to be thrown when trying to index a region whose latitude is too close to the poles. If you have regions whose latitude is greater than 85 or less than -85, try increasing the triple index geohash precision setting on the database (requires a reindex). If the error is still thrown after that, you may not be able to store these regions in a geographic coordinate system (e.g. wgs84).
During query testing, we've found that this error is most likely to be thrown when the 'tolerance=' option in a call to a geo relate function is too high. Try lowering this value in your query. If this does not work or the option is not present, a region in your query is likely too close to the poles. If you have regions whose latitude is greater than 85 or less than -85, try increasing the triple index geohash precision setting on the database (requires a reindex). If the error is still thrown after that, you may not be able to query with these regions in a geographic coordinate system (e.g. wgs84).
XDMP-FRAGTOOLARGE
This error is thrown when a document and its index content cannot fit in memory. If you are running into this issue while ingesting regions in TDE or sem:triple(), increase your in-memory triple index size and in-memory geospatial region index size in your database settings. If you are still running into this issue after increasing these settings, the error can be caused by massive geographic (e.g. wgs84) regions being inserted. Try decreasing the triple index geohash precision setting on your database (requires a reindex).
My geometries are showing up as -90,0 or POINT(0 -90)
If your geometries are unexpectedly being indexed with points at -90,0, they are being clipped because they fall outside the bounds of the geographic coordinate system. Double-check that your method of ingestion has latitude and longitude in the correct order.
Remember that in MarkLogic's internal serialization, points are ingested and output as (latitude, longitude), but in WKT, GeoJSON, and KML, points are (longitude, latitude).
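A quick sketch of the two orderings; both expressions below describe the same location:

'use strict';

// MarkLogic's internal serialization: (latitude, longitude)
const fromLatLon = cts.point(35.80627543880975, -118.6330952882809);

// WKT: (longitude latitude)
const fromWkt = fn.head(geo.parseWkt('POINT(-118.6330952882809 35.80627543880975)'));

[fromLatLon.toString(), fromWkt.toString()];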