[MarkLogic Dev General] Polygon intersection over large document collections
mary.holstege at marklogic.com
Fri Mar 13 11:31:10 PDT 2015
> So if I'm interested in finding documents whose "location" (Point,
> Polygon or
> LineString) intersects a given box I can run this type of query:
> cts:search(doc(), cts:path-geospatial-query("location/coordinates",
> -10, 10, 10), "type=long-lat-point"))[1 to 10]
> This works when at least one of the vertices of the Polygon is within
> the box
> but if the box is wholly within the polygon it doesn't detect the
> Of course I can convert the "location[type="Polygon"]/coordinates"
> property to a
> cts:polygon() (or cts:complex-polygon()) to test for intersection but
> it's a
> very costly operation over millions of documents.
> Are there alternative approaches to performing this type of polygon
> over large data sets?
> (In particular anything that would leverage MarkLogic indexes).
General region/region intersection is pretty tricky. There have been a
number of projects are field has worked on where they have used
combinations of forwards and reverse query to limit the number of regions
that need to be considered, followed by a filtering pass with
cts:polygon-intersects or the like. The trick with reverse query is that
if you store a region as a query, then you can see which regions match a
If you're only interested in boxes, you can do things a little more
simply, although you'll still need to augment the data and do some
precomputation. cts:bounding-boxes gives you bounding boxes for regions.
For polygons (and linestrings) you can specify a box-percent=xx option
that tells you how finely to slice up the polygon. If you set it to 0
you'lll get 1 bounding box (per hemisphere) and if you set it to 100
you'll get the full set of boxes that together cover the polygon. If you
store the bounding boxes as queries and enable the reverse index, you can
query for bounding boxes that contain a corner of your query box to get a
candidate set of overlapping regions, and then run the intersects method
to filter than down the the real set.
More information about the General