[MarkLogic Dev General] Polygon intersection over large document collections

Mary Holstege mary.holstege at marklogic.com
Fri Mar 13 11:31:10 PDT 2015

> So if I'm interested in finding documents whose "location" (Point,  
> Polygon or
> LineString) intersects a given box I can run this type of query:
> cts:search(doc(), cts:path-geospatial-query("location/coordinates",  
> cts:box(-10,
> -10, 10, 10), "type=long-lat-point"))[1 to 10]
> This works when at least one of the vertices of the Polygon is within  
> the box
> but if the box is wholly within the polygon it doesn't detect the  
> intersection.
> Of course I can convert the "location[type="Polygon"]/coordinates"  
> property to a
> cts:polygon() (or cts:complex-polygon()) to test for intersection but  
> it's a
> very costly operation over millions of documents.
> Are there alternative approaches to performing this type of polygon  
> intersection
> over large data sets?
> (In particular anything that would leverage MarkLogic indexes).

General region/region intersection is pretty tricky. There have been a  
number of projects are field has worked on where they have used  
combinations of forwards and reverse query to limit the number of regions  
that need to be considered, followed by a filtering pass with  
cts:polygon-intersects or the like.  The trick with reverse query is that  
if you store a region as a query, then you can see which regions match a  
particular point.

If you're only interested in boxes, you can do things a little more  
simply, although you'll still need to augment the data and do some  
precomputation. cts:bounding-boxes gives you bounding boxes for regions.  
For polygons (and linestrings) you can specify a box-percent=xx option  
that tells you how finely to slice up the polygon. If you set it to 0  
you'lll get 1 bounding box (per hemisphere) and if you set it to 100  
you'll get the full set of boxes that together cover the polygon. If you  
store the bounding boxes as queries and enable the reverse index, you can  
query for bounding boxes that contain a corner of your query box to get a  
candidate set of overlapping regions, and then run the intersects method  
to filter than down the the real set.


More information about the General mailing list