Spatial#
Functions that operate on spatial data.
Spatial Indexing Functions#
- cuspatial.quadtree_on_points(points: GeoSeries, x_min, x_max, y_min, y_max, scale, max_depth, max_size)#
Construct a quadtree from a set of points for a given area-of-interest bounding box.
- Parameters:
- points
Series of points.
- x_min
The lower-left x-coordinate of the area of interest bounding box.
- x_max
The upper-right x-coordinate of the area of interest bounding box.
- y_min
The lower-left y-coordinate of the area of interest bounding box.
- y_max
The upper-right y-coordinate of the area of interest bounding box.
- scale
Scale to apply to each point’s distance from
(x_min, y_min)
.- max_depth
Maximum quadtree depth.
- max_size
Maximum number of points allowed in a node before it’s split into 4 leaf nodes.
- Returns:
- resulttuple (cudf.Series, cudf.DataFrame)
- keys_to_pointscudf.Series(dtype=np.int32)
A column of sorted keys to original point indices
- quadtreecudf.DataFrame
A complete quadtree for the set of input points
- keycudf.Series(dtype=np.int32)
An int32 column of quadrant keys
- levelcudf.Series(dtype=np.int8)
An int8 column of quadtree levels
- is_internal_nodecudf.Series(dtype=np.bool_)
A boolean column indicating whether the node is a quad or leaf
- lengthcudf.Series(dtype=np.int32)
If this is a non-leaf quadrant (i.e.
is_internal_node
isTrue
), this column’s value is the number of children in the non-leaf quadrant.Otherwise this column’s value is the number of points contained in the leaf quadrant.
- offsetcudf.Series(dtype=np.int32)
If this is a non-leaf quadrant (i.e.
is_internal_node
isTrue
), this column’s value is the position of the non-leaf quadrant’s first child.Otherwise this column’s value is the position of the leaf quadrant’s first point.
Notes
Swaps
min_x
andmax_x
ifmin_x > max_x
Swaps
min_y
andmax_y
ifmin_y > max_y
2D coordinates are converted into a 1D Morton code by dividing each x/y by the
scale
: ((x - min_x) / scale
and(y - min_y) / scale
).max_depth should be less than 16, since Morton codes are represented as uint32_t. The eventual number of levels may be less than max_depth if the number of points is small or max_size is large.
All intermediate quadtree nodes will have fewer than max_size number of points. Leaf nodes are permitted (but not guaranteed) to have >= max_size number of points.
Examples
An example of selecting the
max_size
andscale
based on input:>>> np.random.seed(0) >>> points = cudf.DataFrame({ "x": cudf.Series(np.random.normal(size=120)) * 500, "y": cudf.Series(np.random.normal(size=120)) * 500, }) >>> max_depth = 3 >>> max_size = 50 >>> min_x, min_y, max_x, max_y = (points["x"].min(), points["y"].min(), points["x"].max(), points["y"].max()) >>> scale = max(max_x - min_x, max_y - min_y) // (1 << max_depth) >>> print( "max_size: " + str(max_size) + "\n" "num_points: " + str(len(points)) + "\n" "min_x: " + str(min_x) + "\n" "max_x: " + str(max_x) + "\n" "min_y: " + str(min_y) + "\n" "max_y: " + str(max_y) + "\n" "scale: " + str(scale) + "\n" ) max_size: 50 num_points: 120 min_x: -1577.4949079170394 max_x: 1435.877311993804 min_y: -1412.7015761122134 max_y: 1492.572387431971 scale: 301.0 >>> key_to_point, quadtree = cuspatial.quadtree_on_points( points["x"], points["y"], min_x, max_x, min_y, max_y, scale, max_depth, max_size ) >>> print(quadtree) key level is_internal_node length offset 0 0 0 False 15 0 1 1 0 False 27 15 2 2 0 False 12 42 3 3 0 True 4 8 4 4 0 False 5 106 5 6 0 False 6 111 6 9 0 False 2 117 7 12 0 False 1 119 8 12 1 False 22 54 9 13 1 False 18 76 10 14 1 False 9 94 11 15 1 False 3 103 >>> print(key_to_point) 0 63 1 20 2 33 3 66 4 19 ... 115 113 116 3 117 78 118 98 119 24 Length: 120, dtype: int32
Spatial Join Functions#
- cuspatial.point_in_polygon(points: GeoSeries, polygons: GeoSeries)#
Compute from a set of points and a set of polygons which points fall within which polygons. Note that polygons_(x,y) must be specified as closed polygons: the first and last coordinate of each polygon must be the same.
- Parameters:
- pointsGeoSeries
A Series of points to test
- polygons: GeoSeries
A Series of polygons to test
- Returns:
- resultcudf.DataFrame
A DataFrame of boolean values indicating whether each point falls within each polygon.
Examples
Test whether 3 points fall within either of two polygons
>>> result = cuspatial.point_in_polygon( GeoSeries([Point(0, 0), Point(-8, -8), Point(6, 6)]), GeoSeries([ Polygon([(-10, -10), (5, -10), (5, 5), (-10, 5), (-10, -10)]), Polygon([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]) ], index=['nyc', 'hudson river']) ) # The result of point_in_polygon is a DataFrame of Boolean # values indicating whether each point (rows) falls within # each polygon (columns). >>> print(result) nyc hudson river 0 True True 1 True False 2 False True # Point 0: (0, 0) falls in both polygons # Point 1: (-8, -8) falls in the first polygon # Point 2: (6.0, 6.0) falls in the second polygon
- cuspatial.quadtree_point_in_polygon(poly_quad_pairs, quadtree, point_indices, points: GeoSeries, polygons: GeoSeries)#
Test whether the specified points are inside any of the specified polygons.
Uses the table of (polygon, quadrant) pairs returned by
cuspatial.join_quadtree_and_bounding_boxes
to ensure only the points in the same quadrant as each polygon are tested for intersection.This pre-filtering can dramatically reduce number of points tested per polygon, enabling faster intersection-testing at the expense of extra memory allocated to store the quadtree and sorted point_indices.
- Parameters:
- poly_quad_pairs: cudf.DataFrame
Table of (polygon, quadrant) index pairs returned by
cuspatial.join_quadtree_and_bounding_boxes
.- quadtreecudf.DataFrame
A complete quadtree for a given area-of-interest bounding box.
- point_indicescudf.Series
Sorted point indices returned by
cuspatial.quadtree_on_points
- points: GeoSeries
Points used to build the quadtree
- polygons: GeoSeries
Polygons to test against
- Returns:
- resultcudf.DataFrame
Indices for each intersecting point and polygon pair.
- polygon_indexcudf.Series
Index of containing polygon.
- point_indexcudf.Series
Index of contained point. This index refers to
point_indices
, so it is an index to an index.
- cuspatial.quadtree_point_to_nearest_linestring(linestring_quad_pairs, quadtree, point_indices, points: GeoSeries, linestrings: GeoSeries)#
Finds the nearest linestring to each point in a quadrant, and computes the distances between each point and linestring.
Uses the table of (linestring, quadrant) pairs returned by
cuspatial.join_quadtree_and_bounding_boxes
to ensure distances are computed only for the points in the same quadrant as each linestring.- Parameters:
- linestring_quad_pairs: cudf.DataFrame
Table of (linestring, quadrant) index pairs returned by
cuspatial.join_quadtree_and_bounding_boxes
.- quadtreecudf.DataFrame
A complete quadtree for a given area-of-interest bounding box.
- point_indicescudf.Series
Sorted point indices returned by
cuspatial.quadtree_on_points
- points: GeoSeries
Points to find nearest linestring for
- linestrings: GeoSeries
Linestrings to test for
- Returns:
- resultcudf.DataFrame
Indices for each point and its nearest linestring, and the distance between the two.
- point_indexcudf.Series
Index of point. This index refers to
point_indices
, so it is an index to an index.- linestring_indexcudf.Series
Index of the nearest linestring to the point.
- distancecudf.Series
Distance between point and its nearest linestring.
- cuspatial.join_quadtree_and_bounding_boxes(quadtree, bounding_boxes, x_min, x_max, y_min, y_max, scale, max_depth)#
Search a quadtree for polygon or linestring bounding box intersections.
- Parameters:
- quadtreecudf.DataFrame
A complete quadtree for a given area-of-interest bounding box.
- bounding_boxescudf.DataFrame
Minimum bounding boxes for a set of polygons or linestrings
- x_min
The lower-left x-coordinate of the area of interest bounding box
- x_max
The upper-right x-coordinate of the area of interest bounding box
- min_y
The lower-left y-coordinate of the area of interest bounding box
- max_y
The upper-right y-coordinate of the area of interest bounding box
- scale
Scale to apply to each point’s distance from
(x_min, y_min)
- max_depth
Maximum quadtree depth at which to stop testing for intersections
- Returns:
- resultcudf.DataFrame
Indices for each intersecting bounding box and leaf quadrant.
- bbox_offsetcudf.Series
Indices for each bbox that intersects with the quadtree.
- quad_offsetcudf.Series
Indices for each leaf quadrant intersecting with a poly bbox.
Notes
Swaps
min_x
andmax_x
ifmin_x > max_x
Swaps
min_y
andmax_y
ifmin_y > max_y
Measurement Functions#
- cuspatial.directed_hausdorff_distance(multipoints: GeoSeries)#
Compute the directed Hausdorff distances between all pairs of spaces.
- Parameters:
- multipoints: GeoSeries
A column of multipoint, where each multipoint indicates an input space to compute its hausdorff distance to the rest of input spaces.
- Returns:
- resultcudf.DataFrame
result[i, j] indicates the hausdorff distance between multipoints[i] and multipoint[j].
Examples
The directed Hausdorff distance from one space to another is the greatest of all the distances between any point in the first space to the closest point in the second.
Consider a pair of lines on a grid:
: x -----xyy--- : :
x0 = (0, 0), x1 = (0, 1)
y0 = (1, 0), y1 = (2, 0)
x0 is the closest point in
x
toy
. The distance from x0 to the farthest point iny
is 2.y0 is the closest point in
y
tox
. The distance from y0 to the farthest point inx
is 1.414.Compute the directed hausdorff distances between a set of spaces
>>> pts = cuspatial.GeoSeries([ ... MultiPoint([(0, 0), (1, 0)]), ... MultiPoint([(0, 1), (0, 2)]) ... ]) >>> cuspatial.directed_hausdorff_distance(pts) 0 1 0 0.0 1.414214 1 2.0 0.000000
- cuspatial.haversine_distance(p1: GeoSeries, p2: GeoSeries)#
Compute the haversine distances in kilometers between an arbitrary list of lon/lat pairs
- Parameters:
- p1: GeoSeries
Series of points as floats
- p2: GeoSeries
Series of points as floats
- Returns:
- resultcudf.Series
The distance between pairs of points between p1 and p2
>>> import cudf ..
>>> import cuspatial ..
>>> a = {"latitude":[0.0,0.0,1.0,1.0], ..
- … “longitude”: [0.0,1.0,0.0,1.0]}
>>> df = cudf.DataFrame(data=a) ..
>>> # Create cuSpatial GeoSeries from cuDF Dataframe ..
>>> gs = cuspatial.GeoSeries.from_points_xy( ..
- … df[[‘longitude’, ‘latitude’]].interleave_columns()
- … )
>>> # Create Comparator cuSpatial GeoSeries from a comparator point ..
>>> df['compare_lat'] = 2.0 # this will broadcast the value to all rows ..
>>> df['compare_lng'] = 2.0 ..
>>> cmp_gs = cuspatial.GeoSeries.from_points_xy( ..
- … df[[‘compare_lat’, ‘compare_lng’]].interleave_columns()
- … )
>>> # Calculate Haversine Distance of cuDF dataframe to comparator point ..
>>> df['compare_dist'] = cuspatial.haversine_distance(gs, cmp_gs) ..
>>> df.head() latitude longitude compare_lat compare_lng compare_dist
- 0 0.0 0.0 2.0 2.0 314.474805
- 1 0.0 1.0 2.0 2.0 248.629315
- 2 1.0 0.0 2.0 2.0 248.568719
- 3 1.0 1.0 2.0 2.0 157.225432
- cuspatial.pairwise_point_distance(points1: GeoSeries, points2: GeoSeries)#
Compute distance between (multi)points-(multi)points pairs
Currently points1 and points2 must contain either only points or multipoints. Mixing points and multipoints in the same series is unsupported.
- Parameters:
- points1GeoSeries
A GeoSeries of (multi)points
- points2GeoSeries
A GeoSeries of (multi)points
- Returns:
- distancecudf.Series
the distance between each pair of (multi)points
Examples
>>> from shapely.geometry import Point, MultiPoint >>> p1 = cuspatial.GeoSeries([ ... MultiPoint([(0.0, 0.0), (1.0, 0.0)]), ... MultiPoint([(0.0, 1.0), (1.0, 0.0)]) ... ]) >>> p2 = cuspatial.GeoSeries([ ... Point(2.0, 2.0), Point(0.0, 0.5) ... ]) >>> cuspatial.pairwise_point_distance(p1, p2) 0 2.236068 1 0.500000 dtype: float64
- cuspatial.pairwise_linestring_distance(multilinestrings1: GeoSeries, multilinestrings2: GeoSeries)#
Compute distance between (multi)linestring-(multi)linestring pairs
The shortest distance between two linestrings is defined as the shortest distance between all pairs of segments of the two linestrings. If any of the segments intersect, the distance is 0.
- Parameters:
- multilinestrings1GeoSeries
A GeoSeries of (multi)linestrings
- multilinestrings2GeoSeries
A GeoSeries of (multi)linestrings
- Returns:
- distancecudf.Series
the distance between each pair of linestrings
Examples
>>> from shapely.geometry import LineString, MultiLineString >>> ls1 = cuspatial.GeoSeries([ ... LineString([(0, 0), (1, 1)]), ... LineString([(1, 0), (2, 1)]) ... ]) >>> ls2 = cuspatial.GeoSeries([ ... MultiLineString([ ... LineString([(-1, 0), (-2, -1)]), ... LineString([(-2, -1), (-3, -2)]) ... ]), ... MultiLineString([ ... LineString([(0, -1), (0, -2), (0, -3)]), ... LineString([(0, -3), (-1, -3), (-2, -3)]) ... ]) ... ]) >>> cuspatial.pairwise_linestring_distance(ls1, ls2) 0 1.000000 1 1.414214 dtype: float64
- cuspatial.pairwise_point_linestring_distance(points: GeoSeries, linestrings: GeoSeries)#
Compute distance between (multi)points-(multi)linestrings pairs
The distance between a (multi)point and a (multi)linestring is defined as the shortest distance between every point in the multipoint and every line segment in the multilinestring.
This algorithm computes distance pairwise. The ith row in the result is the distance between the ith (multi)point in points and the ith (multi)linestring in linestrings.
- Parameters:
- pointsGeoSeries
The (multi)points to compute the distance from.
- linestringsGeoSeries
The (multi)linestrings to compute the distance from.
- Returns:
- distancecudf.Series
Notes
The input GeoSeries must contain a single type geometry. For example, points series cannot contain both points and polygons. Currently, it is unsupported that points contains both points and multipoints.
Examples
Compute distances between point array to linestring arrays
>>> from shapely.geometry import ( ... Point, MultiPoint, LineString, MultiLineString ... ) >>> import geopandas as gpd, cuspatial >>> pts = cuspatial.from_geopandas(gpd.GeoSeries([ ... Point(0.0, 0.0), Point(1.0, 1.0), Point(2.0, 2.0) ... ])) >>> mlines = cuspatial.from_geopandas(gpd.GeoSeries( ... [ ... LineString([Point(-1.0, 0.0), ... Point(-0.5, -0.5), ... Point(-1.0, -0.5), ... Point(-0.5, -1.0)]), ... LineString([Point(8.0, 10.0), ... Point(11.21, 9.48), ... Point(7.1, 12.5)]), ... LineString([Point(1.0, 0.0), Point(0.0, 1.0)]), ... ])) >>> cuspatial.pairwise_point_linestring_distance(pts, mlines) 0 0.707107 1 11.401754 2 2.121320 dtype: float64
Compute distances between multipoint to multilinestring arrays >>> # 3 pairs of multi points containing 3 points each >>> ptsdata = [ … [[9, 7], [0, 6], [7, 2]], … [[5, 8], [5, 7], [6, 0]], … [[8, 8], [6, 7], [4, 1]], … ] >>> # 3 pairs of multi linestrings containing 2 linestrings each >>> linesdata = [ … [ … [[86, 47], [31, 17], [84, 16], [14, 63]], … [[14, 36], [90, 73], [72, 66], [0, 5]], … ], … [ … [[36, 90], [29, 31], [91, 70], [25, 78]], … [[61, 64], [89, 20], [94, 46], [37, 44]], … ], … [ … [[60, 76], [29, 60], [53, 87], [8, 18]], … [[0, 16], [79, 14], [3, 6], [98, 89]], … ], … ] >>> pts = cuspatial.from_geopandas( … gpd.GeoSeries(map(MultiPoint, ptsdata)) … ) >>> lines = cuspatial.from_geopandas( … gpd.GeoSeries(map(MultiLineString, linesdata)) … ) >>> cuspatial.pairwise_point_linestring_distance(pts, lines) 0 0.762984 1 33.241540 2 0.680451 dtype: float64
Nearest Points Function#
- cuspatial.pairwise_point_linestring_nearest_points(points: GeoSeries, linestrings: GeoSeries) GeoDataFrame #
Returns the nearest points between two GeoSeries of points and linestrings.
Multipoints and Multilinestrings are also supported. With restriction that the points series must contain either only points or only multipoints.
- Parameters:
- pointsGeoSeries
A GeoSeries of points. Currently either only a series of points or a series of multipoints is supported.
- linestringsGeoSeries
A GeoSeries of linestrings or multilinestrings.
- Returns:
- GeoDataFrame
A GeoDataFrame with four columns.
“point_geometry_id” contains index of the nearest point in the row. If points consists of single-points, it is always 0.
“linestring_geometry_id” contains the index of the linestring in the multilinestring that contains the nearest point.
“segment_id” contains the index of the segment in the linestring that contains the nearest point.
“geometry” contains the points of the nearest point on the linestring.
Bounding Boxes#
- cuspatial.polygon_bounding_boxes(polygons: GeoSeries)#
Compute the minimum bounding-boxes for a set of polygons.
- Parameters:
- polygons: GeoSeries
A series of polygons
- Returns:
- resultcudf.DataFrame
minimum bounding boxes for each polygon
- minxcudf.Series
the minimum x-coordinate of each bounding box
- minycudf.Series
the minimum y-coordinate of each bounding box
- maxxcudf.Series
the maximum x-coordinate of each bounding box
- maxycudf.Series
the maximum y-coordinate of each bounding box
Notes
Has no notion of multipolygons. If a multipolygon is passed, the bounding boxes for each polygon will be computed and returned. The user is responsible for handling the multipolygon case.
- cuspatial.linestring_bounding_boxes(linestrings: GeoSeries, expansion_radius: float)#
Compute the minimum bounding boxes for a set of linestrings.
- Parameters:
- linestrings: GeoSeries
A series of linestrings
- expansion_radius
radius of each linestring point
- Returns:
- resultcudf.DataFrame
minimum bounding boxes for each linestring
- minxcudf.Series
the minimum x-coordinate of each bounding box
- minycudf.Series
the minimum y-coordinate of each bounding box
- maxxcudf.Series
the maximum x-coordinate of each bounding box
- maxycudf.Series
the maximum y-coordinate of each bounding box
Projection Functions#
- cuspatial.sinusoidal_projection(origin_lon, origin_lat, lonlat: GeoSeries)#
Sinusoidal projection of longitude/latitude relative to origin to Cartesian (x/y) coordinates in km.
Can be used to approximately convert longitude/latitude coordinates to Cartesian coordinates given that all points are near the origin. Error increases with distance from the origin. Results are scaled relative to the size of the Earth in kilometers. See https://en.wikipedia.org/wiki/Sinusoidal_projection for more detail.
- Parameters:
- origin_lon
number
longitude offset (this is subtracted from each input before converting to x,y)
- origin_lat
number
latitude offset (this is subtracted from each input before converting to x,y)
- lonlat: GeoSeries
A GeoSeries of Points that contains the longitude and latitude to transform
- origin_lon
- Returns:
- resultGeoSeries
A GeoSeries that contains the transformed coordinates.
Spatial Filtering Functions#
- cuspatial.points_in_spatial_window(points: GeoSeries, min_x, max_x, min_y, max_y)#
Return only the subset of coordinates that fall within a rectangular window.
A point (x, y) is inside the query window if and only if
min_x < x < max_x AND min_y < y < max_y
The window is specified by minimum and maximum x and y coordinates.
- Parameters:
- points: GeoSeries
A geoseries of points
- min_x: float
lower x-coordinate of the query window
- max_x: float
upper x-coordinate of the query window
- min_y: float
lower y-coordinate of the query window
- max_y: float
upper y-coordinate of the query window
- Returns:
- resultGeoSeries
subset of points above that fall within the window
Notes
Swaps
min_x
andmax_x
ifmin_x > max_x
Swaps
min_y
andmax_y
ifmin_y > max_y