Neighbors¶

Get neighbors and K-nearest neighbors.

The functions rely on the pandas index as identifiers for the geometries and their neighbors. This makes it easy to join or aggregate the results onto the input GeoDataFrames.

The results of all functions will be identical with GeoDataFrame and GeoSeries as input types.

get_all_distances(gdf, neighbors)[source]¶

Get distances from ‘gdf’ to all points in ‘neighbors’.

Find the distance from each point in ‘gdf’ to each point in ‘neighbors’. Also returns the indices of ‘neighbors’ (as the column ‘neighbor_index’) and ‘gdf’ (as the index).

Parameters:

gdf (GeoDataFrame | GeoSeries) – a GeoDataFrame of points.
neighbors (GeoDataFrame | GeoSeries) – a GeoDataFrame of points.

Return type:

DataFrame

Returns:

DataFrame with the columns ‘neighbor_index’ and ‘distance’. The index follows the index of ‘gdf’.

Raises:

ValueError – If the coordinate reference system of ‘gdf’ and ‘neighbors’ are not the same.

Examples:¶

>>> from sgis import get_all_distances, random_points
>>> points = random_points(100)
>>> neighbors = random_points(100)

>>> distances = get_all_distances(points, neighbors)
>>> distances
    neighbor_index  distance
             70  0.050578
             24  0.070267
             91  0.088510
             72  0.095352
             40  0.103720
..             ...       ...
            27  0.713055
            60  0.718162
            63  0.719675
            62  0.719747
            90  0.761324

[10000 rows x 2 columns]

Use set_index to get values from other columns.

>>> neighbors["custom_id"] = np.random.choice([*"abcde"], len(neighbors))
>>> distances = get_all_distances(points, neighbors.set_index("custom_id"))
   neighbor_index  distance
             d  0.050578
             b  0.070267
             e  0.088510
             d  0.095352
             c  0.103720
..            ...       ...
            b  0.713055
            d  0.718162
            d  0.719675
            d  0.719747
            e  0.761324

[10000 rows x 2 columns]

Since the index from ‘gdf’ is preserved, we can join the results with the ‘points’.

>>> joined = points.join(distances)
>>> joined
                   geometry neighbor_index  distance
 POINT (0.59809 0.34636)              d  0.050578
 POINT (0.59809 0.34636)              b  0.070267
 POINT (0.59809 0.34636)              e  0.088510
 POINT (0.59809 0.34636)              d  0.095352
 POINT (0.59809 0.34636)              c  0.103720
..                      ...            ...       ...
POINT (0.35305 0.47445)              b  0.713055
POINT (0.35305 0.47445)              d  0.718162
POINT (0.35305 0.47445)              d  0.719675
POINT (0.35305 0.47445)              d  0.719747
POINT (0.35305 0.47445)              e  0.761324

[10000 rows x 3 columns]

Or assign aggregated values onto the points.

>>> points["mean_distance"] = distances.groupby(level=0)["distance"].mean()
>>> points["min_distance"] = distances.groupby(level=0)["distance"].min()
>>> points
                   geometry  mean_distance  min_distance
0   POINT (0.59809 0.34636)       0.417128      0.050578
1   POINT (0.25444 0.02876)       0.673966      0.016781
2   POINT (0.22475 0.08637)       0.643514      0.030049
3   POINT (0.14814 0.23037)       0.593224      0.025758
4   POINT (0.69298 0.81931)       0.434355      0.051575
..                      ...            ...           ...
95  POINT (0.62453 0.26793)       0.460177      0.031749
96  POINT (0.11882 0.26615)       0.592930      0.044010
97  POINT (0.03998 0.77527)       0.592031      0.090983
98  POINT (0.46047 0.79056)       0.400134      0.016012
99  POINT (0.35305 0.47445)       0.397660      0.052134

[100 rows x 3 columns]

get_k_nearest_neighbors(gdf, neighbors, k, *, strict=False)[source]¶

Finds the k nearest neighbors for a GeoDataFrame of points.

Uses the K-nearest neighbors algorithm method from scikit-learn to find the given number of neighbors for each point in ‘gdf’. Identical points are considered neighbors. Preserves the index of ‘gdf’ and adds the column ‘neighbor_index’ with the indices of the neighbors, as well as a column ‘distance’.

Parameters:

gdf (GeoDataFrame | GeoSeries) – a GeoDataFrame of points
neighbors (GeoDataFrame | GeoSeries) – a GeoDataFrame of points
k (int) – number of neighbors to find.
strict (bool) – If False (default), no exception is raised if k is larger than the number of points in ‘neighbors’. If True, ‘k’ must be less than or equal to the number of points in ‘neighbors’.

Return type:

DataFrame

Returns:

DataFrame with the columns ‘neighbor_index’ and ‘distance’. The index follows the index of ‘gdf’.

Raises:

ValueError – If the coordinate reference system of ‘gdf’ and ‘neighbors’ are not the same.

Examples:¶

Make some random points.

>>> from sgis import get_k_nearest_neighbors, random_points
>>> points = random_points(100)
>>> neighbors = random_points(100)

Get 10 nearest neighbors.

>>> distances = get_k_nearest_neighbors(points, neighbors, k=10)
>>> distances
    neighbor_index  distance
             84  0.049168
             59  0.053592
             14  0.091812
             40  0.118403
             77  0.129565
..             ...       ...
            86  0.153771
            92  0.157481
            70  0.177368
            65  0.184087
            26  0.202216

[1000 rows x 2 columns]

Use set_index to use another column as identifier for the neighbors.

>>> neighbors["custom_id"] = [letter for letter in [*"abcde"] for _ in range(20)]
>>> distances = get_k_nearest_neighbors(points, neighbors.set_index("custom_id"), k=10)
>>> distances
   neighbor_index  distance
0               e  0.049168
0               c  0.053592
0               a  0.091812
0               c  0.118403
0               d  0.129565
..            ...       ...
99              e  0.153771
99              e  0.157481
99              d  0.177368
99              d  0.184087
99              b  0.202216

[1000 rows x 2 columns]

The index from ‘points’ is preserved. Use join to get the distance and neighbor index onto the ‘points’ GeoDataFrame.

>>> joined = points.join(distances)
>>> joined["k"] = joined.groupby(level=0)["distance"].transform("rank")
>>> joined
                   geometry neighbor_index  distance     k
0   POINT (0.89067 0.75346)              e  0.049168   1.0
0   POINT (0.89067 0.75346)              c  0.053592   2.0
0   POINT (0.89067 0.75346)              a  0.091812   3.0
0   POINT (0.89067 0.75346)              c  0.118403   4.0
0   POINT (0.89067 0.75346)              d  0.129565   5.0
..                      ...            ...       ...   ...
99  POINT (0.65910 0.16714)              e  0.153771   6.0
99  POINT (0.65910 0.16714)              e  0.157481   7.0
99  POINT (0.65910 0.16714)              d  0.177368   8.0
99  POINT (0.65910 0.16714)              d  0.184087   9.0
99  POINT (0.65910 0.16714)              b  0.202216  10.0

[1000 rows x 4 columns]

Or assign aggregated values directly onto the points.

>>> points["mean_distance"] = distances.groupby(level=0)["distance"].mean()
>>> points["min_distance"] = distances.groupby(level=0)["distance"].min()
>>> points
                   geometry  mean_distance  min_distance
0   POINT (0.89067 0.75346)       0.132193      0.049168
1   POINT (0.41308 0.09462)       0.116610      0.009072
2   POINT (0.13458 0.44248)       0.100539      0.059576
3   POINT (0.32670 0.44102)       0.117730      0.056133
4   POINT (0.82184 0.41231)       0.106685      0.013174
..                      ...            ...           ...
95  POINT (0.29706 0.27520)       0.137398      0.079672
96  POINT (0.42416 0.26956)       0.160817      0.074759
97  POINT (0.98337 0.54492)       0.164551      0.070798
98  POINT (0.42458 0.77459)       0.127562      0.027662
99  POINT (0.65910 0.16714)       0.143257      0.058453

[100 rows x 3 columns]

get_neighbor_indices(gdf, neighbors, max_distance=0, predicate='intersects', rtree_runner=None, n_jobs=None)[source]¶

Creates a pandas Series with the index of ‘gdf’ and values of ‘neighbors’.

Finds all the geometries in ‘neighbors’ that intersect with ‘gdf’ and returns a Series where the values are the ‘neighbors’ indices and the index is the indices of ‘gdf’. Use set_index inside the function call to get values from a column instead of the current index.

Parameters:

gdf (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries.
neighbors (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries.
max_distance (int) – The maximum distance between the geometries. Defaults to 0.
predicate (str) – Spatial predicate to use in sjoin. Defaults to “intersects”, meaning the geometry itself and geometries within will be considered neighbors if they are part of the ‘neighbors’ GeoDataFrame.
rtree_runner (RTreeQueryRunner | None) – Optionally debug/manipulate the spatial indexing operations. See the ‘runners’ module for example implementations.
n_jobs (int | None) – Number of workers.

Return type:

Series

Returns:

A pandas Series with values of the intersecting ‘neighbors’ indices. The Series’ index will follow the index of ‘gdf’.

Raises:

ValueError – If gdf and neighbors do not have the same coordinate reference system.

Examples:¶

>>> from sgis import get_neighbor_indices, to_gdf
>>> points = to_gdf([(0, 0), (0.5, 0.5)])
>>> points
                geometry
0  POINT (0.00000 0.00000)
1  POINT (0.50000 0.50000)

With the default max_distance (0), the points return the index of themselves.

>>> neighbor_indices = get_neighbor_indices(points, points)
>>> neighbor_indices
0    0
1    1
Name: neighbor_index, dtype: int64

With max_distance=1, each point find themselves and the neighbor.

>>> neighbor_indices = get_neighbor_indices(points, points, max_distance=1)
>>> neighbor_indices
0    0
1    0
0    1
1    1
Name: neighbor_index, dtype: int64

Using a column instead of the index.

>>> points["text"] = [*"ab"]
>>> neighbor_indices = get_neighbor_indices(points, points.set_index("text"), max_distance=1)
>>> neighbor_indices
0    a
1    a
0    b
1    b
Name: neighbor_index, dtype: object

The returned Series will always keep the index of ‘gdf’ and have values of the ‘neighbors’ index.

>>> neighbor_indices.index
Int64Index([0, 1, 0, 1], dtype='int64')

>>> neighbor_indices.values
['a' 'a' 'b' 'b']

k_nearest_neighbors(from_array, to_array, k=None, strict=False)[source]¶

Finds nearest neighbors for an array of coordinates to another array of coordinates.

Parameters:

from_array (ndarray[ndarray[float]]) – Numpy array of coordinates.
to_array (ndarray[ndarray[float]]) – Numpy array of coordinates.
k (int | None) – Number of neighbors to find.
strict (bool) – If True (not default), an exception is raised if k is larger than the length of ‘to_array’.

Return type:

tuple[ndarray[float], ndarray[int]]

Returns a tuple of arrays, one with distances and one with indices: of the neighbors.

sjoin_within_distance(gdf, neighbors, distance, distance_col='distance', **kwargs)[source]¶

Sjoin with a buffer on the right GeoDataFrame and adds a distance column.

Return type:

GeoDataFrame

Parameters:

gdf (GeoDataFrame | GeoSeries)
neighbors (GeoDataFrame | GeoSeries)
distance (int | float)
distance_col (str)