Duplicate and overlapping geometries

get_intersections(gdf, geom_type=None, keep_geom_type=None, predicate='intersects', n_jobs=1)

Find geometries that intersect in a GeoDataFrame.

Intersects the GeoDataFrame with itself and keeps only the geometries that appear more than once.

Note that the returned GeoDataFrame in most cases contains two rows per intersection pair. It can also contain more than two overlapping polygons where several geometries overlap in the same place. These overlaps can be removed with update_geometries. See example below.
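
The "two rows per intersection pair" behaviour can be illustrated without any geometry library. The sketch below is a conceptual model only, not sgis's implementation: axis-aligned boxes stand in for polygons, and the names box_intersection and self_intersections are hypothetical helpers introduced for this illustration.

```python
from itertools import permutations

# Hypothetical stand-in for polygons: axis-aligned boxes (xmin, ymin, xmax, ymax).
# All three boxes overlap each other, like the three circles in the example below.
boxes = {0: (0, 0, 3, 2), 1: (1, 0, 4, 2), 2: (2, 0, 5, 2)}


def box_intersection(a, b):
    """Return the overlapping box of a and b, or None if the overlap has no area."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin < xmax and ymin < ymax:
        return (xmin, ymin, xmax, ymax)
    return None


def self_intersections(boxes):
    """Intersect every box with every other box.

    Each unordered pair {i, j} is visited twice, as (i, j) and as (j, i),
    so every overlap shows up in two rows: once indexed by i, once by j.
    """
    rows = []
    for i, j in permutations(boxes, 2):
        overlap = box_intersection(boxes[i], boxes[j])
        if overlap is not None:
            rows.append((i, overlap))
    return rows


rows = self_intersections(boxes)
for index, overlap in rows:
    print(index, overlap)
```

Three mutually overlapping boxes give three pairs and therefore six rows, with each original index appearing twice.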

Parameters:
  • gdf (GeoDataFrame) – GeoDataFrame of polygons.

  • geom_type (str | None) – Optionally specify which geometry type to keep. Either “polygon”, “line” or “point”.

  • keep_geom_type (bool | None) – Whether to keep the original geometry type. If mixed geometry types and keep_geom_type=True, an exception is raised.

  • n_jobs (int) – Number of threads.

  • predicate (str | None) – Spatial predicate for the spatial tree.

Return type:

GeoDataFrame

Returns:

A GeoDataFrame of the overlapping polygons.

Examples:

Create three partially overlapping polygons.

>>> import sgis as sg
>>> circles = sg.to_gdf([(0, 0), (1, 0), (2, 0)]).pipe(sg.buff, 1.2)
>>> circles.area
0    4.523149
1    4.523149
2    4.523149
dtype: float64

Get the duplicates.

>>> duplicates = sg.get_intersections(circles)
>>> duplicates["area"] = duplicates.area
>>> duplicates
                                            geometry      area
0  POLYGON ((1.19941 -0.03769, 1.19763 -0.07535, ...  2.194730
0  POLYGON ((1.19941 -0.03769, 1.19763 -0.07535, ...  0.359846
1  POLYGON ((0.48906 -1.08579, 0.45521 -1.06921, ...  2.194730
1  POLYGON ((2.19941 -0.03769, 2.19763 -0.07535, ...  2.194730
2  POLYGON ((0.98681 -0.64299, 0.96711 -0.61085, ...  0.359846
2  POLYGON ((1.48906 -1.08579, 1.45521 -1.06921, ...  2.194730

We get two rows for each intersection pair.

To get non-overlapping geometries, we can put the geometries on top of each other rowwise.

>>> updated = sg.update_geometries(duplicates)
>>> updated["area"] = updated.area
>>> updated
       area                                           geometry
0  2.194730  POLYGON ((1.19941 -0.03769, 1.19763 -0.07535, ...
1  1.834884  POLYGON ((2.19763 -0.07535, 2.19467 -0.11293, ...

It might be appropriate to sort the dataframe by one or more columns first, or to put large polygons first and NaN values last.

>>> updated = (
...     sg.sort_large_first(duplicates)
...     .pipe(sg.sort_nans_last)
...     .pipe(sg.update_geometries)
... )
>>> updated
      area                                           geometry
0  2.19473  POLYGON ((1.19941 -0.03769, 1.19763 -0.07535, ...
1  2.19473  POLYGON ((2.19763 -0.07535, 2.19467 -0.11293, ...

update_geometries(gdf, geom_type=None, keep_geom_type=None, grid_size=None, n_jobs=1, predicate='intersects')

Puts geometries on top of each other rowwise.

Since this operation is done rowwise, it’s important to first sort the GeoDataFrame appropriately. See example below.
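
To make the rowwise behaviour concrete, here is a minimal pure-Python model. It is a sketch under stated assumptions, not the library's code: geometries are represented as sets of grid cells, and update_rowwise and square are hypothetical names used only here. Each row keeps the part of its geometry that no earlier row has already claimed.

```python
def update_rowwise(rows):
    """Keep, for each row in order, only the cells not claimed by an earlier row.

    Rows that end up empty are dropped from the result.
    """
    claimed = set()
    result = []
    for label, cells in rows:
        remaining = cells - claimed
        if remaining:
            result.append((label, remaining))
        claimed |= cells
    return result


def square(x0, y0, size):
    """Hypothetical helper: a square 'geometry' as a set of unit grid cells."""
    return {(x, y) for x in range(x0, x0 + size) for y in range(y0, y0 + size)}


a = ("a", square(0, 0, 3))  # 9 cells
b = ("b", square(2, 2, 2))  # 4 cells, sharing cell (2, 2) with a

# Row order decides which geometry keeps the shared cell.
a_first = {label: len(cells) for label, cells in update_rowwise([a, b])}
b_first = {label: len(cells) for label, cells in update_rowwise([b, a])}

# A geometry fully covered by an earlier row disappears.
duplicate = update_rowwise([a, ("copy_of_a", square(0, 0, 3))])

print(a_first, b_first, [label for label, _ in duplicate])
```

With a first, a stays whole and b loses the shared cell; with b first, the opposite happens. This is why the sort order of the input matters.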

Parameters:
  • gdf (GeoDataFrame) – The GeoDataFrame to be updated.

  • keep_geom_type (bool | None) – If True, return only geometries of original type in case of intersection resulting in multiple geometry types or GeometryCollections. If False, return all resulting geometries (potentially mixed types).

  • geom_type (str | None) – Optionally specify which geometry type to keep if there are mixed geometry types. Must be either “polygon”, “line” or “point”.

  • grid_size (int | None) – Precision grid size to round the geometries. Will use the highest precision of the inputs by default.

  • n_jobs (int) – Number of threads.

  • predicate (str | None) – Spatial predicate for the spatial tree.

Return type:

GeoDataFrame

Example:

Create two circles and get the overlap.

>>> import sgis as sg
>>> circles = sg.to_gdf([(0, 0), (1, 1)]).pipe(sg.buff, 1)
>>> duplicates = sg.get_intersections(circles)
>>> duplicates
   idx                                           geometry
0    1  POLYGON ((0.03141 0.99951, 0.06279 0.99803, 0....
1    2  POLYGON ((1.00000 0.00000, 0.96859 0.00049, 0....

The polygons are identical except for the order of the coordinates.

>>> poly1, poly2 = duplicates.geometry
>>> poly1.equals(poly2)
True

‘update_geometries’ gives different results based on the order of the GeoDataFrame.

>>> sg.update_geometries(duplicates)
    idx                                           geometry
0    1  POLYGON ((0.03141 0.99951, 0.06279 0.99803, 0....
>>> dups_rev = duplicates.iloc[::-1]
>>> sg.update_geometries(dups_rev)
    idx                                           geometry
1    2  POLYGON ((1.00000 0.00000, 0.96859 0.00049, 0....

It might be appropriate to put the largest polygons on top and sort all NaNs to the bottom.

>>> updated = (
...     sg.sort_large_first(duplicates)
...     .pipe(sg.sort_nans_last)
...     .pipe(sg.update_geometries)
... )
>>> updated
    idx                                           geometry
0    1  POLYGON ((0.03141 0.99951, 0.06279 0.99803, 0....
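
The sorting step can be sketched in plain Python. This is an illustration only and makes two assumptions that the text above does not confirm: that "NaNs last" means rows with more missing attribute values sink to the bottom, and that the second sort is stable, so the large-first order is preserved among rows with the same number of missing values. The row data and the count_missing helper are hypothetical.

```python
import math

# Hypothetical attribute rows; "area" stands in for the geometry's area.
rows = [
    {"name": "small", "area": 1.0, "value": 5.0},
    {"name": "huge_missing", "area": 16.0, "value": math.nan},
    {"name": "large", "area": 9.0, "value": 2.0},
    {"name": "medium", "area": 4.0, "value": 7.0},
]


def count_missing(row):
    """Number of NaN values in a row."""
    return sum(1 for v in row.values() if isinstance(v, float) and math.isnan(v))


# Step 1: largest areas first.
large_first = sorted(rows, key=lambda row: row["area"], reverse=True)

# Step 2: stable sort by number of missing values, so complete rows come first
# and keep their large-first order among themselves.
nans_last = sorted(large_first, key=count_missing)

order = [row["name"] for row in nans_last]
print(order)
```

In this model the largest row still ends up last because it has a missing value, so in a rowwise update it would only keep whatever the complete rows leave uncovered.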