Polygon geometry operations

Functions for polygon geometries.

close_all_holes(gdf, *, ignore_islands=False, copy=True)

Closes all holes in polygons.

It takes a GeoDataFrame or GeoSeries of polygons and returns the polygons with their holes filled.

Parameters:
  • gdf (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries of polygons.

  • copy (bool) – If True (default), the input GeoDataFrame or GeoSeries is copied.

  • ignore_islands (bool) – If False (default), polygons inside the holes (islands) will be erased from the output geometries. If True, the entire holes will be closed and the islands kept, meaning there might be duplicate surfaces in the resulting geometries. Note that ignoring islands is a lot faster.

Return type:

GeoDataFrame | GeoSeries

Returns:

A GeoDataFrame or GeoSeries of polygons with closed holes in the geometry column.

Examples:

Let’s create a circle with a hole in it.

>>> import sgis as sg
>>> point = sg.to_gdf([260000, 6650000], crs=25833)
>>> point
                        geometry
0  POINT (260000.000 6650000.000)
>>> circle = sg.buff(point, 1000)
>>> small_circle = sg.buff(point, 500)
>>> circle_with_hole = circle.overlay(small_circle, how="difference")
>>> circle_with_hole.area
0    2.355807e+06
dtype: float64

Close the hole.

>>> holes_closed = sg.close_all_holes(circle_with_hole)
>>> holes_closed.area
0    3.141076e+06
dtype: float64
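
The areas printed above can be checked by hand. The following is a minimal plain-Python sketch, not the sgis implementation. It assumes the buffered circles are approximated by regular 200-sided polygons, which is consistent with the printed areas, and it centres them on the origin because area does not depend on position. The helper names regular_polygon and ring_area are hypothetical. The example has no islands inside the hole, so closing the hole amounts to keeping only the exterior ring.

```python
import math


def regular_polygon(radius, n_vertices=200):
    """Vertices of a regular polygon centred on the origin (assumed buffer shape)."""
    return [
        (
            radius * math.cos(2 * math.pi * i / n_vertices),
            radius * math.sin(2 * math.pi * i / n_vertices),
        )
        for i in range(n_vertices)
    ]


def ring_area(ring):
    """Unsigned area of a closed ring, computed with the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


exterior = regular_polygon(1000)
hole = regular_polygon(500)

# A polygon's area is the exterior ring's area minus the area of its holes.
area_with_hole = ring_area(exterior) - ring_area(hole)

# With the hole closed, only the exterior ring contributes.
area_closed = ring_area(exterior)

print(f"{area_with_hole:.6e}")  # 2.355807e+06
print(f"{area_closed:.6e}")  # 3.141076e+06
```

Both numbers fall slightly below the exact circle areas (0.75π and π times 1,000,000), because a polygon inscribed in a circle always covers a little less than the circle itself.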

close_small_holes(gdf, max_area, *, ignore_islands=False, copy=True)

Closes holes in polygons if the area is less than the given maximum.

It takes a GeoDataFrame or GeoSeries of polygons and fills the holes whose area is smaller than ‘max_area’, given in the units of the coordinate reference system (square meters for a meter-based CRS).

Parameters:
  • gdf (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries of polygons.

  • max_area (int | float) – The maximum hole area, in the units of the GeoDataFrame’s crs.

  • ignore_islands (bool) – If False (default), polygons inside the holes (islands) will be erased from the “hole” geometries before the area is calculated. If True, the entire polygon interiors will be considered, meaning there might be duplicate surfaces in the resulting geometries. Note that ignoring islands is a lot faster.

  • copy (bool) – If True (default), the input GeoDataFrame or GeoSeries is copied.

Return type:

GeoDataFrame | GeoSeries

Returns:

A GeoDataFrame or GeoSeries of polygons with closed holes in the geometry column.

Raises:
  • ValueError – If the coordinate reference system of the GeoDataFrame is not in meter units.

  • ValueError – If both ‘max_m2’ and ‘max_km2’ are given.

Examples:

Let’s create a circle with a hole in it.

>>> import sgis as sg
>>> point = sg.to_gdf([260000, 6650000], crs=25833)
>>> point
                        geometry
0  POINT (260000.000 6650000.000)
>>> circle = sg.buff(point, 1000)
>>> small_circle = sg.buff(point, 500)
>>> circle_with_hole = circle.overlay(small_circle, how="difference")
>>> circle_with_hole.area
0    2.355807e+06
dtype: float64

Close holes smaller than 1 square kilometer (1 million square meters).

>>> holes_closed = sg.close_small_holes(circle_with_hole, max_area=1_000_000)
>>> holes_closed.area
0    3.141076e+06
dtype: float64

The hole will not be closed if it is larger.

>>> holes_closed = sg.close_small_holes(circle_with_hole, max_area=1_000)
>>> holes_closed.area
0    2.355807e+06
dtype: float64
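
The threshold rule in the two calls above can be sketched in plain Python. This is an illustration under stated assumptions, not the sgis implementation: the circles are modelled as regular 200-sided polygons centred on the origin, islands are ignored, and the names circle_ring, ring_area, holes_left_open and polygon_area are hypothetical.

```python
import math


def circle_ring(radius, n_vertices=200):
    """Regular polygon approximating a circle (assumed buffer shape)."""
    return [
        (
            radius * math.cos(2 * math.pi * i / n_vertices),
            radius * math.sin(2 * math.pi * i / n_vertices),
        )
        for i in range(n_vertices)
    ]


def ring_area(ring):
    """Unsigned area of a closed ring, computed with the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def holes_left_open(holes, max_area):
    """Keep only the holes that are NOT smaller than max_area."""
    return [hole for hole in holes if ring_area(hole) >= max_area]


def polygon_area(exterior, holes):
    """Area of the exterior ring minus the area of the remaining holes."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


exterior = circle_ring(1000)
holes = [circle_ring(500)]  # hole area is roughly 785,000 square units

# The hole is smaller than 1,000,000, so it is closed.
closed = polygon_area(exterior, holes_left_open(holes, max_area=1_000_000))

# The hole is larger than 1,000, so it stays open.
unchanged = polygon_area(exterior, holes_left_open(holes, max_area=1_000))

print(f"{closed:.6e}")  # 3.141076e+06
print(f"{unchanged:.6e}")  # 2.355807e+06
```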

eliminate_by_largest(gdf, to_eliminate, *, max_distance=None, remove_isolated=False, fix_double=False, ignore_index=False, aggfunc=None, predicate='intersects', grid_size=None, n_jobs=1, **kwargs)

Dissolves selected polygons with the largest neighbor polygon.

Eliminates selected geometries by dissolving them with the neighboring polygon with the largest area. The index and column values of the large polygons will be kept, unless otherwise specified.

Parameters:
  • gdf (GeoDataFrame | list[GeoDataFrame]) – GeoDataFrame with polygon geometries, or a list of GeoDataFrames.

  • to_eliminate (GeoDataFrame) – The geometries to be eliminated, i.e. dissolved into the polygons of ‘gdf’.

  • max_distance (int | float | None) – Max distance to search for neighbors. Defaults to None, meaning 0.

  • remove_isolated (bool) – If False (default), polygons in ‘to_eliminate’ that share no border with any polygon in ‘gdf’ will be kept. If True, the isolated polygons will be removed.

  • fix_double (bool) – If True, geometries to be eliminated will be erased by overlapping geometries, to avoid double surfaces when the geometries in ‘to_eliminate’ overlap with multiple geometries in ‘gdf’.

  • ignore_index (bool) – If False (default), the resulting GeoDataFrame will keep the index of the large polygons. If True, the resulting axis will be labeled 0, 1, …, n - 1.

  • aggfunc (str | dict | list | None) – Aggregation function(s) to use when dissolving/eliminating. Defaults to None, meaning the values of ‘gdf’ are used. Otherwise, aggfunc is passed to pandas groupby.agg. Note: the geometries of ‘gdf’ are sorted first, but if ‘gdf’ has missing values, the resulting polygons might get values from the polygons to be eliminated (if aggfunc="first").

  • predicate (str) – Binary predicate passed to sjoin. Defaults to “intersects”.

  • grid_size – Rounding of the coordinates. Defaults to None.

  • n_jobs (int) – Number of threads to use. Defaults to 1.

  • **kwargs – Keyword arguments passed to the dissolve method.

Return type:

GeoDataFrame | tuple[GeoDataFrame]

Returns:

The GeoDataFrame (gdf) with the geometries of ‘to_eliminate’ dissolved in. If multiple GeoDataFrames are passed as ‘gdf’, they are returned as a tuple.

Examples:

Create two polygons with a sliver in between:

>>> import pandas as pd
>>> import sgis as sg
>>> from shapely.geometry import Polygon
>>> sliver = sg.to_gdf(Polygon([(0, 0), (0.1, 1), (0, 2), (-0.1, 1)]))
>>> small_poly = sg.to_gdf(
...     Polygon([(0, 0), (-0.1, 1), (0, 2), (-1, 2), (-2, 2), (-1, 1)])
... )
>>> large_poly = sg.to_gdf(
...     Polygon([(0, 0), (0.1, 1), (1, 2), (2, 2), (3, 2), (3, 0)])
... )

Using multiple GeoDataFrames as input, the sliver is eliminated into the large polygon.

>>> small_poly_eliminated, large_poly_eliminated = sg.eliminate_by_largest(
...     [small_poly, large_poly], sliver
... )

With only one input GeoDataFrame:

>>> polys = pd.concat([small_poly, large_poly])
>>> eliminated = sg.eliminate_by_largest(polys, sliver)
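
Why the sliver ends up in the large polygon can be checked with a small plain-Python sketch. It is not the sgis implementation: it only computes the area of the two candidate neighbors from the coordinates used above and picks the largest. The helper name ring_area and the dictionary keys are hypothetical, and the sketch takes for granted that both polygons border the sliver.

```python
def ring_area(ring):
    """Unsigned area of a closed ring, computed with the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Coordinates copied from the example above.
small_poly = [(0, 0), (-0.1, 1), (0, 2), (-1, 2), (-2, 2), (-1, 1)]
large_poly = [(0, 0), (0.1, 1), (1, 2), (2, 2), (3, 2), (3, 0)]

neighbors = {"small_poly": small_poly, "large_poly": large_poly}

# Area of each neighbor: about 1.9 for small_poly and 5.4 for large_poly.
areas = {name: ring_area(ring) for name, ring in neighbors.items()}

# The sliver is dissolved into the neighbor with the largest area.
target = max(areas, key=areas.get)
print(target)  # large_poly
```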

eliminate_by_longest(gdf, to_eliminate, *, remove_isolated=False, fix_double=True, ignore_index=False, aggfunc=None, grid_size=None, n_jobs=1, **kwargs)

Dissolves selected polygons with the longest bordering neighbor polygon.

Eliminates selected geometries by dissolving them with the neighboring polygon with the longest shared border. The index and column values of the large polygons will be kept, unless otherwise specified.

Note that this might be a lot slower than eliminate_by_largest.

Parameters:
  • gdf (GeoDataFrame | list[GeoDataFrame]) – GeoDataFrame with polygon geometries, or a list of GeoDataFrames.

  • to_eliminate (GeoDataFrame) – The geometries to be eliminated, i.e. dissolved into the polygons of ‘gdf’.

  • remove_isolated (bool) – If False (default), polygons in ‘to_eliminate’ that share no border with any polygon in ‘gdf’ will be kept. If True, the isolated polygons will be removed.

  • fix_double (bool) – If True, geometries to be eliminated will be erased by overlapping geometries, to avoid double surfaces when the geometries in ‘to_eliminate’ overlap with multiple geometries in ‘gdf’.

  • ignore_index (bool) – If False (default), the resulting GeoDataFrame will keep the index of the large polygons. If True, the resulting axis will be labeled 0, 1, …, n - 1.

  • aggfunc (str | dict | list | None) – Aggregation function(s) to use when dissolving/eliminating. Defaults to None, meaning the values of ‘gdf’ are used. Otherwise, aggfunc is passed to pandas groupby.agg. Note: the geometries of ‘gdf’ are sorted first, but if ‘gdf’ has missing values, the resulting polygons might get values from the polygons to be eliminated (if aggfunc="first").

  • grid_size – Rounding of the coordinates. Defaults to None.

  • n_jobs (int) – Number of threads to use. Defaults to 1.

  • **kwargs – Keyword arguments passed to the dissolve method.

Return type:

GeoDataFrame | tuple[GeoDataFrame]

Returns:

The GeoDataFrame (gdf) with the geometries of ‘to_eliminate’ dissolved in. If multiple GeoDataFrames are passed as ‘gdf’, they are returned as a tuple.

Examples:

Create two polygons with a sliver in between:

>>> import pandas as pd
>>> import sgis as sg
>>> from shapely.geometry import Polygon
>>> sliver = sg.to_gdf(Polygon([(0, 0), (0.1, 1), (0, 2), (-0.1, 1)]))
>>> small_poly = sg.to_gdf(
...     Polygon([(0, 0), (-0.1, 1), (0, 2), (-1, 2), (-2, 2), (-1, 1)])
... )
>>> large_poly = sg.to_gdf(
...     Polygon([(0, 0), (0.1, 1), (1, 2), (2, 2), (3, 2), (3, 0)])
... )

Using multiple GeoDataFrames as input, the sliver is eliminated into the small polygon (because it has the longest border with the sliver).

>>> small_poly_eliminated, large_poly_eliminated = sg.eliminate_by_longest(
...     [small_poly, large_poly], sliver
... )

With only one input GeoDataFrame:

>>> polys = pd.concat([small_poly, large_poly])
>>> eliminated = sg.eliminate_by_longest(polys, sliver)
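
The claim that the small polygon shares the longest border with the sliver can be verified with a plain-Python sketch. It is a simplification and not the sgis implementation: it counts an edge as shared only when both rings contain exactly the same pair of vertices, which happens to hold for the coordinates above but would not hold for general geometries. The helper names ring_edges and shared_border_length are hypothetical.

```python
import math


def ring_edges(ring):
    """Undirected edges of a closed ring, each stored as a frozenset of two vertices."""
    return {frozenset(edge) for edge in zip(ring, ring[1:] + ring[:1])}


def shared_border_length(ring_a, ring_b):
    """Total length of the edges that appear in both rings (exact vertex match only)."""
    shared = ring_edges(ring_a) & ring_edges(ring_b)
    return sum(math.dist(*edge) for edge in shared)


# Coordinates copied from the example above.
sliver = [(0, 0), (0.1, 1), (0, 2), (-0.1, 1)]
small_poly = [(0, 0), (-0.1, 1), (0, 2), (-1, 2), (-2, 2), (-1, 1)]
large_poly = [(0, 0), (0.1, 1), (1, 2), (2, 2), (3, 2), (3, 0)]

neighbors = {"small_poly": small_poly, "large_poly": large_poly}
lengths = {
    name: shared_border_length(sliver, ring) for name, ring in neighbors.items()
}

# small_poly shares two sliver edges, large_poly only one.
target = max(lengths, key=lengths.get)
print(target)  # small_poly
```

The small polygon follows both left-hand edges of the sliver, while the large polygon leaves the sliver at (0.1, 1) and shares only the lower right-hand edge.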

get_gaps(gdf, include_interiors=False, grid_size=None)

Get the gaps between polygons.

Parameters:
  • gdf (GeoDataFrame) – GeoDataFrame of polygons.

  • include_interiors (bool) – If False (default), the holes inside individual polygons will not be included as gaps.

  • grid_size (float | int | None) – Rounding of the coordinates.

Return type:

GeoDataFrame

Note

See get_holes to find holes inside singlepart polygons.

Returns:

GeoDataFrame of polygons with only a geometry column.

get_holes(gdf, as_polygons=True)

Get the holes inside polygons.

Parameters:
  • gdf (GeoDataFrame) – GeoDataFrame of polygons.

  • as_polygons (bool) – If True (default), the holes will be returned as polygons. If False, they will be returned as LinearRings.

Return type:

GeoDataFrame

Note

See get_gaps to find holes/gaps between undissolved polygons.

Returns:

GeoDataFrame of polygons or LinearRings with only a geometry column.

get_polygon_clusters(*gdfs, cluster_col='cluster', allow_multipart=False, predicate='intersects', as_string=False)

Find which polygons overlap without dissolving.

Divides polygons into clusters in a fast and precise manner by using spatial join and networkx to find the connected components, i.e. overlapping geometries. If multiple GeoDataFrames are given, the clusters will be based on all of them combined.

This can be used instead of dissolve+explode, or before dissolving by the cluster column. This has been tested to be a lot faster if there are many non-overlapping polygons, but somewhat slower than dissolve+explode if most polygons overlap.

Parameters:
  • gdfs (GeoDataFrame | GeoSeries) – One or more GeoDataFrames of polygons.

  • cluster_col (str) – Name of the resulting cluster column.

  • allow_multipart (bool) – Whether to allow multipart geometries in the gdfs. Defaults to False to avoid confusing results.

  • predicate (str | None) – Spatial predicate. Defaults to “intersects”.

  • as_string (bool) – Whether to return the cluster column values as a string with x and y coordinates. Convenient for always getting unique ids. Defaults to False because of speed.

Return type:

GeoDataFrame | tuple[GeoDataFrame]

Returns:

One or more GeoDataFrames (the same number as were given) with a new cluster column.

Examples:

Create geometries with three clusters of overlapping polygons.

>>> import sgis as sg
>>> gdf = sg.to_gdf([(0, 0), (1, 1), (0, 1), (4, 4), (4, 3), (7, 7)])
>>> gdf = sg.buff(gdf, 1)
>>> gdf
                                            geometry
0  POLYGON ((1.00000 0.00000, 0.99951 -0.03141, 0...
1  POLYGON ((2.00000 1.00000, 1.99951 0.96859, 1....
2  POLYGON ((1.00000 1.00000, 0.99951 0.96859, 0....
3  POLYGON ((5.00000 4.00000, 4.99951 3.96859, 4....
4  POLYGON ((5.00000 3.00000, 4.99951 2.96859, 4....
5  POLYGON ((8.00000 7.00000, 7.99951 6.96859, 7....

Add a cluster column to the GeoDataFrame:

>>> gdf = sg.get_polygon_clusters(gdf, cluster_col="cluster")
>>> gdf
   cluster                                           geometry
0        0  POLYGON ((1.00000 0.00000, 0.99951 -0.03141, 0...
1        0  POLYGON ((2.00000 1.00000, 1.99951 0.96859, 1....
2        0  POLYGON ((1.00000 1.00000, 0.99951 0.96859, 0....
3        1  POLYGON ((5.00000 4.00000, 4.99951 3.96859, 4....
4        1  POLYGON ((5.00000 3.00000, 4.99951 2.96859, 4....
5        2  POLYGON ((8.00000 7.00000, 7.99951 6.96859, 7....

If multiple GeoDataFrames are given, all are returned with common cluster values.

>>> gdf2 = sg.to_gdf([(0, 0), (7, 7)])
>>> gdf, gdf2 = sg.get_polygon_clusters(gdf, gdf2, cluster_col="cluster")
>>> gdf2
   cluster                 geometry
0        0  POINT (0.00000 0.00000)
1        2  POINT (7.00000 7.00000)
>>> gdf
   cluster                                           geometry
0        0  POLYGON ((1.00000 0.00000, 0.99951 -0.03141, 0...
1        0  POLYGON ((2.00000 1.00000, 1.99951 0.96859, 1....
2        0  POLYGON ((1.00000 1.00000, 0.99951 0.96859, 0....
3        1  POLYGON ((5.00000 4.00000, 4.99951 3.96859, 4....
4        1  POLYGON ((5.00000 3.00000, 4.99951 2.96859, 4....
5        2  POLYGON ((8.00000 7.00000, 7.99951 6.96859, 7....

Dissolving ‘by’ the cluster column will make the dissolve much faster if there are a lot of non-overlapping polygons.

>>> dissolved = gdf.dissolve(by="cluster", as_index=False)
>>> dissolved
   cluster                                           geometry
0        0  POLYGON ((0.99951 -0.03141, 0.99803 -0.06279, ...
1        1  POLYGON ((4.99951 2.96859, 4.99803 2.93721, 4....
2        2  POLYGON ((8.00000 7.00000, 7.99951 6.96859, 7....
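
The cluster labels in the example can be reproduced with a plain-Python sketch of the connected-components idea. This is a stand-in, not the sgis implementation: it replaces the spatial join with an exact circle test (two circles of radius 1 intersect when their centres are at most 2 apart) and replaces networkx with a simple breadth-first search. The function names circles_intersect and cluster_labels are hypothetical.

```python
import math
from collections import deque

# Centres of the buffered points from the example above.
centres = [(0, 0), (1, 1), (0, 1), (4, 4), (4, 3), (7, 7)]
radius = 1


def circles_intersect(a, b, radius):
    """Two circles with equal radius intersect if their centres are at most 2 * radius apart."""
    return math.dist(a, b) <= 2 * radius


def cluster_labels(centres, radius):
    """Label each circle with the id of its connected component (breadth-first search)."""
    labels = [None] * len(centres)
    next_label = 0
    for start in range(len(centres)):
        if labels[start] is not None:
            continue  # already reached from an earlier circle
        labels[start] = next_label
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(len(centres)):
                if labels[j] is None and circles_intersect(
                    centres[i], centres[j], radius
                ):
                    labels[j] = next_label
                    queue.append(j)
        next_label += 1
    return labels


labels = cluster_labels(centres, radius)
print(labels)  # [0, 0, 0, 1, 1, 2]
```

The pairwise loop is quadratic in the number of geometries, which is acceptable for six circles. A spatial index, as used by a spatial join, avoids comparing every pair.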