General geopandas tools

clean_clip(gdf, mask, keep_geom_type=None, geom_type=None, **kwargs)[source]

Clips and cleans geometries.

Geopandas.clip does a “fast and dirty clipping, with no guarantee for valid outputs”. Here, the clipped geometries are made valid, and empty and NaN geometries are removed.

Parameters:
  • gdf (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries to be clipped

  • mask (GeoDataFrame | GeoSeries | Geometry) – The geometry to clip gdf with.

  • geom_type (str | None) – Optionally specify which geometry type to keep if there are mixed geometry types. Must be either “polygon”, “line” or “point”.

  • keep_geom_type (bool | None) – Defaults to None, meaning True if ‘geom_type’ is given or if the geometries are single-typed, and False if the geometries are mixed.

  • **kwargs – Keyword arguments passed to geopandas.GeoDataFrame.clip

Return type:

GeoDataFrame | GeoSeries

Returns:

The cleanly clipped GeoDataFrame or GeoSeries.

Raises:

TypeError – If gdf is not of type GeoDataFrame or GeoSeries.
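The default for keep_geom_type described above can be read as a small decision rule. The following is a minimal sketch of that rule only, not the library's code; the helper name resolve_keep_geom_type and the geom_types_present argument are hypothetical and exist only for illustration.

```python
from __future__ import annotations


def resolve_keep_geom_type(
    keep_geom_type: bool | None,
    geom_type: str | None,
    geom_types_present: set[str],
) -> bool:
    """Hypothetical helper mirroring the documented default for keep_geom_type."""
    if keep_geom_type is not None:
        # An explicit True/False from the caller always wins.
        return keep_geom_type
    if geom_type is not None:
        if geom_type not in {"polygon", "line", "point"}:
            raise ValueError("geom_type must be 'polygon', 'line' or 'point'")
        # A requested geometry type implies that only that type is kept.
        return True
    # No hints given: keep the type only when the input is single-typed.
    return len(geom_types_present) == 1


single_typed = resolve_keep_geom_type(None, None, {"polygon"})
mixed = resolve_keep_geom_type(None, None, {"polygon", "line"})
with_geom_type = resolve_keep_geom_type(None, "line", {"polygon", "line"})
```

Passing keep_geom_type explicitly bypasses the rule entirely, which is why the explicit value is checked first.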

clean_geoms(gdf, ignore_index=False)[source]

Fixes geometries, then removes empty, NaN and None geometries.

Parameters:
  • gdf (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries to be cleaned.

  • ignore_index (bool) – If True, the resulting axis will be labeled 0, 1, …, n - 1. Defaults to False.

Return type:

GeoDataFrame | GeoSeries

Returns:

GeoDataFrame or GeoSeries with fixed geometries, keeping only the rows whose geometries are valid, non-empty and not NaN or None.

Examples:

>>> import sgis as sg
>>> import pandas as pd
>>> from shapely import wkt
>>> gdf = sg.to_gdf([
...         "POINT (0 0)",
...         "LINESTRING (1 1, 2 2)",
...         "POLYGON ((3 3, 4 4, 3 4, 3 3))"
...         ])
>>> gdf
                                            geometry
0                            POINT (0.00000 0.00000)
1      LINESTRING (1.00000 1.00000, 2.00000 2.00000)
2  POLYGON ((3.00000 3.00000, 4.00000 4.00000, 3....

Add None and empty geometries.

>>> missing = pd.DataFrame({"geometry": [None]})
>>> empty = sg.to_gdf(wkt.loads("POINT (0 0)").buffer(0))
>>> gdf = pd.concat([gdf, missing, empty])
>>> gdf
                                            geometry
0                            POINT (0.00000 0.00000)
1      LINESTRING (1.00000 1.00000, 2.00000 2.00000)
2  POLYGON ((3.00000 3.00000, 4.00000 4.00000, 3....
0                                               None
0                                      POLYGON EMPTY

Clean.

>>> sg.clean_geoms(gdf)
                                            geometry
0                            POINT (0.00000 0.00000)
1      LINESTRING (1.00000 1.00000, 2.00000 2.00000)
2  POLYGON ((3.00000 3.00000, 4.00000 4.00000, 3....

drop_inactive_geometry_columns(gdf)[source]

Removes geometry columns in a GeoDataFrame if they are not active.

Return type:

GeoDataFrame

Parameters:

gdf (GeoDataFrame)

get_common_crs(iterable, strict=False)[source]

Returns the common not-None crs, or raises a ValueError if there is more than one.

Parameters:
  • iterable (Iterable[Hashable]) – Iterable of objects with the attribute “crs” or a list of CRS-like (pyproj.CRS-accepted) objects.

  • strict (bool) – If False (default), falsy CRS-es will be ignored and None will be returned if all CRS-es are falsy. If strict is True,

Return type:

CRS | None

Returns:

pyproj.CRS object or None (if all crs are None).

Raises:

ValueError – If there is more than one crs. If strict is True, None is included.
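As a rough illustration of the behaviour described for get_common_crs, the sketch below uses plain strings as stand-ins for CRS objects. The function name common_crs_sketch and the Layer class are hypothetical, and the treatment of strict=True (a missing crs counts as a distinct value) is an assumption drawn from the Raises section. Unlike pyproj, the sketch compares values literally and does not normalise equivalent CRS definitions.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any


def common_crs_sketch(items: Iterable[Any], strict: bool = False) -> Any:
    """Illustrative stand-in for the documented behaviour, not the library code."""
    # Accept objects with a "crs" attribute as well as bare CRS-like values.
    values = [getattr(item, "crs", item) for item in items]
    if not strict:
        # Falsy values such as None are ignored when strict is False.
        values = [value for value in values if value]
    unique = set(values)
    if len(unique) > 1:
        raise ValueError(f"More than one crs: {unique}")
    # Either the single shared value, or None if nothing is left.
    return unique.pop() if unique else None


class Layer:
    """Hypothetical object that carries a crs attribute."""

    def __init__(self, crs):
        self.crs = crs


shared = common_crs_sketch([Layer("EPSG:25833"), Layer(None), "EPSG:25833"])
nothing = common_crs_sketch([None, Layer(None)])
```

With strict=True the same first call would raise, because None and "EPSG:25833" are then treated as two different values.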

get_grouped_centroids(gdf, groupby, as_string=True)[source]

Get the centerpoint of the geometries within a group.

Parameters:
  • gdf (GeoDataFrame) – GeoDataFrame.

  • groupby (str | list[str]) – column to group by.

  • as_string (bool) – If True (default), coordinates are returned in the format “{x}_{y}”. If False, coordinates are returned as Points.

Return type:

Series

Returns:

A pandas.Series of grouped centroids with the index of ‘gdf’.
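To make the “{x}_{y}” format and the index alignment concrete, here is a minimal sketch on plain coordinate tuples. It assumes that the centre of a group is the mean of the member coordinates; how the library actually computes the centre is not stated here. The name grouped_centroids_sketch is hypothetical.

```python
from collections import defaultdict


def grouped_centroids_sketch(points, groups, as_string=True):
    """points: list of (x, y); groups: list of group keys of the same length."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), key in zip(points, groups):
        acc = sums[key]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    # Assumption: the group centre is the mean of the member coordinates.
    centres = {key: (sx / n, sy / n) for key, (sx, sy, n) in sums.items()}
    # One value per input row, in input order, so the result lines up
    # with the rows (index) of the input.
    if as_string:
        return [f"{centres[key][0]}_{centres[key][1]}" for key in groups]
    return [centres[key] for key in groups]


points = [(0, 0), (2, 2), (10, 0)]
groups = ["a", "a", "b"]
as_strings = grouped_centroids_sketch(points, groups)
as_tuples = grouped_centroids_sketch(points, groups, as_string=False)
```

Every row receives the centre of its own group, so rows in the same group share one value.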

make_lines_between_points(arr1, arr2)[source]

Creates an array of linestrings from two arrays of points.

The operation is done rowwise.

Parameters:
  • arr1 (ndarray[Any, dtype[Point]] | GeometryArray | GeoSeries) – GeometryArray or GeoSeries of points.

  • arr2 (ndarray[Any, dtype[Point]] | GeometryArray | GeoSeries) – GeometryArray or GeoSeries of points of same length as arr1.

Return type:

ndarray[Any, dtype[LineString]]

Returns:

A numpy array of linestrings.

Raises:

ValueError – If the arrays have unequal shape.
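The rowwise pairing can be illustrated without any geometry library. The sketch below works on coordinate tuples and returns WKT strings; the name lines_between_points_sketch is hypothetical, and real input would be point geometries rather than tuples.

```python
def lines_between_points_sketch(points1, points2):
    """Pair two equally long sequences of (x, y) tuples into WKT linestrings."""
    if len(points1) != len(points2):
        raise ValueError("Arrays must have equal shape.")
    # Row i of the result connects points1[i] to points2[i].
    return [
        f"LINESTRING ({x1} {y1}, {x2} {y2})"
        for (x1, y1), (x2, y2) in zip(points1, points2)
    ]


lines = lines_between_points_sketch([(0, 0), (1, 1)], [(1, 0), (2, 2)])
```

No line is created between points in different rows, which is what distinguishes this from connecting every point in arr1 to every point in arr2.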

random_points(n, loc=0.5)[source]

Creates a GeoDataFrame with n random points.

Parameters:
  • n (int) – Number of points/rows to create.

  • loc (float | int) – Mean (‘centre’) of the distribution.

Return type:

GeoDataFrame

Returns:

A GeoDataFrame of points with n rows.
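One way to think about loc is sketched below with the standard library. It assumes that coordinates are drawn uniformly from the range 0 to 2 * loc, which has mean loc; the distribution the library actually uses is not stated here. The name random_xy and its seed parameter are hypothetical.

```python
import random


def random_xy(n, loc=0.5, seed=None):
    """Return n random (x, y) tuples centred on loc (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: uniform over 0 to 2 * loc, so that the mean is loc.
    return [(rng.uniform(0, 2 * loc), rng.uniform(0, 2 * loc)) for _ in range(n)]


coords = random_xy(10_000, loc=100, seed=1)
mean_x = sum(x for x, _ in coords) / len(coords)
```

With the default loc=0.5 the same rule yields coordinates between 0 and 1.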

Examples:

>>> import sgis as sg
>>> points = sg.random_points(10_000)
>>> points
                     geometry
0     POINT (0.62044 0.22805)
1     POINT (0.31885 0.38109)
2     POINT (0.39632 0.61130)
3     POINT (0.99401 0.35732)
4     POINT (0.76403 0.73539)
...                       ...
9995  POINT (0.90433 0.75080)
9996  POINT (0.10959 0.59785)
9997  POINT (0.00330 0.79168)
9998  POINT (0.90926 0.96215)
9999  POINT (0.01386 0.22935)
[10000 rows x 1 columns]

Values with a mean of 100.

>>> points = sg.random_points(10_000, loc=100)
>>> points
                     geometry
0      POINT (50.442 199.729)
1       POINT (26.450 83.367)
2     POINT (111.054 147.610)
3      POINT (93.141 141.456)
4       POINT (94.101 24.837)
...                       ...
9995   POINT (174.344 91.772)
9996    POINT (95.375 11.391)
9997    POINT (45.694 60.843)
9998   POINT (73.261 101.881)
9999  POINT (134.503 168.155)
[10000 rows x 1 columns]

random_points_in_polygons(gdf, n, seed=None)[source]

Creates a GeoDataFrame with n random points within the geometries of ‘gdf’.

Parameters:
  • gdf (GeoDataFrame) – A GeoDataFrame.

  • n (int) – Number of points/rows to create.

  • seed – Optional random seed.

Return type:

GeoDataFrame

Returns:

A GeoDataFrame of points with n rows.
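A common way to draw random points inside a polygon is rejection sampling: draw points in the bounding box and keep those that fall inside. The sketch below shows this for one simple polygon given as a vertex list. It is not necessarily how random_points_in_polygons is implemented, and the names point_in_ring and random_points_in_ring are hypothetical.

```python
import random


def point_in_ring(x, y, ring):
    """Ray-casting test for a simple polygon given as (x, y) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_ring(ring, n, seed=None):
    """Draw n points inside the ring by rejection sampling in its bounding box."""
    rng = random.Random(seed)
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    points = []
    while len(points) < n:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_ring(x, y, ring):
            points.append((x, y))
    return points


triangle = [(0, 0), (4, 0), (0, 4)]
sample = random_points_in_ring(triangle, 100, seed=42)
```

Passing the same seed reproduces the same points, which is the usual purpose of a seed parameter.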

sort_large_first(gdf)[source]

Sort GeoDataFrame by area in descending order.

Parameters:

gdf (GeoDataFrame | GeoSeries) – A GeoDataFrame or GeoSeries.

Return type:

GeoDataFrame | GeoSeries

Returns:

A GeoDataFrame or GeoSeries sorted from large to small in area.

Examples:

Create GeoDataFrame with NaN values.

>>> import sgis as sg
>>> df = sg.to_gdf(
...     [
...         (0, 1),
...         (1, 0),
...         (1, 1),
...         (0, 0),
...         (0.5, 0.5),
...     ]
... )
>>> df.geometry = df.buffer([4, 1, 2, 3, 5])
>>> df["col"] = [None, 1, 2, None, 1]
>>> df["col2"] = [None, 1, 2, 3, None]
>>> df["area"] = df.area
>>> df
                                            geometry  col  col2       area
0  POLYGON ((4.56136 0.53436, 4.54210 0.14229, 4....  NaN   NaN  50.184776
1  POLYGON ((1.40111 0.71798, 1.39630 0.61996, 1....  1.0   1.0   3.136548
2  POLYGON ((2.33302 0.49287, 2.32339 0.29683, 2....  2.0   2.0  12.546194
3  POLYGON ((3.68381 0.46299, 3.66936 0.16894, 3....  NaN   3.0  28.228936
4  POLYGON ((5.63590 0.16005, 5.61182 -0.33004, 5...  1.0   NaN  78.413712
>>> sg.sort_large_first(df)
                                            geometry  col  col2       area
4  POLYGON ((5.63590 0.16005, 5.61182 -0.33004, 5...  1.0   NaN  78.413712
0  POLYGON ((4.56136 0.53436, 4.54210 0.14229, 4....  NaN   NaN  50.184776
3  POLYGON ((3.68381 0.46299, 3.66936 0.16894, 3....  NaN   3.0  28.228936
2  POLYGON ((2.33302 0.49287, 2.32339 0.29683, 2....  2.0   2.0  12.546194
1  POLYGON ((1.40111 0.71798, 1.39630 0.61996, 1....  1.0   1.0   3.136548
>>> sg.sort_nans_last(sg.sort_large_first(df))
                                            geometry  col  col2       area
2  POLYGON ((2.33302 0.49287, 2.32339 0.29683, 2....  2.0   2.0  12.546194
1  POLYGON ((1.40111 0.71798, 1.39630 0.61996, 1....  1.0   1.0   3.136548
4  POLYGON ((5.63590 0.16005, 5.61182 -0.33004, 5...  1.0   NaN  78.413712
3  POLYGON ((3.68381 0.46299, 3.66936 0.16894, 3....  NaN   3.0  28.228936
0  POLYGON ((4.56136 0.53436, 4.54210 0.14229, 4....  NaN   NaN  50.184776

sort_long_first(gdf)[source]

Sort GeoDataFrame by length in descending order.

Parameters:

gdf (GeoDataFrame | GeoSeries) – A GeoDataFrame or GeoSeries.

Return type:

GeoDataFrame | GeoSeries

Returns:

A GeoDataFrame or GeoSeries sorted from long to short in length.

sort_short_first(gdf)[source]

Sort GeoDataFrame by length in ascending order.

Parameters:

gdf (GeoDataFrame | GeoSeries) – A GeoDataFrame or GeoSeries.

Return type:

GeoDataFrame | GeoSeries

Returns:

A GeoDataFrame or GeoSeries sorted from short to long in length.

sort_small_first(gdf)[source]

Sort GeoDataFrame by area in ascending order.

Parameters:

gdf (GeoDataFrame | GeoSeries) – A GeoDataFrame or GeoSeries.

Return type:

GeoDataFrame | GeoSeries

Returns:

A GeoDataFrame or GeoSeries sorted from small to large in area.
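The four sort helpers differ only in the sort key (area or length) and the direction. The sketch below shows the idea on plain vertex lists, with the shoelace formula for area; the names ring_area, ring_length and shapes are hypothetical, and the dictionary keys play the role of the index.

```python
import math


def ring_area(ring):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    closed = zip(ring, ring[1:] + ring[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in closed)) / 2


def ring_length(ring):
    """Perimeter of the same ring."""
    closed = zip(ring, ring[1:] + ring[:1])
    return sum(math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in closed)


shapes = {
    "unit_square": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "big_square": [(0, 0), (3, 0), (3, 3), (0, 3)],
    "thin_strip": [(0, 0), (10, 0), (10, 0.2), (0, 0.2)],
}

# The labels follow their rows when sorting, like an index that is kept.
large_first = sorted(shapes, key=lambda k: ring_area(shapes[k]), reverse=True)
small_first = sorted(shapes, key=lambda k: ring_area(shapes[k]))
long_first = sorted(shapes, key=lambda k: ring_length(shapes[k]), reverse=True)
short_first = sorted(shapes, key=lambda k: ring_length(shapes[k]))
```

The thin strip has the longest perimeter but not the largest area, so sorting by length and sorting by area give different orders.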

to_lines(*gdfs, copy=True)[source]

Makes lines out of one or more GeoDataFrames and splits them at intersections.

The GeoDataFrames’ geometries are converted to LineStrings, then unioned together and made singlepart. The lines are split at the intersections. Mimics ‘feature to line’ in ArcGIS.

Parameters:
  • *gdfs (GeoDataFrame) – one or more GeoDataFrames.

  • copy (bool) – whether to take a copy of the incoming GeoDataFrames. Defaults to True.

Return type:

GeoDataFrame

Returns:

A GeoDataFrame with singlepart line geometries and columns of all input GeoDataFrames.

Note

The index is preserved if only one GeoDataFrame is given, but otherwise ignored. This is because the union overlay, which is used when multiple GeoDataFrames are given, always ignores the index.

Examples:

Convert single polygon to linestring.

>>> import sgis as sg
>>> from shapely.geometry import Polygon
>>> poly1 = sg.to_gdf(Polygon([(0, 0), (0, 1), (1, 1), (1, 0)]))
>>> poly1["poly1"] = 1
>>> line = sg.to_lines(poly1)
>>> line
                                            geometry  poly1
0  LINESTRING (0.00000 0.00000, 0.00000 1.00000, ...      1

Convert two overlapping polygons to linestrings.

>>> poly2 = sg.to_gdf(Polygon([(0.5, 0.5), (0.5, 1.5), (1.5, 1.5), (1.5, 0.5)]))
>>> poly2["poly2"] = 1
>>> lines = sg.to_lines(poly1, poly2)
>>> lines
   poly1  poly2                                           geometry
0    1.0    NaN  LINESTRING (0.00000 0.00000, 0.00000 1.00000, ...
1    1.0    NaN  LINESTRING (0.50000 1.00000, 1.00000 1.00000, ...
2    1.0    NaN  LINESTRING (1.00000 0.50000, 1.00000 0.00000, ...
3    NaN    1.0      LINESTRING (0.50000 0.50000, 0.50000 1.00000)
4    NaN    1.0  LINESTRING (0.50000 1.00000, 0.50000 1.50000, ...
5    NaN    1.0      LINESTRING (1.00000 0.50000, 0.50000 0.50000)

Plot before and after.

>>> sg.qtm(poly1, poly2)
>>> lines["l"] = lines.length
>>> sg.qtm(lines, "l")
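The splitting step can be illustrated on straight segments with plain Python. The sketch below cuts the boundary segments of one ring wherever a segment of another ring crosses them, using the two squares from the example above. It does not merge the pieces back into longer linestrings, ignores collinear overlaps, and is not the library's implementation; all function names are hypothetical.

```python
import math


def ring_segments(ring):
    """Consecutive vertex pairs of a closed ring given as (x, y) vertices."""
    return list(zip(ring, ring[1:] + ring[:1]))


def crossing_parameter(seg, other):
    """Position t (0 < t < 1) along seg where other crosses it, else None."""
    (x1, y1), (x2, y2) = seg
    (x3, y3), (x4, y4) = other
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None  # Parallel or collinear segments are ignored here.
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    return t if 0 < t < 1 and 0 <= u <= 1 else None


def split_at_crossings(ring, other_ring):
    """Split each segment of ring at its crossings with other_ring."""
    pieces = []
    for seg in ring_segments(ring):
        (x1, y1), (x2, y2) = seg
        cuts = set()
        for other in ring_segments(other_ring):
            t = crossing_parameter(seg, other)
            if t is not None:
                cuts.add(t)
        stops = [0.0] + sorted(cuts) + [1.0]
        for a, b in zip(stops, stops[1:]):
            start = (x1 + a * (x2 - x1), y1 + a * (y2 - y1))
            end = (x1 + b * (x2 - x1), y1 + b * (y2 - y1))
            pieces.append((start, end))
    return pieces


square1 = [(0, 0), (0, 1), (1, 1), (1, 0)]
square2 = [(0.5, 0.5), (0.5, 1.5), (1.5, 1.5), (1.5, 0.5)]
pieces = split_at_crossings(square1, square2) + split_at_crossings(square2, square1)
total_length = sum(math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in pieces)
```

Splitting only inserts new end points, so the total length of the pieces equals the combined perimeter of the two squares.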