General geopandas tools

clean_clip(gdf, mask, keep_geom_type=None, geom_type=None, **kwargs)

Clips and cleans geometries.

Geopandas.clip does a “fast and dirty clipping, with no guarantee for valid outputs”. Here, the clipped geometries are made valid, and empty and NaN geometries are removed.

Parameters:
  • gdf (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries to be clipped.

  • mask (GeoDataFrame | GeoSeries | Geometry) – The geometry to clip gdf with.

  • geom_type (str | None) – Optionally specify what geometry type to keep if there are mixed geometry types. Must be either “polygon”, “line” or “point”.

  • keep_geom_type (bool | None) – Defaults to None, meaning True if ‘geom_type’ is given, True if the geometries are single-typed, and False if the geometries are mixed.

  • **kwargs – Keyword arguments passed to geopandas.GeoDataFrame.clip

Return type:

GeoDataFrame | GeoSeries

Returns:

The cleanly clipped GeoDataFrame or GeoSeries.

Raises:

TypeError – If gdf is not of type GeoDataFrame or GeoSeries.
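The clip-then-clean behaviour described above can be sketched with public geopandas calls only. This is a minimal sketch, not the sgis implementation: the name `clean_clip_sketch` is hypothetical, it assumes a GeoDataFrame input and geopandas 0.12 or later (for `GeoSeries.make_valid`), and it ignores `keep_geom_type` and `geom_type`.

```python
import geopandas as gpd
from shapely.geometry import box


def clean_clip_sketch(gdf: gpd.GeoDataFrame, mask) -> gpd.GeoDataFrame:
    """Hypothetical helper: clip, repair geometries, drop empty/missing rows."""
    # "Fast and dirty" clip; the result may contain invalid or empty geometries.
    clipped = gpd.clip(gdf, mask).copy()
    geom_col = clipped.geometry.name
    # Repair invalid geometries (GeoSeries.make_valid requires geopandas >= 0.12).
    clipped[geom_col] = clipped.geometry.make_valid()
    # Keep only rows whose geometry is neither missing nor empty.
    keep = clipped.geometry.notna() & ~clipped.geometry.is_empty
    return clipped[keep]


if __name__ == "__main__":
    gdf = gpd.GeoDataFrame({"id": [1, 2]}, geometry=[box(0, 0, 2, 2), box(5, 5, 6, 6)])
    # Only the first square intersects the mask; it is cut down to the unit square.
    print(clean_clip_sketch(gdf, box(0, 0, 1, 1)))
```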

clean_geoms(gdf, ignore_index=False)

Fixes geometries, then removes empty, NaN and None geometries.

Parameters:
  • gdf (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries to be cleaned.

  • ignore_index (bool) – If True, the resulting axis will be labeled 0, 1, …, n - 1. Defaults to False.

Return type:

GeoDataFrame | GeoSeries

Returns:

GeoDataFrame or GeoSeries with fixed geometries and only the rows with valid, non-empty and not-NaN/-None geometries.

Examples:

>>> import sgis as sg
>>> import pandas as pd
>>> from shapely import wkt
>>> gdf = sg.to_gdf([
...         "POINT (0 0)",
...         "LINESTRING (1 1, 2 2)",
...         "POLYGON ((3 3, 4 4, 3 4, 3 3))"
...         ])
>>> gdf
                                            geometry
0                            POINT (0.00000 0.00000)
1      LINESTRING (1.00000 1.00000, 2.00000 2.00000)
2  POLYGON ((3.00000 3.00000, 4.00000 4.00000, 3....

Add None and empty geometries.

>>> missing = pd.DataFrame({"geometry": [None]})
>>> empty = sg.to_gdf(wkt.loads("POINT (0 0)").buffer(0))
>>> gdf = pd.concat([gdf, missing, empty])
>>> gdf
                                            geometry
0                            POINT (0.00000 0.00000)
1      LINESTRING (1.00000 1.00000, 2.00000 2.00000)
2  POLYGON ((3.00000 3.00000, 4.00000 4.00000, 3....
0                                               None
0                                      POLYGON EMPTY

Clean.

>>> sg.clean_geoms(gdf)
                                            geometry
0                            POINT (0.00000 0.00000)
1      LINESTRING (1.00000 1.00000, 2.00000 2.00000)
2  POLYGON ((3.00000 3.00000, 4.00000 4.00000, 3....

drop_inactive_geometry_columns(gdf)

Removes geometry columns in a GeoDataFrame if they are not active.

Return type:

GeoDataFrame

Parameters:

gdf (GeoDataFrame)
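A minimal sketch of what removing inactive geometry columns can look like, assuming that every column with geopandas' geometry dtype other than the active one (`gdf.geometry.name`) counts as inactive. The helper name is hypothetical.

```python
import geopandas as gpd
from geopandas.array import GeometryDtype
from shapely.geometry import Point


def drop_inactive_geometry_columns_sketch(gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Hypothetical helper: drop geometry-dtype columns other than the active one."""
    active = gdf.geometry.name
    inactive = [
        col
        for col in gdf.columns
        if col != active and isinstance(gdf[col].dtype, GeometryDtype)
    ]
    return gdf.drop(columns=inactive)


if __name__ == "__main__":
    gdf = gpd.GeoDataFrame({"id": [1, 2]}, geometry=[Point(0, 0), Point(1, 1)])
    gdf["buffered"] = gdf.buffer(1)  # a second, inactive geometry column
    print(drop_inactive_geometry_columns_sketch(gdf).columns.tolist())  # ['id', 'geometry']
```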

get_common_crs(iterable, strict=False)

Returns the common non-None CRS, or raises a ValueError if there is more than one.

Parameters:
  • iterable (Iterable[Hashable]) – Iterable of objects with the attribute “crs” or a list of CRS-like (pyproj.CRS-accepted) objects.

  • strict (bool) – If False (default), falsy CRS-es will be ignored and None will be returned if all CRS-es are falsy. If strict is True, None is included, so a mix of None and a non-None CRS counts as more than one CRS.

Return type:

CRS | None

Returns:

pyproj.CRS object or None (if all crs are None).

Raises:

ValueError – If there is more than one CRS. If strict is True, None is included.
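The rules above (ignore None unless strict, raise on more than one CRS) can be sketched with `pyproj.CRS.from_user_input`. This is a hypothetical re-implementation for illustration only: it treats None as the only falsy value and compares CRS-es with pyproj's `==`, so different spellings of the same CRS count once.

```python
from pyproj import CRS


def get_common_crs_sketch(iterable, strict=False):
    """Hypothetical helper: return the single shared CRS, or raise ValueError."""
    # Use the .crs attribute when present (e.g. GeoDataFrame), else the item itself.
    values = [getattr(obj, "crs", obj) for obj in iterable]
    has_none = any(value is None for value in values)

    distinct = []
    for value in values:
        if value is None:
            continue  # None is accounted for separately below
        crs = CRS.from_user_input(value)
        # "EPSG:25833" and 25833 describe the same CRS and compare equal.
        if not any(crs == seen for seen in distinct):
            distinct.append(crs)

    # In strict mode, None counts as a CRS of its own.
    n_distinct = len(distinct) + (1 if strict and has_none else 0)
    if n_distinct > 1:
        raise ValueError("More than one CRS found.")
    return distinct[0] if distinct else None
```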

get_grouped_centroids(gdf, groupby, as_string=True)

Get the centerpoint of the geometries within a group.

Parameters:
  • gdf (GeoDataFrame) – GeoDataFrame.

  • groupby (str | list[str]) – Column or columns to group by.

  • as_string (bool) – If True (default), coordinates are returned in the format “{x}_{y}”. If False, coordinates are returned as Points.

Return type:

Series

Returns:

A pandas.Series of grouped centroids with the index of ‘gdf’.
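A sketch of the idea, assuming the centerpoint of a group is the centroid of the union of its geometries and that `groupby` is a single column; the exact definition and string formatting used by sgis may differ, and `grouped_centroids_sketch` is a hypothetical name. The “{x}_{y}” strings here come from pandas' default float-to-string conversion.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point


def grouped_centroids_sketch(gdf: gpd.GeoDataFrame, groupby: str, as_string: bool = True) -> pd.Series:
    """Hypothetical helper: centroid of each group's union, aligned to gdf's index."""
    # dissolve() unions the geometries within each group (one row per group).
    group_centroids = gdf[[groupby, gdf.geometry.name]].dissolve(by=groupby).centroid
    # Broadcast each group's coordinates back to its member rows, keeping gdf's index.
    x = gdf[groupby].map(group_centroids.x)
    y = gdf[groupby].map(group_centroids.y)
    if as_string:
        return x.astype(str) + "_" + y.astype(str)
    return gpd.GeoSeries(gpd.points_from_xy(x, y), index=gdf.index, crs=gdf.crs)


if __name__ == "__main__":
    gdf = gpd.GeoDataFrame(
        {"group": ["a", "a", "b"]},
        geometry=[Point(0, 0), Point(2, 0), Point(10, 10)],
    )
    print(grouped_centroids_sketch(gdf, "group").tolist())  # ['1.0_0.0', '1.0_0.0', '10.0_10.0']
```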

get_index_right_columns(gdf)

Get a list of what will be the resulting columns in an sjoin.

Return type:

list[str]

Parameters:

gdf (DataFrame | Series)

make_edge_coords_cols(gdf)

Get coordinate tuples of the first and last points of lines as columns.

It takes a GeoDataFrame of LineStrings and returns a GeoDataFrame with two new columns, source_coords and target_coords, which are the x and y coordinates of the first and last points of the LineStrings in a tuple. The lines all have to be

Parameters:

gdf (GeoDataFrame) – the GeoDataFrame with the lines

Return type:

GeoDataFrame

Returns:

A GeoDataFrame with new columns ‘source_coords’ and ‘target_coords’
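A minimal sketch of the coordinate-tuple columns, assuming singlepart LineStrings (shapely does not expose `coords` on MultiLineStrings). The helper name is hypothetical.

```python
import geopandas as gpd
from shapely.geometry import LineString


def make_edge_coords_cols_sketch(gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Hypothetical helper: add (x, y) tuples of each line's first and last vertex."""
    gdf = gdf.copy()  # leave the caller's frame unchanged
    # coords[0] and coords[-1] are the first and last vertices as tuples.
    gdf["source_coords"] = [line.coords[0] for line in gdf.geometry]
    gdf["target_coords"] = [line.coords[-1] for line in gdf.geometry]
    return gdf


if __name__ == "__main__":
    lines = gpd.GeoDataFrame(geometry=[LineString([(0, 0), (1, 1), (2, 0)])])
    print(make_edge_coords_cols_sketch(lines)[["source_coords", "target_coords"]])
```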

make_edge_wkt_cols(gdf)

Get the WKT of the first and last points of lines as columns.

It takes a GeoDataFrame of LineStrings and returns a GeoDataFrame with two new columns, source_wkt and target_wkt, which are the WKT representations of the first and last points of the LineStrings.

Parameters:

gdf (GeoDataFrame) – the GeoDataFrame with the lines

Return type:

GeoDataFrame

Returns:

A GeoDataFrame with new columns ‘source_wkt’ and ‘target_wkt’
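The WKT variant can be vectorized with shapely 2.x, where `shapely.get_point` accepts negative indices that count from the end of the line. A sketch under that assumption, with a hypothetical helper name; the exact WKT formatting (e.g. number of decimals) may differ from what sgis produces.

```python
import geopandas as gpd
import numpy as np
import shapely
from shapely.geometry import LineString


def make_edge_wkt_cols_sketch(gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Hypothetical helper: add WKT of each line's first and last point (shapely >= 2.0)."""
    gdf = gdf.copy()
    lines = np.asarray(gdf.geometry)  # object array of shapely LineStrings
    # Index 0 is the first vertex, -1 the last; to_wkt converts the points to strings.
    gdf["source_wkt"] = shapely.to_wkt(shapely.get_point(lines, 0))
    gdf["target_wkt"] = shapely.to_wkt(shapely.get_point(lines, -1))
    return gdf


if __name__ == "__main__":
    lines = gpd.GeoDataFrame(geometry=[LineString([(0, 0), (1, 1), (2, 0)])])
    print(make_edge_wkt_cols_sketch(lines)[["source_wkt", "target_wkt"]])
```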

make_lines_between_points(*arrs)

Creates an array of linestrings from two or more arrays of points.

The lines are created rowwise, meaning from arr0[0] to arr1[0], from arr0[1] to arr1[1]… If more than two arrays are passed, e.g. three arrays, the lines will go from arr0[0] via arr1[0] to arr2[0].

Parameters:

arrs (ndarray[Any, dtype[Point]] | GeometryArray | GeoSeries) – 1-dimensional arrays of point geometries. All arrays must have the same shape. At least two arrays must be given.

Return type:

ndarray[Any, dtype[LineString]]

Returns:

A numpy array of linestrings.
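The row-wise construction can be sketched by stacking the point coordinates into an array of shape (n, k, 2) and passing it to `shapely.linestrings` (shapely 2.0 or later). This hypothetical sketch assumes all points are non-missing and non-empty, since `shapely.get_coordinates` skips those and would misalign the rows; it checks for that instead of silently producing wrong lines.

```python
import numpy as np
import shapely


def make_lines_between_points_sketch(*arrs):
    """Hypothetical helper: line i runs through arrs[0][i], arrs[1][i], ..."""
    if len(arrs) < 2:
        raise ValueError("Need at least two arrays of points.")
    arrays = [np.asarray(arr) for arr in arrs]
    if len({arr.shape for arr in arrays}) != 1:
        raise ValueError("All arrays must have the same shape.")
    # One (n, 2) coordinate array per input; missing/empty points would shorten it.
    coord_arrays = [shapely.get_coordinates(arr) for arr in arrays]
    if any(len(coords) != len(arr) for coords, arr in zip(coord_arrays, arrays)):
        raise ValueError("Arrays must contain only non-missing, non-empty points.")
    # Stack to (n, k, 2): row i holds the k vertices of line i.
    return shapely.linestrings(np.stack(coord_arrays, axis=1))


if __name__ == "__main__":
    a = shapely.points([(0, 0), (1, 1)])
    b = shapely.points([(1, 0), (2, 2)])
    print(make_lines_between_points_sketch(a, b))
```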

points_in_bounds(gdf, gridsize)

Get a GeoDataFrame of points within the bounds of the GeoDataFrame.

Return type:

GeoDataFrame

Parameters:
  • gdf (GeoDataFrame | GeoSeries)

  • gridsize (int | float)

random_points(n, loc=0.5)

Creates a GeoDataFrame with n random points.

Parameters:
  • n (int) – Number of points/rows to create.

  • loc (float | int) – Mean (‘centre’) of the distribution.

Return type:

GeoDataFrame

Returns:

A GeoDataFrame of points with n rows.

Examples:

>>> import sgis as sg
>>> points = sg.random_points(10_000)
>>> points
                     geometry
0     POINT (0.62044 0.22805)
1     POINT (0.31885 0.38109)
2     POINT (0.39632 0.61130)
3     POINT (0.99401 0.35732)
4     POINT (0.76403 0.73539)
...                       ...
9995  POINT (0.90433 0.75080)
9996  POINT (0.10959 0.59785)
9997  POINT (0.00330 0.79168)
9998  POINT (0.90926 0.96215)
9999  POINT (0.01386 0.22935)
[10000 rows x 1 columns]

Values with a mean of 100.

>>> points = sg.random_points(10_000, loc=100)
>>> points
                     geometry
0      POINT (50.442 199.729)
1       POINT (26.450 83.367)
2     POINT (111.054 147.610)
3      POINT (93.141 141.456)
4       POINT (94.101 24.837)
...                       ...
9995   POINT (174.344 91.772)
9996    POINT (95.375 11.391)
9997    POINT (45.694 60.843)
9998   POINT (73.261 101.881)
9999  POINT (134.503 168.155)
[10000 rows x 1 columns]
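The distribution is not stated above, but the example outputs (values between 0 and 1 for the default loc=0.5, roughly 0 to 200 for loc=100) are consistent with x and y drawn uniformly from [0, 2 * loc), which has mean loc. A sketch under that assumption; the helper name and its `seed` parameter are hypothetical additions.

```python
import geopandas as gpd
import numpy as np


def random_points_sketch(n: int, loc: float = 0.5, seed=None) -> gpd.GeoDataFrame:
    """Hypothetical helper: n points with x and y uniform on [0, 2 * loc)."""
    rng = np.random.default_rng(seed)
    # Assumed distribution: uniform on [0, 2 * loc), whose mean is loc.
    x = rng.uniform(0, 2 * loc, size=n)
    y = rng.uniform(0, 2 * loc, size=n)
    return gpd.GeoDataFrame(geometry=gpd.points_from_xy(x, y))


if __name__ == "__main__":
    print(random_points_sketch(5, loc=100, seed=0))
```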

random_points_in_polygons(gdf, n, seed=None)

Creates a GeoDataFrame with n random points within the geometries of ‘gdf’.

Parameters:
  • gdf (GeoDataFrame) – A GeoDataFrame.

  • n (int) – Number of points/rows to create.

  • seed – Optional random seed.

Return type:

GeoDataFrame

Returns:

A GeoDataFrame of points with n rows.
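One common way to get points inside polygons is rejection sampling: draw uniformly in the bounding box and keep the points that fall inside. The sketch below assumes that technique (sgis may use another), samples uniformly over the union of all geometries so that larger polygons receive more points, and requires shapely 2.0 or later for `shapely.contains_xy`. The helper name is hypothetical.

```python
import geopandas as gpd
import numpy as np
import shapely
from shapely.ops import unary_union


def random_points_in_polygons_sketch(gdf, n, seed=None):
    """Hypothetical helper: n uniform points inside the union of gdf's polygons."""
    rng = np.random.default_rng(seed)
    region = unary_union(list(gdf.geometry))
    if region.area == 0:
        raise ValueError("Geometries must have a positive area.")
    minx, miny, maxx, maxy = region.bounds
    xs, ys, found = [], [], 0
    # Rejection sampling: draw in the bounding box, keep points strictly inside.
    while found < n:
        x = rng.uniform(minx, maxx, size=n)
        y = rng.uniform(miny, maxy, size=n)
        inside = shapely.contains_xy(region, x, y)
        xs.append(x[inside])
        ys.append(y[inside])
        found += int(inside.sum())
    x = np.concatenate(xs)[:n]
    y = np.concatenate(ys)[:n]
    return gpd.GeoDataFrame(geometry=gpd.points_from_xy(x, y), crs=gdf.crs)
```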

sort_large_first(gdf)

Sort GeoDataFrame by area in descending order.

Parameters:

gdf (GeoDataFrame | GeoSeries) – A GeoDataFrame or GeoSeries.

Return type:

GeoDataFrame | GeoSeries

Returns:

A GeoDataFrame or GeoSeries sorted from large to small in area.

Examples:

Create a GeoDataFrame with NaN values.

>>> import sgis as sg
>>> df = sg.to_gdf(
...     [
...         (0, 1),
...         (1, 0),
...         (1, 1),
...         (0, 0),
...         (0.5, 0.5),
...     ]
... )
>>> df.geometry = df.buffer([4, 1, 2, 3, 5])
>>> df["col"] = [None, 1, 2, None, 1]
>>> df["col2"] = [None, 1, 2, 3, None]
>>> df["area"] = df.area
>>> df
                                            geometry  col  col2       area
0  POLYGON ((4.56136 0.53436, 4.54210 0.14229, 4....  NaN   NaN  50.184776
1  POLYGON ((1.40111 0.71798, 1.39630 0.61996, 1....  1.0   1.0   3.136548
2  POLYGON ((2.33302 0.49287, 2.32339 0.29683, 2....  2.0   2.0  12.546194
3  POLYGON ((3.68381 0.46299, 3.66936 0.16894, 3....  NaN   3.0  28.228936
4  POLYGON ((5.63590 0.16005, 5.61182 -0.33004, 5...  1.0   NaN  78.413712
>>> sg.sort_large_first(df)
                                            geometry  col  col2       area
4  POLYGON ((5.63590 0.16005, 5.61182 -0.33004, 5...  1.0   NaN  78.413712
0  POLYGON ((4.56136 0.53436, 4.54210 0.14229, 4....  NaN   NaN  50.184776
3  POLYGON ((3.68381 0.46299, 3.66936 0.16894, 3....  NaN   3.0  28.228936
2  POLYGON ((2.33302 0.49287, 2.32339 0.29683, 2....  2.0   2.0  12.546194
1  POLYGON ((1.40111 0.71798, 1.39630 0.61996, 1....  1.0   1.0   3.136548
>>> sg.sort_nans_last(sg.sort_large_first(df))
                                            geometry  col  col2       area
2  POLYGON ((2.33302 0.49287, 2.32339 0.29683, 2....  2.0   2.0  12.546194
1  POLYGON ((1.40111 0.71798, 1.39630 0.61996, 1....  1.0   1.0   3.136548
4  POLYGON ((5.63590 0.16005, 5.61182 -0.33004, 5...  1.0   NaN  78.413712
3  POLYGON ((3.68381 0.46299, 3.66936 0.16894, 3....  NaN   3.0  28.228936
0  POLYGON ((4.56136 0.53436, 4.54210 0.14229, 4....  NaN   NaN  50.184776

sort_long_first(gdf)

Sort GeoDataFrame by length in descending order.

Parameters:

gdf (GeoDataFrame | GeoSeries) – A GeoDataFrame or GeoSeries.

Return type:

GeoDataFrame | GeoSeries

Returns:

A GeoDataFrame or GeoSeries sorted from long to short in length.

sort_short_first(gdf)

Sort GeoDataFrame by length in ascending order.

Parameters:

gdf (GeoDataFrame | GeoSeries) – A GeoDataFrame or GeoSeries.

Return type:

GeoDataFrame | GeoSeries

Returns:

A GeoDataFrame or GeoSeries sorted from short to long in length.

sort_small_first(gdf)

Sort GeoDataFrame by area in ascending order.

Parameters:

gdf (GeoDataFrame | GeoSeries) – A GeoDataFrame or GeoSeries.

Return type:

GeoDataFrame | GeoSeries

Returns:

A GeoDataFrame or GeoSeries sorted from small to large in area.
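The four sort functions differ only in the measure (area or length) and the direction. A hypothetical sketch sharing one helper: it uses a stable positional sort, so ties keep their original order and duplicate index labels cause no problems. How sgis breaks ties may differ.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import LineString, box


def _sort_by_measure(gdf, values, ascending):
    # Stable argsort keeps ties in their original order; iloc is purely positional.
    key = values.to_numpy()
    order = np.argsort(key if ascending else -key, kind="stable")
    return gdf.iloc[order]


def sort_large_first_sketch(gdf):
    return _sort_by_measure(gdf, gdf.area, ascending=False)


def sort_small_first_sketch(gdf):
    return _sort_by_measure(gdf, gdf.area, ascending=True)


def sort_long_first_sketch(gdf):
    return _sort_by_measure(gdf, gdf.length, ascending=False)


def sort_short_first_sketch(gdf):
    return _sort_by_measure(gdf, gdf.length, ascending=True)


if __name__ == "__main__":
    squares = gpd.GeoSeries([box(0, 0, 2, 2), box(0, 0, 1, 1), box(0, 0, 3, 3)])
    print(sort_large_first_sketch(squares).area.tolist())  # [9.0, 4.0, 1.0]
```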

to_lines(*gdfs, copy=True, split=True)

Makes lines out of one or more GeoDataFrames and splits them at intersections.

The GeoDataFrames’ geometries are converted to LineStrings, then unioned together and made singlepart. The lines are split at the intersections. Mimics ‘feature to line’ in ArcGIS.

Parameters:
  • *gdfs (GeoDataFrame) – one or more GeoDataFrames.

  • copy (bool) – Whether to take a copy of the incoming GeoDataFrames. Defaults to True.

  • split (bool) – If True (default), lines will be split at intersections if more than one GeoDataFrame is passed as gdfs. Otherwise, the lines are simply concatenated.

Return type:

GeoDataFrame

Returns:

A GeoDataFrame with singlepart line geometries and columns of all input GeoDataFrames.

Note

The index is preserved if only one GeoDataFrame is given, but otherwise ignored. This is because the union overlay used when multiple GeoDataFrames are given always ignores the index.

Examples:

Convert a single polygon to a linestring.

>>> import sgis as sg
>>> from shapely.geometry import Polygon
>>> poly1 = sg.to_gdf(Polygon([(0, 0), (0, 1), (1, 1), (1, 0)]))
>>> poly1["poly1"] = 1
>>> line = sg.to_lines(poly1)
>>> line
                                            geometry  poly1
0  LINESTRING (0.00000 0.00000, 0.00000 1.00000, ...      1

Convert two overlapping polygons to linestrings.

>>> poly2 = sg.to_gdf(Polygon([(0.5, 0.5), (0.5, 1.5), (1.5, 1.5), (1.5, 0.5)]))
>>> poly2["poly2"] = 1
>>> lines = sg.to_lines(poly1, poly2)
>>> lines
   poly1  poly2                                           geometry
0    1.0    NaN  LINESTRING (0.00000 0.00000, 0.00000 1.00000, ...
1    1.0    NaN  LINESTRING (0.50000 1.00000, 1.00000 1.00000, ...
2    1.0    NaN  LINESTRING (1.00000 0.50000, 1.00000 0.00000, ...
3    NaN    1.0      LINESTRING (0.50000 0.50000, 0.50000 1.00000)
4    NaN    1.0  LINESTRING (0.50000 1.00000, 0.50000 1.50000, ...
5    NaN    1.0      LINESTRING (1.00000 0.50000, 0.50000 0.50000)

Plot before and after.

>>> sg.qtm(poly1, poly2)
>>> lines["l"] = lines.length
>>> sg.qtm(lines, "l")
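The geometry part of the procedure (polygons to boundaries, union, singlepart) can be sketched with shapely's `unary_union`, which nodes line geometries, i.e. splits them where they intersect. A hypothetical sketch only: unlike `to_lines`, it drops all attribute columns and the index, and it handles polygons and lines but not points.

```python
import geopandas as gpd
from shapely.geometry import Polygon
from shapely.ops import unary_union


def to_lines_sketch(*gdfs):
    """Hypothetical helper: geometry-only 'feature to line' (attributes are dropped)."""
    lines = []
    for gdf in gdfs:
        for geom in gdf.geometry:
            # Polygons are represented by their boundary; lines are kept as-is.
            is_polygon = geom.geom_type in ("Polygon", "MultiPolygon")
            lines.append(geom.boundary if is_polygon else geom)
    # Unioning the lines nodes them at every intersection.
    noded = unary_union(lines)
    # Split the (Multi)LineString result into singlepart LineStrings.
    singlepart = gpd.GeoSeries([noded]).explode(index_parts=False)
    return gpd.GeoDataFrame(geometry=singlepart.reset_index(drop=True), crs=gdfs[0].crs)


if __name__ == "__main__":
    poly1 = gpd.GeoDataFrame(geometry=[Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])])
    poly2 = gpd.GeoDataFrame(geometry=[Polygon([(0.5, 0.5), (0.5, 1.5), (1.5, 1.5), (1.5, 0.5)])])
    print(to_lines_sketch(poly1, poly2))
```

Because the two square outlines only cross at points, the total length of the split lines equals the sum of the two perimeters.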