General geopandas tools
- clean_clip(gdf, mask, keep_geom_type=None, geom_type=None, **kwargs)
Clips and cleans geometries.
Geopandas.clip does a “fast and dirty clipping, with no guarantee for valid outputs”. Here, the clipped geometries are made valid, and empty and NaN geometries are removed.
- Parameters:
gdf (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries to be clipped.
mask (GeoDataFrame | GeoSeries | Geometry) – The geometry to clip gdf with.
geom_type (str | None) – Optionally specify which geometry type to keep if there are mixed geometry types. Must be either “polygon”, “line” or “point”.
keep_geom_type (bool | None) – Defaults to None, meaning True if ‘geom_type’ is given or if the geometries are single-typed, and False if the geometries are mixed.
**kwargs – Keyword arguments passed to geopandas.GeoDataFrame.clip.
- Return type:
GeoDataFrame | GeoSeries
- Returns:
The cleanly clipped GeoDataFrame.
- Raises:
TypeError – If gdf is not of type GeoDataFrame or GeoSeries.
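The default for keep_geom_type described above can be read as a small decision rule. The sketch below is not sgis code: `resolve_keep_geom_type` is a hypothetical helper, and geometry types are passed as plain strings purely for illustration.

```python
from typing import Iterable, Optional


def resolve_keep_geom_type(
    keep_geom_type: Optional[bool],
    geom_type: Optional[str],
    geom_types_present: Iterable[str],
) -> bool:
    """Hypothetical helper mirroring the documented default, not sgis internals."""
    if keep_geom_type is not None:
        # An explicit True/False always wins.
        return keep_geom_type
    if geom_type is not None:
        # A requested geometry type implies keeping only that type.
        return True
    # Otherwise: True for single-typed input, False for mixed input.
    return len(set(geom_types_present)) == 1


print(resolve_keep_geom_type(None, "polygon", ["polygon", "line"]))  # True
print(resolve_keep_geom_type(None, None, ["polygon", "polygon"]))    # True
print(resolve_keep_geom_type(None, None, ["polygon", "point"]))      # False
```

How sgis actually detects whether the input is single-typed or mixed is not shown in this reference, so the set-based check is an assumption.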
- clean_geoms(gdf, ignore_index=False)
Fixes geometries, then removes empty, NaN and None geometries.
- Parameters:
gdf (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries to be cleaned.
ignore_index (bool) – If True, the resulting axis will be labeled 0, 1, …, n - 1. Defaults to False.
- Return type:
GeoDataFrame | GeoSeries
- Returns:
GeoDataFrame or GeoSeries with fixed geometries and only the rows with valid, non-empty and not-NaN/-None geometries.
Examples:
>>> import sgis as sg
>>> import pandas as pd
>>> from shapely import wkt
>>> gdf = sg.to_gdf([
...     "POINT (0 0)",
...     "LINESTRING (1 1, 2 2)",
...     "POLYGON ((3 3, 4 4, 3 4, 3 3))"
... ])
>>> gdf
                                            geometry
0                            POINT (0.00000 0.00000)
1      LINESTRING (1.00000 1.00000, 2.00000 2.00000)
2  POLYGON ((3.00000 3.00000, 4.00000 4.00000, 3....
Add None and empty geometries.
>>> missing = pd.DataFrame({"geometry": [None]})
>>> empty = sg.to_gdf(wkt.loads("POINT (0 0)").buffer(0))
>>> gdf = pd.concat([gdf, missing, empty])
>>> gdf
                                            geometry
0                            POINT (0.00000 0.00000)
1      LINESTRING (1.00000 1.00000, 2.00000 2.00000)
2  POLYGON ((3.00000 3.00000, 4.00000 4.00000, 3....
0                                               None
0                                      POLYGON EMPTY
Clean.
>>> sg.clean_geoms(gdf)
                                            geometry
0                            POINT (0.00000 0.00000)
1      LINESTRING (1.00000 1.00000, 2.00000 2.00000)
2  POLYGON ((3.00000 3.00000, 4.00000 4.00000, 3....
- drop_inactive_geometry_columns(gdf)
Removes geometry columns in a GeoDataFrame if they are not active.
- Return type:
GeoDataFrame
- Parameters:
gdf (GeoDataFrame)
- get_common_crs(iterable, strict=False)
Returns the common non-None crs, or raises a ValueError if there is more than one.
- Parameters:
iterable (Iterable[Hashable]) – Iterable of objects with the attribute “crs” or a list of CRS-like (pyproj.CRS-accepted) objects.
strict (bool) – If False (default), falsy CRS-es will be ignored and None will be returned if all CRS-es are falsy. If strict is True,
- Return type:
CRS | None
- Returns:
pyproj.CRS object or None (if all crs are None).
- Raises:
ValueError – If there is more than one crs. If strict is True, None is included.
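The difference between the strict and non-strict modes can be illustrated with a minimal sketch. It is not the sgis source: `common_crs_sketch` is a hypothetical name, plain EPSG integers stand in for pyproj.CRS objects, and the reading that strict mode treats None as a CRS of its own is inferred from the Raises entry above.

```python
from typing import Hashable, Iterable, Optional


def common_crs_sketch(
    crs_values: Iterable[Optional[Hashable]], strict: bool = False
) -> Optional[Hashable]:
    """Illustrative stand-in for the documented behaviour, not the sgis source."""
    values = list(crs_values)
    if not strict:
        # Non-strict: falsy entries (such as None) are ignored.
        values = [crs for crs in values if crs]
    unique = set(values)
    if len(unique) > 1:
        raise ValueError(f"More than one crs: {unique}")
    # Returns None when nothing is left, or when every entry was None.
    return unique.pop() if unique else None


print(common_crs_sketch([25833, None, 25833]))  # 25833
print(common_crs_sketch([None, None]))          # None

try:
    # Assumption: in strict mode, None counts as a CRS of its own.
    common_crs_sketch([25833, None], strict=True)
except ValueError as err:
    print("ValueError:", err)
```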
- get_grouped_centroids(gdf, groupby, as_string=True)
Get the centerpoint of the geometries within a group.
- Parameters:
gdf (GeoDataFrame) – GeoDataFrame.
groupby (str | list[str]) – Column to group by.
as_string (bool) – If True (default), coordinates are returned in the format “{x}_{y}”. If False, coordinates are returned as Points.
- Return type:
Series
- Returns:
A pandas.Series of grouped centroids with the index of ‘gdf’.
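To make the “{x}_{y}” format and the row alignment concrete, here is a plain-Python sketch. It does not call sgis, and it assumes the group centre is the mean of the point coordinates, which this reference does not confirm; the actual function may compute the centre differently.

```python
from collections import defaultdict

# Rows as (group, x, y); plain tuples stand in for point geometries.
rows = [("a", 0.0, 0.0), ("a", 2.0, 2.0), ("b", 5.0, 1.0)]

# Collect coordinates per group.
by_group = defaultdict(list)
for group, x, y in rows:
    by_group[group].append((x, y))

# Assumption: the group's centre is the mean of its coordinates.
centres = {
    group: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
    for group, pts in by_group.items()
}

# One value per input row, in input order, formatted as "{x}_{y}".
as_strings = [f"{centres[g][0]}_{centres[g][1]}" for g, _, _ in rows]
print(as_strings)  # ['1.0_1.0', '1.0_1.0', '5.0_1.0']
```

Every row receives the centre of its own group, which matches the documented return value having the index of ‘gdf’.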
- get_index_right_columns(gdf)
Get a list of what will be the resulting columns in an sjoin.
- Return type:
list[str]
- Parameters:
gdf (DataFrame | Series)
- make_edge_coords_cols(gdf)
Get coordinate tuples of the first and last points of lines as columns.
It takes a GeoDataFrame of LineStrings and returns a GeoDataFrame with two new columns, source_coords and target_coords, which are the x and y coordinates of the first and last points of the LineStrings in a tuple. The lines all have to be
- Parameters:
gdf (GeoDataFrame) – the GeoDataFrame with the lines
- Return type:
GeoDataFrame
- Returns:
A GeoDataFrame with new columns ‘source_coords’ and ‘target_coords’
- make_edge_wkt_cols(gdf)
Get the WKT of the first and last points of lines as columns.
It takes a GeoDataFrame of LineStrings and returns a GeoDataFrame with two new columns, source_wkt and target_wkt, which are the WKT representations of the first and last points of the LineStrings.
- Parameters:
gdf (GeoDataFrame) – the GeoDataFrame with the lines
- Return type:
GeoDataFrame
- Returns:
A GeoDataFrame with new columns ‘source_wkt’ and ‘target_wkt’
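The relationship between the coordinate-tuple columns and the WKT columns can be sketched without any geometry library. In this illustration a line is a plain list of (x, y) tuples, and `endpoints` and `point_wkt` are hypothetical helpers, not sgis functions. The exact number formatting of the WKT produced by sgis may differ.

```python
# A line is represented here as a plain list of (x, y) tuples.
lines = [
    [(0, 0), (1, 1), (2, 0)],
    [(2, 0), (3, 3)],
]


def endpoints(line):
    """Return the first and last coordinate of a line."""
    return line[0], line[-1]


def point_wkt(coord):
    """Format an (x, y) tuple as a WKT point string."""
    x, y = coord
    return f"POINT ({x} {y})"


# Coordinate tuples, as in 'source_coords' and 'target_coords'.
source_coords = [endpoints(line)[0] for line in lines]
target_coords = [endpoints(line)[1] for line in lines]

# WKT strings, as in 'source_wkt' and 'target_wkt'.
source_wkt = [point_wkt(c) for c in source_coords]
target_wkt = [point_wkt(c) for c in target_coords]

print(source_coords)  # [(0, 0), (2, 0)]
print(target_wkt)     # ['POINT (2 0)', 'POINT (3 3)']
```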
- make_lines_between_points(*arrs)
Creates an array of linestrings from two or more arrays of points.
The lines are created rowwise, meaning from arr0[0] to arr1[0], from arr0[1] to arr1[1]… If more than two arrays are passed, e.g. three arrays, the lines will go from arr0[0] via arr1[0] to arr2[0].
- Parameters:
arrs (ndarray[Any, dtype[Point]] | GeometryArray | GeoSeries) – 1 dimensional arrays of point geometries. All arrays must have the same shape. Must be at least two arrays.
- Return type:
ndarray[Any, dtype[LineString]]
- Returns:
A numpy array of linestrings.
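The row-wise pairing can be sketched in plain Python. This is only an illustration of the ordering described above: `lines_between_points_sketch` is a hypothetical function, points are (x, y) tuples, and each resulting line is a list of coordinates rather than a LineString.

```python
def lines_between_points_sketch(*arrs):
    """Pair up points row by row; each result is one line's coordinate list."""
    if len(arrs) < 2:
        raise ValueError("Must be at least two arrays.")
    if len({len(arr) for arr in arrs}) != 1:
        raise ValueError("All arrays must have the same shape.")
    # zip(*arrs) yields (arr0[i], arr1[i], ...) for each row i.
    return [list(row) for row in zip(*arrs)]


arr0 = [(0, 0), (1, 1)]
arr1 = [(0, 5), (1, 6)]
arr2 = [(9, 9), (8, 8)]

print(lines_between_points_sketch(arr0, arr1))
# [[(0, 0), (0, 5)], [(1, 1), (1, 6)]]
print(lines_between_points_sketch(arr0, arr1, arr2))
# [[(0, 0), (0, 5), (9, 9)], [(1, 1), (1, 6), (8, 8)]]
```

With three arrays, each line passes through the middle array's point, as in the "from arr0[0] via arr1[0] to arr2[0]" description.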
- points_in_bounds(gdf, gridsize)
Get a GeoDataFrame of points within the bounds of the GeoDataFrame.
- Return type:
GeoDataFrame
- Parameters:
gdf (GeoDataFrame | GeoSeries)
gridsize (int | float)
- random_points(n, loc=0.5)
Creates a GeoDataFrame with n random points.
- Parameters:
n (int) – Number of points/rows to create.
loc (float | int) – Mean (‘centre’) of the distribution.
- Return type:
GeoDataFrame
- Returns:
A GeoDataFrame of points with n rows.
Examples:
>>> import sgis as sg
>>> points = sg.random_points(10_000)
>>> points
                     geometry
0     POINT (0.62044 0.22805)
1     POINT (0.31885 0.38109)
2     POINT (0.39632 0.61130)
3     POINT (0.99401 0.35732)
4     POINT (0.76403 0.73539)
...                       ...
9995  POINT (0.90433 0.75080)
9996  POINT (0.10959 0.59785)
9997  POINT (0.00330 0.79168)
9998  POINT (0.90926 0.96215)
9999  POINT (0.01386 0.22935)
[10000 rows x 1 columns]
Values with a mean of 100.
>>> points = sg.random_points(10_000, loc=100)
>>> points
                     geometry
0      POINT (50.442 199.729)
1       POINT (26.450 83.367)
2     POINT (111.054 147.610)
3      POINT (93.141 141.456)
4       POINT (94.101 24.837)
...                       ...
9995   POINT (174.344 91.772)
9996    POINT (95.375 11.391)
9997    POINT (45.694 60.843)
9998   POINT (73.261 101.881)
9999  POINT (134.503 168.155)
[10000 rows x 1 columns]
- random_points_in_polygons(gdf, n, seed=None)
Creates a GeoDataFrame with n random points within the geometries of ‘gdf’.
- Parameters:
gdf (GeoDataFrame) – A GeoDataFrame.
n (int) – Number of points/rows to create.
seed – Optional random seed.
- Return type:
GeoDataFrame
- Returns:
A GeoDataFrame of points with n rows.
- sort_large_first(gdf)
Sort GeoDataFrame by area in descending order.
- Parameters:
gdf (GeoDataFrame | GeoSeries) – A GeoDataFrame or GeoSeries.
- Return type:
GeoDataFrame | GeoSeries
- Returns:
A GeoDataFrame or GeoSeries sorted from large to small in area.
Examples:
Create GeoDataFrame with NaN values.
>>> import sgis as sg
>>> df = sg.to_gdf(
...     [
...         (0, 1),
...         (1, 0),
...         (1, 1),
...         (0, 0),
...         (0.5, 0.5),
...     ]
... )
>>> df.geometry = df.buffer([4, 1, 2, 3, 5])
>>> df["col"] = [None, 1, 2, None, 1]
>>> df["col2"] = [None, 1, 2, 3, None]
>>> df["area"] = df.area
>>> df
                                            geometry  col  col2       area
0  POLYGON ((4.56136 0.53436, 4.54210 0.14229, 4....  NaN   NaN  50.184776
1  POLYGON ((1.40111 0.71798, 1.39630 0.61996, 1....  1.0   1.0   3.136548
2  POLYGON ((2.33302 0.49287, 2.32339 0.29683, 2....  2.0   2.0  12.546194
3  POLYGON ((3.68381 0.46299, 3.66936 0.16894, 3....  NaN   3.0  28.228936
4  POLYGON ((5.63590 0.16005, 5.61182 -0.33004, 5...  1.0   NaN  78.413712
>>> sg.sort_large_first(df)
                                            geometry  col  col2       area
4  POLYGON ((5.63590 0.16005, 5.61182 -0.33004, 5...  1.0   NaN  78.413712
0  POLYGON ((4.56136 0.53436, 4.54210 0.14229, 4....  NaN   NaN  50.184776
3  POLYGON ((3.68381 0.46299, 3.66936 0.16894, 3....  NaN   3.0  28.228936
2  POLYGON ((2.33302 0.49287, 2.32339 0.29683, 2....  2.0   2.0  12.546194
1  POLYGON ((1.40111 0.71798, 1.39630 0.61996, 1....  1.0   1.0   3.136548
>>> sg.sort_nans_last(sg.sort_large_first(df))
                                            geometry  col  col2       area
2  POLYGON ((2.33302 0.49287, 2.32339 0.29683, 2....  2.0   2.0  12.546194
1  POLYGON ((1.40111 0.71798, 1.39630 0.61996, 1....  1.0   1.0   3.136548
4  POLYGON ((5.63590 0.16005, 5.61182 -0.33004, 5...  1.0   NaN  78.413712
3  POLYGON ((3.68381 0.46299, 3.66936 0.16894, 3....  NaN   3.0  28.228936
0  POLYGON ((4.56136 0.53436, 4.54210 0.14229, 4....  NaN   NaN  50.184776
- sort_long_first(gdf)
Sort GeoDataFrame by length in descending order.
- Parameters:
gdf (GeoDataFrame | GeoSeries) – A GeoDataFrame or GeoSeries.
- Return type:
GeoDataFrame | GeoSeries
- Returns:
A GeoDataFrame or GeoSeries sorted from long to short in length.
- sort_short_first(gdf)
Sort GeoDataFrame by length in ascending order.
- Parameters:
gdf (GeoDataFrame | GeoSeries) – A GeoDataFrame or GeoSeries.
- Return type:
GeoDataFrame | GeoSeries
- Returns:
A GeoDataFrame or GeoSeries sorted from short to long in length.
- sort_small_first(gdf)
Sort GeoDataFrame by area in ascending order.
- Parameters:
gdf (GeoDataFrame | GeoSeries) – A GeoDataFrame or GeoSeries.
- Return type:
GeoDataFrame | GeoSeries
- Returns:
A GeoDataFrame or GeoSeries sorted from small to large in area.
- to_lines(*gdfs, copy=True, split=True)
Makes lines out of one or more GeoDataFrames and splits them at intersections.
The GeoDataFrames’ geometries are converted to LineStrings, then unioned together and made to singlepart. The lines are split at the intersections. Mimics ‘feature to line’ in ArcGIS.
- Parameters:
*gdfs (GeoDataFrame) – One or more GeoDataFrames.
copy (bool) – Whether to take a copy of the incoming GeoDataFrames. Defaults to True.
split (bool) – If True (default), lines will be split at intersections if more than one GeoDataFrame is passed as gdfs. Otherwise, a simple concat.
- Return type:
GeoDataFrame
- Returns:
A GeoDataFrame with singlepart line geometries and columns of all input GeoDataFrames.
Note
The index is preserved if only one GeoDataFrame is given, but otherwise ignored. This is because the union overlay, which is used when multiple GeoDataFrames are given, always ignores the index.
Examples:
Convert single polygon to linestring.
>>> import sgis as sg
>>> from shapely.geometry import Polygon
>>> poly1 = sg.to_gdf(Polygon([(0, 0), (0, 1), (1, 1), (1, 0)]))
>>> poly1["poly1"] = 1
>>> line = sg.to_lines(poly1)
>>> line
                                            geometry  poly1
0  LINESTRING (0.00000 0.00000, 0.00000 1.00000, ...      1
Convert two overlapping polygons to linestrings.
>>> poly2 = sg.to_gdf(Polygon([(0.5, 0.5), (0.5, 1.5), (1.5, 1.5), (1.5, 0.5)]))
>>> poly2["poly2"] = 1
>>> lines = sg.to_lines(poly1, poly2)
>>> lines
   poly1  poly2                                           geometry
0    1.0    NaN  LINESTRING (0.00000 0.00000, 0.00000 1.00000, ...
1    1.0    NaN  LINESTRING (0.50000 1.00000, 1.00000 1.00000, ...
2    1.0    NaN  LINESTRING (1.00000 0.50000, 1.00000 0.00000, ...
3    NaN    1.0      LINESTRING (0.50000 0.50000, 0.50000 1.00000)
4    NaN    1.0  LINESTRING (0.50000 1.00000, 0.50000 1.50000, ...
5    NaN    1.0      LINESTRING (1.00000 0.50000, 0.50000 0.50000)
Plot before and after.
>>> sg.qtm(poly1, poly2)
>>> lines["l"] = lines.length
>>> sg.qtm(lines, "l")