geebeam.sampler

Helpers for sampling locations across regions of interest.

Functions

`sample_region_random`(roi, crs, n_sample[, ...])	Get random points within region of interest.
`sample_region_grid`(roi, crs, stride, scale[, ...])	Get a regular grid of points covering region of interest.
`split_sets`(points_gdf, split_names[, split_ratios, ...])	Assign sampling points to different splits (e.g., train, validation, test).

Module Contents

geebeam.sampler.sample_region_random(roi, crs, n_sample, random_seed=0, buffer_distance=0)

Get random points within region of interest.

Parameters:

roi (geopandas.GeoDataFrame)
crs (str)
n_sample (int)
random_seed (int)
buffer_distance (float)

Return type:

geopandas.GeoDataFrame
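
The idea of seeded random sampling inside a region can be sketched with the standard library alone. This is an illustrative sketch, not geebeam's implementation: `point_in_polygon` and `random_points_in_polygon` are hypothetical helpers, the region is a plain list of `(x, y)` vertices rather than a GeoDataFrame, and rejection sampling over the bounding box is an assumed technique. CRS handling and `buffer_distance` are not modelled.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Toggle on every edge crossed by a horizontal ray going right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(polygon, n_sample, random_seed=0):
    """Hypothetical helper: draw n_sample points inside polygon by rejection sampling."""
    rng = random.Random(random_seed)  # private generator, so the seed fully fixes the result
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    # Assumes the polygon has non-zero area; otherwise this loop never terminates.
    while len(points) < n_sample:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
pts = random_points_in_polygon(triangle, n_sample=5, random_seed=0)
```

Calling the helper again with the same seed yields the same points, which is the role `random_seed` plays in the signature above.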

geebeam.sampler.sample_region_grid(roi, crs, stride, scale, buffer_distance=0)

Get a regular grid of points covering region of interest.

Parameters:

roi (geopandas.GeoDataFrame)
crs (str)
stride (int)
scale (float)
buffer_distance (float)

Return type:

geopandas.GeoDataFrame
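
The following sketch shows one way a regular grid could be laid over a rectangular extent. It rests on assumptions that this page does not confirm: that `stride` is a step in pixels, that `scale` is the ground size of one pixel in CRS units, and that the grid spacing is therefore `stride * scale`. `grid_points` is a hypothetical helper operating on a bounding box; clipping to the actual region geometry and `buffer_distance` are omitted.

```python
def grid_points(bounds, stride, scale):
    """Hypothetical helper: regular grid of (x, y) points over a bounding box.

    bounds is (min_x, min_y, max_x, max_y) in CRS units.
    ASSUMPTION: spacing between neighbouring points is stride * scale.
    """
    min_x, min_y, max_x, max_y = bounds
    spacing = stride * scale
    if spacing <= 0:
        raise ValueError("stride * scale must be positive")
    # Number of grid positions that fit along each axis, including the origin.
    n_cols = int((max_x - min_x) // spacing) + 1
    n_rows = int((max_y - min_y) // spacing) + 1
    points = []
    for row in range(n_rows):
        for col in range(n_cols):
            points.append((min_x + col * spacing, min_y + row * spacing))
    return points


# A 100 x 50 extent with stride=5 pixels at 10 units per pixel gives 50-unit spacing.
grid = grid_points((0.0, 0.0, 100.0, 50.0), stride=5, scale=10.0)
```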

geebeam.sampler.split_sets(points_gdf, split_names, split_ratios=None, split_counts=None, random_seed=0, shuffle=True)

Assign sampling points to different splits (e.g., train, validation, test).

Divides a collection of sampling points into named splits, sized either by proportion or by exact count. Points can optionally be shuffled before assignment so that each split is a random subset of the input.

Parameters:

points_gdf (geopandas.GeoDataFrame | pandas.DataFrame | ee.FeatureCollection) – Collection of sampling points in one of the following formats:
    - gpd.GeoDataFrame: Point geometries with CRS information
    - pd.DataFrame: Must contain ‘x’ and ‘y’ coordinate columns
    - ee.FeatureCollection: Earth Engine FeatureCollection of points
split_names (list[str]) – List of names for each split (e.g., [‘train’, ‘validation’, ‘test’]).
split_ratios (list[float] | None) – List of floats specifying the proportion of points for each split. Must sum to 1.0 and match the length of split_names. Either this or split_counts must be provided.
split_counts (list[int] | None) – List of integers specifying the exact number of points for each split. Must sum to the total number of points and match the length of split_names. Either this or split_ratios must be provided.
random_seed (int) – Seed for random number generation. Ensures reproducible splits when shuffle=True. Defaults to 0.
shuffle (bool) – Whether to randomly shuffle points before assigning to splits. Defaults to True.

Returns:

GeoDataFrame or FeatureCollection with a ‘split’ column containing the assigned split name for each point. Also includes an ‘id’ column with point identifiers if not already present.

Raises:

ValueError – If split_ratios does not sum to 1.0, if split_counts does not sum to the total number of points, or if the lengths of split_names, split_ratios, and/or split_counts do not match. Also raised if neither split_ratios nor split_counts is provided.

Return type:

geopandas.GeoDataFrame
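
The validation and assignment rules described for `split_sets` can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's code: `assign_splits` is a hypothetical helper that returns one label per point index instead of a GeoDataFrame, counts take precedence when both arguments are given, and ratio-derived counts are floored with the last split absorbing the remainder.

```python
import random


def assign_splits(n_points, split_names, split_ratios=None, split_counts=None,
                  random_seed=0, shuffle=True):
    """Hypothetical helper: return a list with one split name per point index."""
    if split_ratios is None and split_counts is None:
        raise ValueError("Provide either split_ratios or split_counts.")
    if split_counts is None:
        if len(split_ratios) != len(split_names):
            raise ValueError("split_ratios and split_names differ in length.")
        if abs(sum(split_ratios) - 1.0) > 1e-9:
            raise ValueError("split_ratios must sum to 1.0.")
        # ASSUMPTION: floor each share; the last split takes whatever is left.
        split_counts = [int(r * n_points) for r in split_ratios[:-1]]
        split_counts.append(n_points - sum(split_counts))
    elif len(split_counts) != len(split_names):
        raise ValueError("split_counts and split_names differ in length.")
    if sum(split_counts) != n_points:
        raise ValueError("split_counts must sum to the number of points.")

    order = list(range(n_points))
    if shuffle:
        # A seeded private generator makes the shuffle reproducible.
        random.Random(random_seed).shuffle(order)

    labels = [None] * n_points
    start = 0
    for name, count in zip(split_names, split_counts):
        # Consecutive slices of the (possibly shuffled) order go to each split.
        for idx in order[start:start + count]:
            labels[idx] = name
        start += count
    return labels


names = ["train", "validation", "test"]
labels = assign_splits(8, names, split_ratios=[0.5, 0.25, 0.25], random_seed=0)
ordered = assign_splits(8, names, split_counts=[4, 2, 2], shuffle=False)
```

With `shuffle=False`, the first four points land in `train`, the next two in `validation`, and the last two in `test`, which shows why shuffling matters when the input order is spatially structured.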