geebeam.pipeline

Prepare and run a Beam pipeline to download image ‘chips’ from Earth Engine.

Functions

`run_pipeline`(image_list, output_path, project, ...[, ...])	Run a Beam pipeline to download image chips from Earth Engine.
`sample_and_run_pipeline`(image_list, sampling_region, ...)	Sample random points and then run a Beam pipeline to download image chips from Earth Engine.
`grid_and_run_pipeline`(image_list, sampling_region, ...)	Sample points from a regular grid and then run a Beam pipeline to download image chips from Earth Engine.

Module Contents

geebeam.pipeline.run_pipeline(image_list, output_path, project, patch_size, scale, sampling_points, output_type='tiff', crs='EPSG:4326', split_processing=False, extra_metadata={}, beam_options=None, dataset_version='1.0.0', dataset_name='geebeam_dataset', position='center')

Run a Beam pipeline to download image chips from Earth Engine.

Parameters:

image_list (list[ee.Image]) – A list of ee.Image objects to process.
sampling_points (pandas.DataFrame | geopandas.GeoDataFrame | ee.FeatureCollection) – Locations to sample from. The position of each point relative to the patch is controlled by the position argument.
output_path (str) – The path where output will be saved.
output_type (str) – Output format: ‘tiff’ (TIFF files, with metadata in Parquet), ‘webdataset’ (TIFF files with JSON metadata, in sharded tar archives), ‘tfrecord’ (raw TFRecords), or ‘tfds’ (a TensorFlow Datasets dataset). Defaults to ‘tiff’.
project (str) – The Google Cloud project ID.
patch_size (int) – The width and height of each square patch, in pixels.
scale (float) – The pixel scale used when extracting patches (in Earth Engine's convention, the nominal pixel size in meters).
crs (str) – The coordinate reference system. Defaults to ‘EPSG:4326’.
split_processing (bool) – Flag to indicate if processing should be split. Defaults to False.
extra_metadata (dict) – Additional metadata to include. Defaults to an empty dictionary.
position (str) – Where the sampling point falls within the patch. One of ‘center’ (default), ‘top-left’, ‘top-right’, ‘bottom-left’, ‘bottom-right’.
dataset_name (str) – For output_type=‘tfds’, the name of the final TFDS output, used when loading the dataset into training pipelines. Defaults to ‘geebeam_dataset’.
dataset_version (str) – For output_type=‘tfds’, the semantic version number of the final TFDS output, used when loading the dataset into training pipelines. Defaults to ‘1.0.0’.
beam_options (dict[str] | list[str] | None) – Options for the Beam pipeline. Defaults to None.

Return type:

None
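To make the `position` argument concrete, the sketch below computes, for a square patch of `patch_size` pixels, the column and row offset of the patch's top-left pixel relative to the pixel containing the sampling point. `patch_offsets` is a hypothetical helper written for illustration, not part of geebeam, and the handling of even patch sizes for ‘center’ is an assumption; geebeam's own placement may differ by a pixel.

```python
# Hypothetical helper illustrating the documented `position` values.
# It is not part of geebeam; the even-size convention for 'center' is an assumption.
VALID_POSITIONS = ("center", "top-left", "top-right", "bottom-left", "bottom-right")


def patch_offsets(position: str, patch_size: int) -> tuple[int, int]:
    """Return the (col, row) offset of the patch's top-left pixel from the sampling point.

    Columns grow rightward and rows grow downward, as in image coordinates.
    """
    if position not in VALID_POSITIONS:
        raise ValueError(f"position must be one of {VALID_POSITIONS}, got {position!r}")
    last = patch_size - 1
    if position == "center":
        # For odd sizes the point is the exact middle pixel; for even sizes
        # (assumption) it lands just right of and below the geometric middle.
        half = patch_size // 2
        return (-half, -half)
    if position == "top-left":
        return (0, 0)
    if position == "top-right":
        return (-last, 0)
    if position == "bottom-left":
        return (0, -last)
    return (-last, -last)  # bottom-right


for pos in VALID_POSITIONS:
    print(pos, patch_offsets(pos, 256))
```

With `patch_size=256`, ‘center’ gives `(-128, -128)` and ‘bottom-right’ gives `(-255, -255)`, meaning the sampling point occupies the last pixel of the patch.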

geebeam.pipeline.sample_and_run_pipeline(image_list, sampling_region, n_sample, output_path, project, patch_size, scale, crs='EPSG:4326', validation_ratio=0, random_seed=0, **kwargs)

Sample random points and then run a Beam pipeline to download image chips from Earth Engine.

Parameters:

image_list (list[ee.Image]) – A list of ee.Image objects to process.
sampling_region (str | geopandas.GeoDataFrame | ee.Geometry) – Region to sample from, polygon or group of polygons.
n_sample (int) – Number of points to sample.
output_path (str) – The path where output will be saved.
project (str) – The Google Cloud project ID.
patch_size (int) – The width and height of each square patch, in pixels.
scale (float) – The pixel scale used when extracting patches (in Earth Engine's convention, the nominal pixel size in meters).
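validation_ratio (float) – Fraction of sampled points to mark as validation points. Defaults to 0.
random_seed (int) – Seed for random sampling. Defaults to 0.
split_processing (bool) – Flag indicating whether processing should be split; passed through to run_pipeline() via **kwargs. Defaults to False.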
crs (str) – The coordinate reference system for sampling. Defaults to ‘EPSG:4326’.
**kwargs – Additional keyword arguments are documented in pipeline.run_pipeline().

Return type:

None
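The `validation_ratio` and `random_seed` parameters describe a reproducible train/validation split. The sketch below shows one way such a split could work, assuming a fixed count of `round(n_points * validation_ratio)` points is drawn with a seeded generator. `mark_validation` is a hypothetical helper, and geebeam may assign validation points differently (for example, per point against a random threshold).

```python
import random


def mark_validation(n_points: int, validation_ratio: float, random_seed: int = 0) -> list[bool]:
    """Return one flag per point; True marks the point as validation.

    Hypothetical helper: marks round(n_points * validation_ratio) points,
    chosen reproducibly from a generator seeded with random_seed.
    """
    if not 0.0 <= validation_ratio <= 1.0:
        raise ValueError("validation_ratio must be between 0 and 1")
    n_val = round(n_points * validation_ratio)
    rng = random.Random(random_seed)  # same seed -> same selection
    chosen = set(rng.sample(range(n_points), n_val))
    return [i in chosen for i in range(n_points)]


flags = mark_validation(n_points=1000, validation_ratio=0.2, random_seed=0)
print(sum(flags))  # 200
```

Drawing an exact count keeps the validation set size predictable; a per-point random draw would instead give roughly, not exactly, the requested fraction.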

geebeam.pipeline.grid_and_run_pipeline(image_list, sampling_region, output_path, project, patch_size, scale, stride, crs='EPSG:4326', buffer_distance=0, validation_ratio=0, random_seed=0, **kwargs)

Sample points from a regular grid and then run a Beam pipeline to download image chips from Earth Engine.

Parameters:

image_list (list[ee.Image]) – A list of ee.Image objects to process.
sampling_region (str | geopandas.GeoDataFrame | ee.Geometry) – Region to sample from, polygon or group of polygons.
output_path (str) – The path where output will be saved.
project (str) – The Google Cloud project ID.
patch_size (int) – The width and height of each square patch, in pixels.
scale (float) – The pixel scale used when extracting patches (in Earth Engine's convention, the nominal pixel size in meters).
stride (int) – Number of pixels between consecutive sample points. For full coverage without overlap, set stride equal to patch_size; a smaller stride produces overlapping patches, and a larger stride leaves gaps between patches.
crs (str) – The coordinate reference system for sampling. Defaults to ‘EPSG:4326’.
buffer_distance (float) – Distance, in meters, by which to buffer sampling_region before gridding; useful for ensuring complete coverage at the edges of sampling_region. Defaults to 0.
validation_ratio (float) – Fraction of grid points to mark as validation. Defaults to 0 (no validation points).
random_seed (int) – Seed for random sampling. Defaults to 0.
**kwargs – Additional keyword arguments are documented in pipeline.run_pipeline().

Return type:

None
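The relationship between `stride` and `patch_size` described for `grid_and_run_pipeline` can be checked with a small calculation. This sketch classifies neighboring patches as contiguous, overlapping, or separated by a gap, and counts grid points along one axis of the region. The helper names are hypothetical, and the sketch assumes `scale` is in meters per pixel (Earth Engine's convention) and that grid points start at the region's edge, which may not match geebeam's exact gridding.

```python
import math


def patch_spacing(patch_size: int, stride: int) -> tuple[str, int]:
    """Classify neighboring patches along one axis; both inputs are in pixels.

    Returns ("contiguous", 0), ("overlap", n) or ("gap", n), where n is the
    number of pixels shared by, or left between, adjacent patches.
    """
    if patch_size <= 0 or stride <= 0:
        raise ValueError("patch_size and stride must be positive")
    if stride == patch_size:
        return ("contiguous", 0)
    if stride < patch_size:
        return ("overlap", patch_size - stride)
    return ("gap", stride - patch_size)


def points_along_axis(extent_m: float, stride: int, scale: float) -> int:
    """Number of grid points along one axis of length extent_m meters (hypothetical).

    Assumes scale is meters per pixel and the first point sits on the region's
    edge, so points are placed every stride * scale meters while inside the extent.
    """
    return math.ceil(extent_m / (stride * scale))


scale = 10.0  # assumed meters per pixel
for stride in (256, 128, 320):
    kind, pixels = patch_spacing(patch_size=256, stride=stride)
    n = points_along_axis(extent_m=10_000, stride=stride, scale=scale)
    print(f"stride={stride}: {kind}, {pixels} px ({pixels * scale} m), {n} points per axis")
```

Halving the stride from 256 to 128 doubles the points per axis (4 to 8 here) and so roughly quadruples the number of patches over a two-dimensional region.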