xr_fresh package

Subpackages

Submodules

xr_fresh.backends module

class xr_fresh.backends.Cluster(**kwargs)[source]

Bases: object

Wrapper for Dask clients providing cluster management functionality.

Methods

close()

Close the Dask client and cluster resources.

restart()

Restart the Dask client.

start()

Start a Dask cluster for general computation.

start_large_IO_object()

Start a Dask cluster optimized for large I/O-bound computations.

start_large_object()

Start a Dask cluster optimized for large object computations.

start_small_object()

Start a Dask cluster optimized for small object computations.

close()[source]

Close the Dask client and cluster resources.

restart()[source]

Restart the Dask client.

start()[source]

Start a Dask cluster for general computation.

start_large_IO_object()[source]

Start a Dask cluster optimized for large I/O-bound computations.

start_large_object()[source]

Start a Dask cluster optimized for large object computations.

start_small_object()[source]

Start a Dask cluster optimized for small object computations.

xr_fresh.dimension_reduction module

class xr_fresh.dimension_reduction.ExtendedGeoWombatAccessor(xarray_obj)[source]

Bases: GeoWombatAccessor

Attributes:
affine

Get the affine transform object.

altitude

Get satellite altitudes (in km)

array_is_dask

Get whether the array is a Dask array.

avail_sensors

Get supported sensors.

band_chunks

Get the band chunk size.

bottom

Get the array bounding box bottom coordinate.

bounds

Get the array bounding box (left, bottom, right, top)

bounds_as_namedtuple

Get the array bounding box as a rasterio.coords.BoundingBox

cellx

Get the cell size in the x direction.

cellxh

Get the half width of the cell size in the x direction.

celly

Get the cell size in the y direction.

cellyh

Get the half width of the cell size in the y direction.

central_um

Get a dictionary of central wavelengths (in micrometers)

chunk_grid

Get the image chunk grid.

col_chunks

Get the column chunk size.

crs_to_pyproj

Get the CRS as a pyproj.CRS object.

data_are_separate

Checks whether the data are loaded separately.

data_are_stacked

Checks whether the data are stacked.

dtype

Get the data type of the DataArray.

filenames

Gets the data filenames.

footprint_grid

Get the image footprint grid.

geodataframe

Get a geopandas.GeoDataFrame of the array bounds.

geometry

Get the polygon geometry of the array bounding box.

has_band

Check whether the DataArray has a band attribute.

has_band_coord

Check whether the DataArray has a band coordinate.

has_band_dim

Check whether the DataArray has a band dimension.

has_time

Check whether the DataArray has a time attribute.

has_time_coord

Check whether the DataArray has a time coordinate.

has_time_dim

Check whether the DataArray has a time dimension.

left

Get the array bounding box left coordinate.

meta

Get the array metadata.

nbands

Get the number of array bands.

ncols

Get the number of array columns.

ndims

Get the number of array dimensions.

nodataval

Get the ‘no data’ value from the attributes.

nrows

Get the number of array rows.

ntime

Get the number of time dimensions.

offsetval

Get the offset value.

pydatetime

Get Python datetime objects from the time dimension.

right

Get the array bounding box right coordinate.

row_chunks

Get the row chunk size.

scaleval

Get the scale factor value.

sensor_names

Get sensor full names.

time_chunks

Get the time chunk size.

top

Get the array bounding box top coordinate.

transform

Get the data transform (cell x, 0, left, 0, cell y, top)

unary_union

Get a representation of the union of the image bounds.

wavelengths

Get a dictionary of sensor wavelengths.

Methods

apply(filename, user_func[, n_jobs])

Applies a user function to an Xarray Dataset or DataArray and writes to file.

assign_nodata_attrs(nodata)

Assigns 'no data' attributes.

avi([nodata, mask, sensor, scale_factor])

Calculates the advanced vegetation index

band_mask(valid_bands[, src_nodata, ...])

Creates a mask from band nonzeros.

bounds_overlay(bounds[, how])

Checks whether the bounds overlay the image bounds.

calc_area(values[, op, units, row_chunks, ...])

Calculates the area of data values.

check_chunksize(chunksize, array_size)

Asserts that the chunk size fits within intervals of 16 and is smaller than the array.

clip(df[, query, mask_data, expand_by])

Clips a DataArray by vector polygon geometry.

clip_by_polygon(df[, query, mask_data, ...])

Clips a DataArray by vector polygon geometry.

compare(op, b[, return_binary])

Comparison operation.

compute(**kwargs)

Computes data.

evi([nodata, mask, sensor, scale_factor])

Calculates the enhanced vegetation index

evi2([nodata, mask, sensor, scale_factor])

Calculates the two-band modified enhanced vegetation index

extract(aoi[, bands, time_names, ...])

Extracts data within an area or points of interest.

gcvi([nodata, mask, sensor, scale_factor])

Calculates the green chlorophyll vegetation index

imshow([mask, nodata, flip, text_color, rot])

Shows an image on a plot.

k_pca(gamma, n_components, n_workers, chunk_size)

Applies Kernel PCA to the dataset and returns a DataArray with the components as bands.

kndvi([nodata, mask, sensor, scale_factor])

Calculates the kernel normalized difference vegetation index

mask(df[, query, keep])

Masks a DataArray.

mask_nodata()

Masks 'no data' values with nans.

match_data(data, band_names)

Coerces the xarray.DataArray to match another xarray.DataArray.

moving([stat, perc, w, nodata, weights])

Applies a moving window function to the DataArray.

n_windows([row_chunks, col_chunks])

Calculates the number of windows in a row/column iteration.

nbr([nodata, mask, sensor, scale_factor])

Calculates the normalized burn ratio

ndvi([nodata, mask, sensor, scale_factor])

Calculates the normalized difference vegetation index

norm_brdf(solar_za, solar_az, sensor_za, ...)

Applies Bidirectional Reflectance Distribution Function (BRDF) normalization.

norm_diff(b1, b2[, nodata, mask, sensor, ...])

Calculates the normalized difference band ratio.

read(band, **kwargs)

Reads data for a band or bands.

recode(polygon, to_replace[, num_workers])

Recodes a DataArray with polygon mappings.

replace(to_replace)

Replace values given in to_replace with value.

sample([method, band, n, strata, spacing, ...])

Generates samples from a raster.

save(filename[, mode, nodata, overwrite, ...])

Saves a DataArray to raster using rasterio/dask.

set_nodata([src_nodata, dst_nodata, ...])

Sets 'no data' values and applies scaling to an xarray.DataArray.

subset([left, top, right, bottom, rows, ...])

Subsets a DataArray.

tasseled_cap([nodata, sensor, scale_factor])

Applies a tasseled cap transformation

to_netcdf(filename, *args, **kwargs)

Writes an Xarray DataArray to a NetCDF file.

to_polygon([mask, connectivity])

Converts a dask array to a GeoDataFrame

to_raster(filename[, readxsize, readysize, ...])

Writes an Xarray DataArray to a raster file.

to_vector(filename[, mask, connectivity])

Writes an Xarray DataArray to a vector file.

to_vrt(filename[, overwrite, resampling, ...])

Writes a file to a VRT file.

transform_crs([dst_crs, dst_res, dst_width, ...])

Transforms an xarray.DataArray to a new coordinate reference system.

wi([nodata, mask, sensor, scale_factor])

Calculates the woody vegetation index

windows([row_chunks, col_chunks, ...])

Generates windows for a row/column iteration.

k_pca(gamma: float, n_components: int, n_workers: int, chunk_size: int) DataArray[source]

Applies Kernel PCA to the dataset and returns a DataArray with the components as bands.

Parameters:
  • gamma (float) – The gamma parameter for the RBF kernel.

  • n_components (int) – The number of components to keep.

  • n_workers (int) – The number of parallel jobs for KernelPCA and ParallelTask.

  • chunk_size (int) – The size of the chunks for processing.

Returns:

A DataArray with the Kernel PCA components as bands.

Return type:

xr.DataArray

Examples:

    import ray
    import geowombat as gw
    import matplotlib.pyplot as plt
    import xr_fresh.dimension_reduction  # assumption: importing this module registers the gw_ext accessor

    # Initialize Ray
    with ray.init(num_cpus=8) as rays:
        # Example usage
        with gw.open(
            sorted(
                [
                    "./tests/data/RadT_tavg_202301.tif",
                    "./tests/data/RadT_tavg_202302.tif",
                    "./tests/data/RadT_tavg_202304.tif",
                    "./tests/data/RadT_tavg_202305.tif",
                ]
            ),
            stack_dim="band",
            band_names=[0, 1, 2, 3],
        ) as src:
            # get third k principal components - base zero counting
            transformed_dataarray = src.gw_ext.k_pca(
                gamma=15, n_components=3, n_workers=8, chunk_size=256
            )
            transformed_dataarray.plot.imshow(col='component', col_wrap=1, figsize=(8, 12))
            plt.show()

xr_fresh.extractors_series module

xr_fresh.extractors_series.extract_features_series(gw_series, feature_dict, band_name, output_dir)[source]

Extracts features from a geospatial time series and saves them as TIFF files.

Parameters:
  • gw_series (geowombat.Dataset) – Geospatial time series dataset.

  • feature_dict (dict) – Dictionary containing feature names and parameters.

  • band_name (str) – Name of the band.

  • output_dir (str) – Directory to save the output TIFF files.

Returns:

None

xr_fresh.extractors_series.extract_grid(band_name)[source]

Extracts grid value from the band_name using regular expressions.

Parameters:

band_name (str) – Name of the band.

Returns:

Extracted grid value.

Return type:

str

xr_fresh.extractors_series.extract_key_value_names(band_name)[source]

Extracts key_names and value_names from the band_name using regular expressions.

Parameters:

band_name (str) – Name of the band.

Returns:

  • key_names (str) – Extracted key names.

  • value_names (str) – Extracted value names.

xr_fresh.feature_calculator_series module

class xr_fresh.feature_calculator_series.abs_energy[source]

Bases: TimeModule

Returns the absolute energy of the time series, which is the sum of the squared values.

\[E = \sum_{i=1}^{n} x_i^2\]

Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

The absolute energy of the time series.

Return type:

E (numpy.ndarray)
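
As an illustration of the formula above, the following minimal sketch computes the absolute energy for a single pixel's time series in plain Python. The function name and the sample values are hypothetical and are not part of xr_fresh, which applies the calculation to whole arrays along the time axis.

```python
def abs_energy_of_series(series):
    """Return the sum of squared values of one time series."""
    return sum(value * value for value in series)


# Hypothetical values for one pixel across three time steps.
pixel_series = [1.0, -2.0, 3.0]
energy = abs_energy_of_series(pixel_series)  # 1 + 4 + 9
print(energy)
```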

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.absolute_sum_of_changes[source]

Bases: TimeModule

Returns the sum over the absolute value of consecutive changes in the series x.

\[\sum_{i=1}^{n-1} \mid x_{i+1} - x_i \mid\]

Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.autocorrelation(lag=1)[source]

Bases: TimeModule

Calculates the autocorrelation at the specified lag, according to the formula in [1]:

\[\frac{1}{(n-l)\sigma^{2}} \sum_{t=1}^{n-l}(X_{t}-\mu )(X_{t+l}-\mu)\]

where \(n\) is the length of the time series \(X_i\), \(\sigma^2\) is its variance, \(\mu\) is its mean, and \(l\) denotes the lag.

References

[1] https://en.wikipedia.org/wiki/Autocorrelation#Estimation

Parameters:
  • x (numpy.ndarray) – Geowombat series object containing a time series of images.

  • lag (int) – Lag at which to calculate the autocorrelation (default: 1).
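
The autocorrelation formula above can be sketched for a single pixel's series as follows. This is an illustration only: the function name is hypothetical, the variance is taken as the population variance (divide by n), and returning NaN for a constant series or an over-long lag is an assumption of this sketch, not documented library behavior.

```python
def autocorrelation_of_series(series, lag=1):
    """Lag-l autocorrelation of one time series, following the formula above."""
    n = len(series)
    mu = sum(series) / n
    variance = sum((value - mu) ** 2 for value in series) / n
    if variance == 0 or lag >= n:
        return float("nan")  # undefined in this sketch
    products = sum(
        (series[t] - mu) * (series[t + lag] - mu) for t in range(n - lag)
    )
    return products / ((n - lag) * variance)


# A strictly alternating series is perfectly anti-correlated at lag 1.
alternating = [1.0, -1.0, 1.0, -1.0]
print(autocorrelation_of_series(alternating, lag=1))
```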

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.count_above_mean(mean=None)[source]

Bases: TimeModule

Returns the number of values in x that are higher than the mean of x.

\[N_{\text{above}} = \sum_{i=1}^n (x_i > \bar{x})\]

Parameters:
  • x (numpy.ndarray) – Geowombat series object containing a time series of images.

  • mean (int) – An integer to use as the “mean” value of the raster
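
A minimal per-pixel sketch of the count above the mean is shown below. The function name is hypothetical, and the fallback to the series mean when mean is None is an assumption based on the default in the signature. Counting values below the mean differs only in the direction of the inequality.

```python
def count_above_mean_of_series(series, mean=None):
    """Count the values strictly greater than the mean (or a supplied threshold)."""
    # Assumption: with mean=None the series' own mean is the threshold.
    threshold = sum(series) / len(series) if mean is None else mean
    return sum(1 for value in series if value > threshold)


pixel_series = [1, 2, 3, 10]  # mean is 4
print(count_above_mean_of_series(pixel_series))          # only 10 exceeds 4
print(count_above_mean_of_series(pixel_series, mean=2))  # 3 and 10 exceed 2
```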

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.count_below_mean(mean=None)[source]

Bases: TimeModule

Returns the number of values in x that are lower than the mean of x.

\[N_{\text{below}} = \sum_{i=1}^n (x_i < \bar{x})\]

Parameters:
  • x (numpy.ndarray) – Geowombat series object containing a time series of images.

  • mean (int) – An integer to use as the “mean” value of the raster

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.doy_of_maximum(dates=None)[source]

Bases: TimeModule

Returns the day of the year (DOY) on which the series reaches its maximum value. All years are treated as the same year.

Parameters:
  • dates (numpy.ndarray) – An array holding the dates of the time series as integers or as datetime objects.

  • x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

The day of the year of the maximum value.

Return type:

int
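
The idea can be sketched for one pixel using only the standard library: find the position of the maximum, then convert the matching date to a day of year. The function name and the sample dates are hypothetical, and picking the first occurrence on ties is an assumption of this sketch.

```python
from datetime import date


def doy_of_maximum_for_series(dates, series):
    """Return the day of year of the date at which the series peaks."""
    # max() keeps the first index when several values tie for the maximum.
    index_of_max = max(range(len(series)), key=lambda i: series[i])
    return dates[index_of_max].timetuple().tm_yday


# Hypothetical acquisition dates spanning two calendar years.
dates = [date(2022, 12, 31), date(2023, 2, 1), date(2023, 3, 1)]
values = [0.1, 0.9, 0.4]
print(doy_of_maximum_for_series(dates, values))  # 1 February is day 32
```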

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.doy_of_minimum(dates=None)[source]

Bases: TimeModule

Returns the day of the year (DOY) on which the series reaches its minimum value. All years are treated as the same year.

Parameters:
  • dates (numpy.ndarray) – An array holding the dates of the time series as integers or as datetime objects.

  • x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

The day of the year of the minimum value.

Return type:

int

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.kurtosis(fisher=True)[source]

Bases: TimeModule

Compute the sample kurtosis of a given array along the time axis.

\[G_2 = \frac{\mu_4}{\sigma^4} - 3\]

where \(\mu_4\) is the fourth central moment and \(\sigma\) is the standard deviation.

Parameters:
  • array (GeoWombat series object) – An object that contains geospatial and temporal metadata.

  • fisher (bool, optional) – If True, Fisher’s definition is used (normal ==> 0.0). If False, Pearson’s definition is used (normal ==> 3.0).

Returns:

Returns the kurtosis of x (calculated with the adjusted Fisher-Pearson standardized moment coefficient G2).

Return type:

float
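
The displayed moment formula can be sketched for one pixel as follows. Note the assumptions: this evaluates exactly the formula shown (fourth central moment over squared population variance), not a bias-adjusted estimator, and the function name is hypothetical. The fisher switch mirrors the parameter description above.

```python
def kurtosis_of_series(series, fisher=True):
    """Moment-based kurtosis of one time series (population moments)."""
    n = len(series)
    mu = sum(series) / n
    second_moment = sum((value - mu) ** 2 for value in series) / n  # sigma^2
    fourth_moment = sum((value - mu) ** 4 for value in series) / n  # mu_4
    pearson = fourth_moment / second_moment**2
    # Fisher's definition subtracts 3 so that a normal distribution scores 0.
    return pearson - 3.0 if fisher else pearson


pixel_series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(kurtosis_of_series(pixel_series))                # Fisher definition
print(kurtosis_of_series(pixel_series, fisher=False))  # Pearson definition
```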

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.kurtosis_excess(Fisher=True)[source]

Bases: TimeModule

Compute the excess kurtosis of a given array along the time axis.

\[G_2 = \frac{\mu_4}{\sigma^4} - 3\]

where \(\mu_4\) is the fourth central moment and \(\sigma\) is the standard deviation.

Parameters:
  • array (GeoWombat series object) – An object that contains geospatial and temporal metadata.

  • Fisher (bool, optional) – If True, Fisher’s definition is used (normal ==> 0.0). If False, Pearson’s definition is used (normal ==> 3.0).

Returns:

Returns the excess kurtosis of X (calculated with the adjusted Fisher-Pearson standardized moment coefficient G2).

Return type:

float

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.large_standard_deviation(r=2)[source]

Bases: TimeModule

Boolean variable denoting whether the standard deviation of x is higher than r times the range (maximum minus minimum) of x.

Parameters:

r (float, optional) – The multiple of the range to compare with. Default is 2.

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.longest_strike_above_mean(mean=None)[source]

Bases: TimeModule

Returns the length of the longest consecutive subsequence in x that is greater than the mean of x.

Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.
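
One way to picture the longest run above the mean is a run-length pass over a boolean mask. The sketch below does this for a single pixel with itertools.groupby. The function name is hypothetical, and falling back to the series mean when mean is None is an assumption.

```python
from itertools import groupby


def longest_run_above_mean(series, mean=None):
    """Length of the longest consecutive run of values above the mean."""
    # Assumption: with mean=None the series' own mean is the threshold.
    threshold = sum(series) / len(series) if mean is None else mean
    above = [value > threshold for value in series]
    run_lengths = (
        sum(1 for _ in run) for is_above, run in groupby(above) if is_above
    )
    return max(run_lengths, default=0)


pixel_series = [1, 5, 6, 1, 7, 8, 9, 1]  # mean is 4.75
print(longest_run_above_mean(pixel_series))  # the run 7, 8, 9 has length 3
```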

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.longest_strike_below_mean(mean=None)[source]

Bases: TimeModule

Returns the length of the longest consecutive subsequence in x that is smaller than the mean of x.

Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.maximum[source]

Bases: TimeModule

Returns the maximum value of the time series x.

\[x_{\text{max}}\]

Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

The maximum value.

Return type:

float

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(x)

Calculates the user function.

calculate(x)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.mean[source]

Bases: TimeModule

Returns the mean value of the time series x.

\[\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i\]

Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

The mean value.

Return type:

float

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(x)

Calculates the user function.

calculate(x)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.mean_abs_change[source]

Bases: TimeModule

Returns the mean of the absolute differences between subsequent time series values:

\[\frac{1}{n-1} \sum_{i=1}^{n-1} | x_{i+1} - x_{i} |\]

Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

The mean absolute change.

Return type:

float

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(x)

Calculates the user function.

calculate(x)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.mean_change[source]

Bases: TimeModule

Returns the mean of the differences between subsequent time series values:

\[\frac{1}{n-1} \sum_{i=1}^{n-1} ( x_{i+1} - x_{i} )\]

Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

The mean change.

Return type:

float
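
Because the signed differences telescope, the mean change reduces to (x_n - x_1) / (n - 1). The sketch below checks this on one hypothetical pixel series by computing the value both ways. The function names are illustrative only and say nothing about how the library evaluates it.

```python
def mean_change_of_series(series):
    """Mean of consecutive differences, computed term by term."""
    differences = [later - earlier for earlier, later in zip(series, series[1:])]
    return sum(differences) / len(differences)


def mean_change_telescoped(series):
    """Same quantity using only the first and last values."""
    return (series[-1] - series[0]) / (len(series) - 1)


pixel_series = [2.0, 5.0, 4.0, 10.0]
print(mean_change_of_series(pixel_series))
print(mean_change_telescoped(pixel_series))
```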

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.mean_second_derivative_central[source]

Bases: TimeModule

Returns the mean value of a central approximation of the second derivative of the time series.

\[\frac{1}{2(n-2)} \sum_{i=1}^{n-2} \frac{1}{2} (x_{i+2} - 2 \cdot x_{i+1} + x_{i})\]

Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

The mean second derivative.

Return type:

float

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.median[source]

Bases: TimeModule

Returns the median of the time series x.

\[\tilde{x}\]

Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

The median value.

Return type:

float

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(x)

Calculates the user function.

calculate(x)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.minimum[source]

Bases: TimeModule

Returns the minimum value of the time series x.

\[x_{\text{min}}\]

Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

The minimum value.

Return type:

float

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(x)

Calculates the user function.

calculate(x)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.ols_slope_intercept(returns='slope')[source]

Bases: TimeModule

Calculates the slope, intercept, or R² of the time series using ordinary least squares.

Parameters:
  • gw (array) – The time series data.

  • returns (str, optional) – What to return: “slope”, “intercept” or “rsquared”. Defaults to “slope”.

Returns:

The requested time series property as an array.

Return type:

array
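
For intuition, an ordinary least squares fit of one pixel's series can be sketched as below. Regressing against the integer time index 0..n-1 is an assumption of this sketch (the library may scale the time axis differently), and the function name and returned dictionary are hypothetical. The keys match the three options listed for the returns parameter.

```python
def ols_fit_of_series(series):
    """Fit y = intercept + slope * t with t = 0..n-1 for one time series."""
    n = len(series)
    times = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, series))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, series))
    ss_tot = sum((y - y_mean) ** 2 for y in series)
    rsquared = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return {"slope": slope, "intercept": intercept, "rsquared": rsquared}


fit = ols_fit_of_series([1.0, 3.0, 5.0, 7.0])  # exactly linear: y = 1 + 2t
print(fit["slope"], fit["intercept"], fit["rsquared"])
```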

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.quantile(q=None, method='linear')[source]

Bases: TimeModule

Calculates the q-th quantile of x. This is the value of x that is greater than a fraction q of the ordered values of x.

Parameters:
  • x (numpy.ndarray) – Geowombat series object containing a time series of images.

  • q (float) – Probability or sequence of probabilities for the quantiles to compute. Values must be between 0 and 1 inclusive.

Returns:

The q-th quantile of x.

Return type:

float
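
The following sketch shows one common reading of a 'linear' quantile for a single pixel: sort the values, locate the fractional position q * (n - 1), and interpolate between its neighbours. The function name is hypothetical, and whether the library's method='linear' matches this exactly is an assumption.

```python
import math


def quantile_linear(series, q):
    """Quantile of one time series with linear interpolation between ranks."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1 inclusive")
    ordered = sorted(series)
    position = q * (len(ordered) - 1)  # fractional index into the sorted values
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


pixel_series = [4.0, 1.0, 3.0, 2.0]
print(quantile_linear(pixel_series, 0.5))  # midway between 2.0 and 3.0
```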

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.ratio_beyond_r_sigma(r=2)[source]

Bases: TimeModule

Returns the ratio of values that are more than r times the standard deviation away from the mean of the time series.

\[P_{r} = \frac{1}{n} \sum_{i=1}^{n} (| x_i - \bar{x} | > r \cdot \sigma)\]

Parameters:
  • x (numpy.ndarray) – Geowombat series object containing a time series of images.

  • r (float) – The number of standard deviations. Defaults to 2.

Returns:

The ratio of values beyond r sigma.

Return type:

float
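
A per-pixel sketch of the ratio formula above follows. It assumes sigma is the population standard deviation (divide by n); the function name and sample data are hypothetical.

```python
def ratio_beyond_r_sigma_of_series(series, r=2.0):
    """Share of values lying more than r standard deviations from the mean."""
    n = len(series)
    mean = sum(series) / n
    sigma = (sum((value - mean) ** 2 for value in series) / n) ** 0.5
    beyond = sum(1 for value in series if abs(value - mean) > r * sigma)
    return beyond / n


# Nine zeros and one spike: mean is 1 and sigma is 3, so only the spike
# (9 away from the mean) lies beyond 2 * sigma = 6.
pixel_series = [0.0] * 9 + [10.0]
print(ratio_beyond_r_sigma_of_series(pixel_series, r=2.0))
```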

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.skewness[source]

Bases: TimeModule

Returns the skewness of x.

\[\frac{n}{(n-1)(n-2)} \sum \left( \frac{X_i - \overline{X}}{s} \right)^3\]

Parameters:
  • x (numpy.ndarray) – Geowombat series object containing a time series of images.

  • axis (int, optional) – Axis along which to compute the skewness. Default is 0.

  • fisher (bool, optional) – If True, Fisher’s definition is used (normal=0). If False, Pearson’s definition is used (normal=3). Default is False.

Returns:

The skewness.

Return type:

float
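
The skewness formula above can be evaluated for one pixel as in this sketch. It assumes s is the sample standard deviation (divide by n - 1), which the text does not state, and it needs at least three values; the function name is hypothetical.

```python
def adjusted_skewness_of_series(series):
    """Adjusted skewness of one time series, following the formula above."""
    n = len(series)
    mean = sum(series) / n
    # Assumption: s is the sample standard deviation (n - 1 in the denominator).
    s = (sum((value - mean) ** 2 for value in series) / (n - 1)) ** 0.5
    cubed_sum = sum(((value - mean) / s) ** 3 for value in series)
    return n / ((n - 1) * (n - 2)) * cubed_sum


print(adjusted_skewness_of_series([1.0, 2.0, 3.0, 4.0, 5.0]))  # symmetric series
print(adjusted_skewness_of_series([1.0, 1.0, 1.0, 10.0]))      # long right tail
```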

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.standard_deviation[source]

Bases: TimeModule

Returns the standard deviation of x.

\[\sqrt{ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 }\]

Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

The standard deviation.

Return type:

float

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(x)

Calculates the user function.

calculate(x)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.sum[source]

Bases: TimeModule

Returns the sum of all values in x.

\[S = \sum_{i=1}^{n} x_i\]

Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

The sum of values.

Return type:

float

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(x)

Calculates the user function.

calculate(x)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.symmetry_looking(r=0.1)[source]

Bases: TimeModule

Measures the similarity of the time series when flipped horizontally. Boolean variable denoting if the distribution of x looks symmetric.

\[| x_{\text{mean}} - x_{\text{median}} | < r \cdot (x_{\text{max}} - x_{\text{min}} )\]

Parameters:
  • x (numpy.ndarray) – Geowombat series object containing a time series of images.

  • r (float) – A threshold value, the fraction of the range to compare with (default: 0.1)

Returns:

The symmetry measure.

Return type:

float
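
The inequality above translates directly into a per-pixel check, sketched here with the standard library. The function name and the sample series are hypothetical.

```python
import statistics


def looks_symmetric(series, r=0.1):
    """True if |mean - median| is smaller than r times the value range."""
    mean_median_gap = abs(statistics.fmean(series) - statistics.median(series))
    value_range = max(series) - min(series)
    return mean_median_gap < r * value_range


print(looks_symmetric([1, 2, 3, 4, 5]))          # mean equals median
print(looks_symmetric([1, 1, 1, 1, 10]))         # gap 1.8 vs threshold 0.9
print(looks_symmetric([1, 1, 1, 1, 10], r=0.5))  # gap 1.8 vs threshold 4.5
```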

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)

class xr_fresh.feature_calculator_series.ts_complexity_cid_ce(normalize=True)[source]

Bases: TimeModule

Returns the time series complexity measure CID CE.

\[\sqrt{ \sum_{i=1}^{n-1} ( x_{i+1} - x_{i})^2 }\]

Parameters:
  • x (numpy.ndarray) – Geowombat series object containing a time series of images.

  • normalize (bool) – Whether to z-transform the time series before computing the measure (default: True).

Returns:

The complexity measure.

Return type:

float
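
A minimal per-pixel sketch of the complexity estimate follows: optionally z-transform the series, then take the square root of the summed squared consecutive differences. The function name is hypothetical, and returning 0.0 for a constant series under normalization is a choice made for this sketch rather than documented behavior.

```python
def cid_ce_of_series(series, normalize=True):
    """Complexity estimate of one time series from its consecutive differences."""
    values = list(series)
    if normalize:
        n = len(values)
        mean = sum(values) / n
        sigma = (sum((value - mean) ** 2 for value in values) / n) ** 0.5
        if sigma == 0:
            return 0.0  # constant series: no variation to measure (sketch choice)
        values = [(value - mean) / sigma for value in values]
    squared_steps = sum((b - a) ** 2 for a, b in zip(values, values[1:]))
    return squared_steps**0.5


# Without normalization: steps of 3, 0 and 4 give sqrt(9 + 0 + 16).
print(cid_ce_of_series([0.0, 3.0, 3.0, 7.0], normalize=False))
```

With normalize=True the result no longer depends on the scale of the input, so a series and a rescaled copy of it give the same value.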

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)
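As a rough illustration of the CID CE formula above, the following NumPy sketch computes the measure for a single 1-D series. The name cid_ce_sketch is hypothetical, and returning 0.0 for a constant series when normalize=True is an assumption made here to avoid division by zero; the library may handle that case differently.

```python
import numpy as np


def cid_ce_sketch(x, normalize=True):
    """Square root of the summed squared differences between consecutive values."""
    x = np.asarray(x, dtype=float)
    if normalize:
        std = x.std()
        if std == 0:
            # Assumption: a constant series has zero complexity
            return 0.0
        # z-transform: zero mean, unit (population) standard deviation
        x = (x - x.mean()) / std
    return float(np.sqrt(np.sum(np.diff(x) ** 2)))


raw = cid_ce_sketch([0, 1, 0, 1], normalize=False)
scaled = cid_ce_sketch([0, 1, 0, 1], normalize=True)
```

Normalizing makes the measure independent of the units of the series, so series with different value ranges can be compared.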

class xr_fresh.feature_calculator_series.unique_value_number_to_time_series_length[source]

Bases: TimeModule

Returns a factor which is 1 if all values in the time series occur only once, and below one if this is not the case. In principle, it just returns

# of unique values / # of values

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)
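The ratio described above can be illustrated for a single 1-D series with NumPy. The helper name unique_ratio_sketch is hypothetical and the sketch ignores missing-value handling.

```python
import numpy as np


def unique_ratio_sketch(x):
    """Number of distinct values divided by the number of values."""
    x = np.asarray(x)
    return np.unique(x).size / x.size


all_distinct = unique_ratio_sketch([1, 2, 3, 4])
with_repeats = unique_ratio_sketch([1, 1, 2, 2])
```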

class xr_fresh.feature_calculator_series.variance[source]

Bases: TimeModule

Returns the variance of x.

\[\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2\]
Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

The variance.

Return type:

float

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(x)

Calculates the user function.

calculate(x)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)
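The formula above divides by N rather than N - 1, which is the population variance. A short NumPy check of that formula, independent of the library's own code, is shown below; the variable names are illustrative only.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Direct translation of the formula: mean of squared deviations from the mean
manual_variance = np.sum((x - x.mean()) ** 2) / x.size

# np.var uses ddof=0 by default, which matches the 1/N formula
numpy_variance = np.var(x)

# The sample variance (ddof=1) divides by N - 1 and is therefore larger
sample_variance = np.var(x, ddof=1)
```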

class xr_fresh.feature_calculator_series.variance_larger_than_standard_deviation[source]

Bases: TimeModule

Returns 1 if the variance of x is larger than its standard deviation and 0 otherwise.

\[\sigma^2 > \sigma\]
Parameters:

x (numpy.ndarray) – Geowombat series object containing a time series of images.

Returns:

1 if variance is larger than standard deviation, 0 otherwise.

Return type:

int

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(x)

Calculates the user function.

calculate(x)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)
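Because the variance is the square of the standard deviation, the comparison above is equivalent to asking whether the standard deviation exceeds 1. A minimal NumPy sketch over a small stack follows; the function name is hypothetical and time is assumed to be the leading axis.

```python
import numpy as np


def variance_gt_std_sketch(arr):
    """Return 1 where variance > standard deviation along axis 0, else 0."""
    arr = np.asarray(arr, dtype=float)
    return (arr.var(axis=0) > arr.std(axis=0)).astype(np.int8)


# Shape (time=2, rows=1, columns=2): pixel 0 spans 0..10, pixel 1 spans 0..1
stack = np.array([[[0.0, 0.0]], [[10.0, 1.0]]])
flags = variance_gt_std_sketch(stack)
```

The first pixel has variance 25 and standard deviation 5, so it is flagged; the second has variance 0.25 and standard deviation 0.5, so it is not.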

xr_fresh.interpolate_series module

class xr_fresh.interpolate_series.interpolate_nan(missing_value=None, interp_type='linear', count=1, dates=None)[source]

Bases: TimeModule

Interpolate missing values in a geospatial time series. If dates is not set, this class assumes a regular time interval between observations. If dates is set, it can handle irregular time intervals, using the day of year (DOY) as the index.

Parameters:
  • missing_value (int or float, optional) – The value to be replaced by NaNs. Default is None.

  • interp_type (str, optional) – The type of interpolation algorithm to use. Options include “linear”, “nearest”, “zero”, “slinear”, “quadratic”, “cubic”, “previous”, “next”, “cubicspline”, “spline”, and “UnivariateSpline”. Default is “linear”.

  • dates (list[datetime]) – List of datetime objects corresponding to each time slice.

  • count (int, optional) – Overrides the default output band count. Default is 1.

calculate(array)[source]

Applies the interpolation on the input array.

Example Usage:

from datetime import datetime
from glob import glob

import geowombat as gw
from xr_fresh.interpolate_series import interpolate_nan

pth = "/home/mmann1123/Dropbox/Africa_data/Temperature/"
files = sorted(glob(f"{pth}*.tif"))[0:10]
# strptime pattern matching the full file path, used to parse a date from each file name
strp_glob = f"{pth}RadT_tavg_%Y%m.tif"
dates = sorted(datetime.strptime(string, strp_glob) for string in files)
date_strings = [date.strftime("%Y-%m-%d") for date in dates]

# window size controls RAM usage, transfer_lib can be jax if using GPU
with gw.series(files, window_size=[640, 640], transfer_lib="numpy") as src:
    src.apply(
        func=interpolate_nan(
            missing_value=0,
            count=len(src.filenames),
            dates=dates,
        ),
        outfile="/home/mmann1123/Downloads/test.tif",
        num_workers=min(12, src.nchunks),
        bands=1,
    )

Methods

__call__(w, array, band_dict)

Call self as a function.

calculate(array)

Calculates the user function.

calculate(array)[source]

Calculates the user function.

Parameters:

data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].

Returns:

numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor:

Shaped (time|bands x rows x columns)
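To make the regular versus irregular time handling concrete, the sketch below fills gaps in a single pixel's series with linear interpolation using numpy.interp. It is only an illustration of the idea, not the class's implementation: the function name and the x argument are hypothetical, and only the "linear" case is covered.

```python
import numpy as np


def interp_time_linear(series, missing_value=None, x=None):
    """Linearly fill missing observations in one pixel's time series."""
    y = np.asarray(series, dtype=float).copy()
    if missing_value is not None:
        # Treat the flagged value as missing
        y[y == missing_value] = np.nan
    if x is None:
        # No dates supplied: assume evenly spaced observations
        x = np.arange(y.size)
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(y)
    if not valid.any():
        # Nothing to interpolate from
        return y
    return np.interp(x, x[valid], y[valid])


# Regular spacing: the flagged 0 is replaced by the midpoint of its neighbors
regular = interp_time_linear([1, 0, 3], missing_value=0)

# Irregular spacing: day-of-year positions weight the gap unevenly
irregular = interp_time_linear([0.0, np.nan, 8.0], x=[1, 11, 41])
```

Note that numpy.interp holds the first and last valid values constant beyond the observed range, so leading or trailing gaps are filled with the nearest valid value rather than extrapolated.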

xr_fresh.io module

class xr_fresh.io.WriteDaskArray(filename, src, overwrite=True)[source]

Bases: object

xr_fresh.io.WriteStackedArray(src: DataArray, file_path='/tmp/test.parquet')[source]

Writes a stacked array, i.e. one flattened by (y, x, time), to parquet in chunks.

Parameters:
  • src (xr.DataArray) – The DataArray to stack and write.

  • file_path (str, optional) – Path of the output parquet file; defaults to “/tmp/test.parquet”.

xr_fresh.io.parquet_append(file_list: list, out_path: str, filters: list)[source]

Read, filter, and append a large set of parquet files to a single file. Note: the resulting file must be read with pd.read_parquet(engine='pyarrow').

See read_table docs

Parameters:
  • file_list (list) – list of file paths to .parquet files

  • out_path (str) – path and name of output parquet file

  • filters (list) – list of

xr_fresh.io.stack_to_pandas(data, src, t, b, y, x)[source]

xr_fresh.transformers module

Created on Mon Aug 10 13:41:40 2020 adapted from sklearn-xarray/preprocessing @author: mmann1123

class xr_fresh.transformers.BaseTransformer[source]

Bases: BaseEstimator, TransformerMixin

Base class for transformers.

Methods

fit(X[, y])

Fit estimator to data.

fit_transform(X[, y])

Fit to data, then transform it.

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform input data.

fit(X, y=None, **fit_params)[source]

Fit estimator to data.

Parameters:
  • X (xarray DataArray or Dataset) – Training set.

  • y (xarray DataArray or Dataset) – Target values.

Returns:

The estimator itself.

Return type:

self

transform(X)[source]

Transform input data.

Parameters:

X (xarray DataArray or Dataset) – The input data.

Returns:

Xt – The transformed data.

Return type:

xarray DataArray or Dataset

class xr_fresh.transformers.Stackerizer(stack_dims=None, direction='stack', sample_dim='sample', transposed=True, groupby=None, compute=True)[source]

Bases: BaseTransformer

Transformer to handle higher dimensional data, for instance data sampled in time and location (‘x’, ‘y’, ‘time’), that must be stacked before running Featurizer, and unstacked after prediction.

Parameters:
  • stack_dims (list or tuple) –

    List (tuple) of the dimensions used to define how the data is sampled.

    If your sample dim has multiple dimensions, for instance x, y, time, these can be passed as a list or tuple. Before stacking, a new multiindex ‘sample’ will be created for these dimensions.

  • direction (str, optional) – “stack” or “unstack” defines the direction of transformation. Default is “stack”.

  • sample_dim (str) – Name of the multiindex used to stack sample dims. Defaults to “sample”.

  • transposed (bool) – Should the output be transposed after stacking. Default is True.

Returns:

Xt – The transformed data.

Return type:

xarray DataArray or Dataset

Methods

fit(X[, y])

Fit estimator to data.

fit_transform(X[, y])

Fit to data, then transform it.

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform input data.
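The stacking and unstacking that this transformer wraps can be illustrated with plain xarray. The sketch below uses DataArray.stack and DataArray.unstack directly rather than Stackerizer itself, so the array, dimension names, and variable names are assumptions chosen for the example.

```python
import numpy as np
import xarray as xr

# Small cube with dimensions (time, y, x)
da = xr.DataArray(
    np.arange(24).reshape(2, 3, 4),
    dims=("time", "y", "x"),
    coords={"time": [0, 1], "y": [0, 1, 2], "x": [0, 1, 2, 3]},
)

# Collapse the spatial dimensions into one "sample" multiindex,
# then put samples first so each row is one pixel's time series
stacked = da.stack(sample=("y", "x")).transpose("sample", "time")

# Reverse the operation and restore the original dimension order
restored = stacked.unstack("sample").transpose("time", "y", "x")
```

After stacking, the array has 12 samples (3 x 4 pixels) with 2 time steps each, which is the two-dimensional layout most estimators expect.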

xr_fresh.transformers.is_dataarray(X, require_attrs=None)[source]

Check whether an object is a DataArray.

Parameters:
  • X (anything) – The object to be checked.

  • require_attrs (list of str, optional) – The attributes the object has to have in order to pass as a DataArray.

Returns:

Whether the object is a DataArray or not.

Return type:

bool

xr_fresh.transformers.is_dataset(X, require_attrs=None)[source]

Check whether an object is a Dataset.

Parameters:
  • X (anything) – The object to be checked.

  • require_attrs (list of str, optional) – The attributes the object has to have in order to pass as a Dataset.

Returns:

Whether the object is a Dataset or not.

Return type:

bool

xr_fresh.transformers.stackerizer(X, return_estimator=False, **fit_params)[source]

Stacks all dimensions and variables except for sample dimension.

Parameters:
  • X (xarray DataArray or Dataset) – The input data.

  • return_estimator (bool) – Whether to return the fitted estimator along with the transformed data.

Returns:

Xt – The transformed data.

Return type:

xarray DataArray or Dataset

xr_fresh.utils module

Created on Tue Jun 30 15:34:47 2020

@author: https://github.com/robintw/XArrayAndRasterio/blob/master/rasterio_to_xarray.py

xr_fresh.utils.add_categorical(data, labels=None, col=None, variable_name=None, missing_value=-9999)[source]

Adds categorical data to xarray by column name.

Examples

climatecluster = './ClusterEco15_Y5.shp'

with gw.open(vrts, time_names=[str(x) for x in range(len(vrts))]) as ds:
    ds.attrs['filename'] = vrts
    ds = add_categorical(ds, climatecluster, col='ClusterN_2', variable_name='clim_clust')
    print(ds)

Parameters:
  • data (xarray.DataArray) – xarray to add categorical data to

  • labels (path or gpd.geodataframe or path to tif) – path or df to shapefile or raster with categorical data

  • col (str) – Column to get values from

  • variable_name (str) – name assigned to categorical data

  • missing_value (int) – missing value for pixels not overlapping polygon or points

xr_fresh.utils.add_time_targets(data, target, target_col_list=None, target_name='target', missing_value=-9999, append_to_X=False)[source]

Adds multiple time periods of target data to existing xarray obj.

Examples

with gw.open(vrts, time_names=time_names, chunks=400) as ds:
    ds = add_time_targets(
        data=ds,
        target=loss_poly,
        target_col_list=['w_dam_2010', 'w_dam_2011', 'w_dam_2012',
                         'w_dam_2013', 'w_dam_2014', 'w_dam_2015', 'w_dam_2016'],
        target_name='weather_damage',
        missing_value=np.nan,
        append_to_X=True,
    )

Parameters:
  • data (xarray.DataArray) – xarray to add target data to

  • target (path or gpd.geodataframe) – path or df to shapefile with target data

  • target_col_list (list) – list of columns holding target data. All column names must be in ascending order, e.g. [‘t_2010’, ‘t_2011’]

  • target_name (str) – single name assigned to target data dimension. Default is ‘target’

  • missing_value (int) – missing value for pixels not overlapping polygon or points

  • append_to_X (bool) – should the target data be appended to the far right of other X variables. Default is False.

xr_fresh.utils.bound(x, min=0, max=100)[source]
xr_fresh.utils.check_variable_lengths(variable_list)[source]

Check if a list of variable files are of equal length

Parameters:

variable_list (list)

Returns:

Whether the lists of variable files are of equal length.

Return type:

bool

xr_fresh.utils.compressed_pickle(data, filename, compress='gz')[source]
xr_fresh.utils.convert_to_min_dtype(arr)[source]

Convert a numpy array to the smallest data type possible.

Parameters:

arr (np.array) – numpy array

Returns:

numpy array with smallest data type

Return type:

np.array

Examples

>>> arr = np.array([1, 2, 3, 4, 5])
>>> convert_to_min_dtype(arr)
array([1, 2, 3, 4, 5], dtype=int8)
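One way to obtain the behavior shown in the example is to try signed integer types from narrowest to widest and keep the first one whose range covers the data. The sketch below is an assumption about the approach, not the library's code; the name to_min_int_dtype is hypothetical and only integer input is handled.

```python
import numpy as np


def to_min_int_dtype(arr):
    """Cast an integer array to the narrowest signed integer type that fits."""
    arr = np.asarray(arr)
    low, high = arr.min(), arr.max()
    for dtype in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(dtype)
        # Keep the first type whose range contains every value
        if low >= info.min and high <= info.max:
            return arr.astype(dtype)
    return arr


small = to_min_int_dtype(np.array([1, 2, 3, 4, 5]))
medium = to_min_int_dtype(np.array([-129, 1000]))
large = to_min_int_dtype(np.array([70000]))
```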
xr_fresh.utils.decompress_pickle(file, compress='gz')[source]
xr_fresh.utils.downcast_pandas(data)[source]

Cast each column of a pandas dataframe to the smallest numerical dtype possible. Saves considerable space. Objects are cast to categorical; ints and floats are cast to the smallest dtype.

https://pandas.pydata.org/pandas-docs/version/1.0.0/reference/api/pandas.to_numeric.html#pandas.to_numeric

Note: could be problematic with chunks if different dtypes are assigned to same column

Parameters:

data (DataFrame) – input dataframe

Returns:

downcast dataframe

Return type:

DataFrame
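The casting rules described above can be sketched with pandas.to_numeric and its downcast argument, which the linked pandas documentation describes. The function name downcast_sketch and the sample dataframe are hypothetical, and the library's own function may treat edge cases differently.

```python
import pandas as pd


def downcast_sketch(data):
    """Return a copy with integer and float columns downcast and other columns made categorical."""
    out = data.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
        else:
            # Text-like columns take less space as categoricals
            out[col] = out[col].astype("category")
    return out


df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5], "c": ["x", "y", "x"]})
small = downcast_sketch(df)
```

As the note above warns, applying this chunk by chunk can assign different dtypes to the same column in different chunks, since the chosen dtype depends on the values present.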

xr_fresh.utils.find_variable_names(path_glob)[source]

Return all unique variable names from path glob, removing trailing date and __

Example:

path_glob = f"{file_path}NDVI_MODIS/Meher_features/ndvi*.tif"
find_variable_names(path_glob)

Parameters:

path_glob (path) – path with * for file glob

xr_fresh.utils.find_variable_year(path_glob, digits=4, strp_glob='%Y.tif')[source]

Return all unique 4-digit years found in the file names matching a path glob.

Example:

path_glob = f"{file_path}NDVI_MODIS/Meher_features/ndvi*.tif"
find_variable_year(path_glob)

Parameters:
  • path_glob (path) – path with * for file glob

  • digits (int) – number of digits used to store year

  • strp_glob (string) – strptime pattern with year format and file type
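The roles of digits and strp_glob can be illustrated with the standard library alone. In the sketch below the helper name years_from_names and the file names are hypothetical, and it takes a list of paths rather than a glob; it assumes each file name ends with the year followed by the file extension, as the default pattern '%Y.tif' suggests.

```python
import os
from datetime import datetime


def years_from_names(paths, digits=4, strp_glob="%Y.tif"):
    """Parse the trailing year from each file name and return the sorted unique years."""
    # "%Y" takes 2 characters in the pattern but `digits` characters in a file name
    suffix_len = digits + len(strp_glob) - len("%Y")
    years = {
        datetime.strptime(os.path.basename(p)[-suffix_len:], strp_glob).year
        for p in paths
    }
    return sorted(years)


example_paths = [
    "features/ndvi__maximum__2010.tif",
    "features/ndvi__maximum__2011.tif",
    "features/ndvi__minimum__2010.tif",
]
years = years_from_names(example_paths)
```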

xr_fresh.utils.open_pickle(path)[source]
xr_fresh.utils.save_pickle(obj, filename)[source]
xr_fresh.utils.to_vrt(data, filename, resampling=None, nodata=None, init_dest_nodata=True, warp_mem_limit=128)[source]

Writes a file to a VRT file.

Parameters:
  • data (DataArray) – The xarray.DataArray to write.

  • filename (str) – The output file name to write to.

  • resampling (Optional[object]) – The resampling algorithm for rasterio.vrt.WarpedVRT. Default is ‘nearest’.

  • nodata (Optional[float or int]) – The ‘no data’ value for rasterio.vrt.WarpedVRT.

  • init_dest_nodata (Optional[bool]) – Whether or not to initialize output to nodata for rasterio.vrt.WarpedVRT.

  • warp_mem_limit (Optional[int]) – The GDAL memory limit for rasterio.vrt.WarpedVRT.

Example

>>> import geowombat as gw
>>> from rasterio.enums import Resampling
>>>
>>> # Transform a CRS and save to VRT
>>> with gw.config.update(ref_crs=102033):
>>>     with gw.open('image.tif') as src:
>>>         gw.to_vrt(src,
>>>                   'output.vrt',
>>>                   resampling=Resampling.cubic,
>>>                   warp_mem_limit=256)
>>>
>>> # Load multiple files set to a common geographic extent
>>> bounds = (left, bottom, right, top)
>>> with gw.config.update(ref_bounds=bounds):
>>>     with gw.open(['image1.tif', 'image2.tif'], mosaic=True) as src:
>>>         gw.to_vrt(src, 'output.vrt')
xr_fresh.utils.unique(ls)[source]
xr_fresh.utils.xarray_to_rasterio(xr_data, path='', postfix='', bands=None)[source]

Writes xarray bands to disk by band

Examples

>>>  f_dict = { 'maximum':[{}] ,
               'quantile': [{'q':"0.5"},{'q':'0.95'}]}
>>>  features = extract_features(xr_data=ds,
>>>                     feature_dict=f_dict,
>>>                     band='aet',
>>>                     na_rm = True)
>>>  xarray_to_rasterio(features,'/home/mmann1123/Desktop/', postfix='test')
Parameters:
  • xr_data (xarray.DataArray) – xarray to write

  • path (str) – file destination path

  • postfix (str) – text to append to the end of each written image name

  • bands (list) – list of character strings or locations of band names; if None, all bands are written

xr_fresh.visualizer module

xr_fresh.visualizer.plot_interpolated_actual(interpolated_stack: str, original_image_list: list, samples: int = 20)[source]

Plots the interpolated and actual values for a given time series.

Parameters:
  • interpolated_stack (str) – multiband stack of images representing interpolated time series. Defaults to None.

  • original_image_list (list) – list of files used in interpolation. Defaults to None.

  • samples (int, optional) – number of random points to compare time series. Defaults to 20.

Module contents