xr_fresh package
Subpackages
Submodules
xr_fresh.backends module
- class xr_fresh.backends.Cluster(**kwargs)[source]
Bases:
object
Wrapper for Dask clients providing cluster management functionality.
Methods
close() – Close the Dask client and cluster resources.
restart() – Restart the Dask client.
start() – Start a Dask cluster for general computation.
Start a Dask cluster optimized for large I/O-bound computations.
Start a Dask cluster optimized for large object computations.
Start a Dask cluster optimized for small object computations.
xr_fresh.dimension_reduction module
- class xr_fresh.dimension_reduction.ExtendedGeoWombatAccessor(xarray_obj)[source]
Bases:
GeoWombatAccessor
- Attributes:
affine
Get the affine transform object.
altitude
Get satellite altitudes (in km)
array_is_dask
Get whether the array is a Dask array.
avail_sensors
Get supported sensors.
band_chunks
Get the band chunk size.
bottom
Get the array bounding box bottom coordinate.
bounds
Get the array bounding box (left, bottom, right, top)
bounds_as_namedtuple
Get the array bounding box as a rasterio.coords.BoundingBox
cellx
Get the cell size in the x direction.
cellxh
Get the half width of the cell size in the x direction.
celly
Get the cell size in the y direction.
cellyh
Get the half width of the cell size in the y direction.
central_um
Get a dictionary of central wavelengths (in micrometers)
chunk_grid
Get the image chunk grid.
col_chunks
Get the column chunk size.
crs_to_pyproj
Get the CRS as a pyproj.CRS object.
data_are_separate
Checks whether the data are loaded separately.
data_are_stacked
Checks whether the data are stacked.
dtype
Get the data type of the DataArray.
filenames
Gets the data filenames.
footprint_grid
Get the image footprint grid.
geodataframe
Get a geopandas.GeoDataFrame of the array bounds.
geometry
Get the polygon geometry of the array bounding box.
has_band
Check whether the DataArray has a band attribute.
has_band_coord
Check whether the DataArray has a band coordinate.
has_band_dim
Check whether the DataArray has a band dimension.
has_time
Check whether the DataArray has a time attribute.
has_time_coord
Check whether the DataArray has a time coordinate.
has_time_dim
Check whether the DataArray has a time dimension.
left
Get the array bounding box left coordinate.
meta
Get the array metadata.
nbands
Get the number of array bands.
ncols
Get the number of array columns.
ndims
Get the number of array dimensions.
nodataval
Get the ‘no data’ value from the attributes.
nrows
Get the number of array rows.
ntime
Get the number of time dimensions.
offsetval
Get the offset value.
pydatetime
Get Python datetime objects from the time dimension.
right
Get the array bounding box right coordinate.
row_chunks
Get the row chunk size.
scaleval
Get the scale factor value.
sensor_names
Get sensor full names.
time_chunks
Get the time chunk size.
top
Get the array bounding box top coordinate.
transform
Get the data transform (cell x, 0, left, 0, cell y, top)
unary_union
Get a representation of the union of the image bounds.
wavelengths
Get a dictionary of sensor wavelengths.
Methods
apply(filename, user_func[, n_jobs]) – Applies a user function to an Xarray Dataset or DataArray and writes to file.
assign_nodata_attrs(nodata) – Assigns 'no data' attributes.
avi([nodata, mask, sensor, scale_factor]) – Calculates the advanced vegetation index
band_mask(valid_bands[, src_nodata, ...]) – Creates a mask from band nonzeros.
bounds_overlay(bounds[, how]) – Checks whether the bounds overlay the image bounds.
calc_area(values[, op, units, row_chunks, ...]) – Calculates the area of data values.
check_chunksize(chunksize, array_size) – Asserts that the chunk size fits within intervals of 16 and is smaller than the array.
clip(df[, query, mask_data, expand_by]) – Clips a DataArray by vector polygon geometry.
clip_by_polygon(df[, query, mask_data, ...]) – Clips a DataArray by vector polygon geometry.
compare(op, b[, return_binary]) – Comparison operation.
compute(**kwargs) – Computes data.
evi([nodata, mask, sensor, scale_factor]) – Calculates the enhanced vegetation index
evi2([nodata, mask, sensor, scale_factor]) – Calculates the two-band modified enhanced vegetation index
extract(aoi[, bands, time_names, ...]) – Extracts data within an area or points of interest.
gcvi([nodata, mask, sensor, scale_factor]) – Calculates the green chlorophyll vegetation index
imshow([mask, nodata, flip, text_color, rot]) – Shows an image on a plot.
k_pca(gamma, n_components, n_workers, chunk_size) – Applies Kernel PCA to the dataset and returns a DataArray with the components as bands.
kndvi([nodata, mask, sensor, scale_factor]) – Calculates the kernel normalized difference vegetation index
mask(df[, query, keep]) – Masks a DataArray.
mask_nodata() – Masks 'no data' values with NaNs.
match_data(data, band_names) – Coerces the xarray.DataArray to match another xarray.DataArray.
moving([stat, perc, w, nodata, weights]) – Applies a moving window function to the DataArray.
n_windows([row_chunks, col_chunks]) – Calculates the number of windows in a row/column iteration.
nbr([nodata, mask, sensor, scale_factor]) – Calculates the normalized burn ratio
ndvi([nodata, mask, sensor, scale_factor]) – Calculates the normalized difference vegetation index
norm_brdf(solar_za, solar_az, sensor_za, ...) – Applies Bidirectional Reflectance Distribution Function (BRDF) normalization.
norm_diff(b1, b2[, nodata, mask, sensor, ...]) – Calculates the normalized difference band ratio.
read(band, **kwargs) – Reads data for a band or bands.
recode(polygon, to_replace[, num_workers]) – Recodes a DataArray with polygon mappings.
replace(to_replace) – Replace values given in to_replace with value.
sample([method, band, n, strata, spacing, ...]) – Generates samples from a raster.
save(filename[, mode, nodata, overwrite, ...]) – Saves a DataArray to raster using rasterio/dask.
set_nodata([src_nodata, dst_nodata, ...]) – Sets 'no data' values and applies scaling to an xarray.DataArray.
subset([left, top, right, bottom, rows, ...]) – Subsets a DataArray.
tasseled_cap([nodata, sensor, scale_factor]) – Applies a tasseled cap transformation
to_netcdf(filename, *args, **kwargs) – Writes an Xarray DataArray to a NetCDF file.
to_polygon([mask, connectivity]) – Converts a dask array to a GeoDataFrame
to_raster(filename[, readxsize, readysize, ...]) – Writes an Xarray DataArray to a raster file.
to_vector(filename[, mask, connectivity]) – Writes an Xarray DataArray to a vector file.
to_vrt(filename[, overwrite, resampling, ...]) – Writes a file to a VRT file.
transform_crs([dst_crs, dst_res, dst_width, ...]) – Transforms an xarray.DataArray to a new coordinate reference system.
wi([nodata, mask, sensor, scale_factor]) – Calculates the woody vegetation index
windows([row_chunks, col_chunks, ...]) – Generates windows for a row/column iteration.
- k_pca(gamma: float, n_components: int, n_workers: int, chunk_size: int) → DataArray[source]
Applies Kernel PCA to the dataset and returns a DataArray with the components as bands.
- Parameters:
gamma (float) – The gamma parameter for the RBF kernel.
n_components (int) – The number of components to keep.
n_workers (int) – The number of parallel jobs for KernelPCA and ParallelTask.
chunk_size (int) – The size of the chunks for processing.
- Returns:
A DataArray with the Kernel PCA components as bands.
- Return type:
xr.DataArray
Examples:
    import geowombat as gw
    import matplotlib.pyplot as plt
    import ray
    import xr_fresh.dimension_reduction  # makes the gw_ext accessor available

    # Initialize Ray
    with ray.init(num_cpus=8) as rays:
        # Example usage
        with gw.open(
            sorted(
                [
                    "./tests/data/RadT_tavg_202301.tif",
                    "./tests/data/RadT_tavg_202302.tif",
                    "./tests/data/RadT_tavg_202304.tif",
                    "./tests/data/RadT_tavg_202305.tif",
                ]
            ),
            stack_dim="band",
            band_names=[0, 1, 2, 3],
        ) as src:
            # get the first 3 kernel principal components (zero-based counting)
            transformed_dataarray = src.gw_ext.k_pca(
                gamma=15, n_components=3, n_workers=8, chunk_size=256
            )
            transformed_dataarray.plot.imshow(col="component", col_wrap=1, figsize=(8, 12))
            plt.show()
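To make the roles of gamma and n_components concrete, the following is a minimal NumPy sketch of RBF kernel PCA on a small in-memory cube. It is not the xr_fresh implementation: the name rbf_kernel_pca_sketch and the (bands, rows, cols) input layout are assumptions made for illustration, and the sketch ignores n_workers and chunk_size entirely.

```python
import numpy as np


def rbf_kernel_pca_sketch(cube: np.ndarray, gamma: float, n_components: int) -> np.ndarray:
    """Hypothetical helper: (bands, rows, cols) -> (n_components, rows, cols)."""
    bands, rows, cols = cube.shape
    samples = cube.reshape(bands, -1).T  # one row per pixel, one column per band

    # RBF kernel: exp(-gamma * squared Euclidean distance between pixels).
    sq_norms = (samples**2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * samples @ samples.T
    kernel = np.exp(-gamma * np.clip(sq_dists, 0.0, None))

    # Center the kernel matrix in feature space.
    n = kernel.shape[0]
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    centered = centering @ kernel @ centering

    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(centered)
    top = np.argsort(eigvals)[::-1][:n_components]
    scores = eigvecs[:, top] * np.sqrt(eigvals[top])

    # Components become the leading ("band") axis again.
    return scores.T.reshape(n_components, rows, cols)


rng = np.random.default_rng(0)
cube = rng.random((4, 5, 6))  # 4 bands, 5 rows, 6 columns
components = rbf_kernel_pca_sketch(cube, gamma=15, n_components=3)
print(components.shape)  # (3, 5, 6)
```

The dense pixel-by-pixel kernel matrix grows quadratically with the number of pixels, so this version is only practical for very small arrays.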
xr_fresh.extractors_series module
- xr_fresh.extractors_series.extract_features_series(gw_series, feature_dict, band_name, output_dir)[source]
Extracts features from a geospatial time series and saves them as TIFF files.
- Parameters:
gw_series (geowombat.Dataset) – Geospatial time series dataset.
feature_dict (dict) – Dictionary containing feature names and parameters.
band_name (str) – Name of the band.
output_dir (str) – Directory to save the output TIFF files.
- Returns:
None
- xr_fresh.extractors_series.extract_grid(band_name)[source]
Extracts grid value from the band_name using regular expressions.
- Parameters:
band_name (str) – Name of the band.
- Returns:
Extracted grid value.
- Return type:
str
- xr_fresh.extractors_series.extract_key_value_names(band_name)[source]
Extracts key_names and value_names from the band_name using regular expressions.
- Parameters:
band_name (str) – Name of the band.
- Returns:
key_names (str) – Extracted key names.
value_names (str) – Extracted value names.
xr_fresh.feature_calculator_series module
- class xr_fresh.feature_calculator_series.abs_energy[source]
Bases:
TimeModule
Returns the absolute energy of the time series, which is the sum of the squared values.
\[E = \sum_{i=1}^{n} x_i^2\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
The absolute energy of the time series.
- Return type:
E (numpy.ndarray)
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
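As an illustration of the formula for E, here is a minimal NumPy sketch that reduces a [time x bands x rows x columns] array along the time axis. The function name abs_energy_sketch is hypothetical and the code is not taken from xr_fresh.

```python
import numpy as np


def abs_energy_sketch(array: np.ndarray) -> np.ndarray:
    """Sum of squared values over time (axis 0) for every band and pixel."""
    return np.square(array).sum(axis=0)


# Two pixel time series, arranged as [time x bands x rows x columns].
pixel_a = [1.0, 3.0, -1.0]
pixel_b = [2.0, 0.0, 2.0]
series = np.stack([pixel_a, pixel_b], axis=-1).reshape(3, 1, 1, 2)

energy = abs_energy_sketch(series)
print(energy.shape)  # (1, 1, 2)
```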
- class xr_fresh.feature_calculator_series.absolute_sum_of_changes[source]
Bases:
TimeModule
Returns the sum over the absolute value of consecutive changes in the series x.
\[\sum_{i=1}^{n-1} \mid x_{i+1} - x_i \mid\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
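The sum of absolute consecutive changes can be sketched with np.diff along the time axis. This is an illustrative stand-in, assuming a NumPy input. The function name is hypothetical and this is not the library's code.

```python
import numpy as np


def absolute_sum_of_changes_sketch(array: np.ndarray) -> np.ndarray:
    """Sum of |x[i+1] - x[i]| over time (axis 0)."""
    return np.abs(np.diff(array, axis=0)).sum(axis=0)


# One pixel with four observations: changes are +3, -2 and 0.
series = np.array([1.0, 4.0, 2.0, 2.0]).reshape(4, 1, 1, 1)
total_change = absolute_sum_of_changes_sketch(series)
```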
- class xr_fresh.feature_calculator_series.autocorrelation(lag=1)[source]
Bases:
TimeModule
Calculates the autocorrelation of the specified lag, according to the formula [1].
\[\frac{1}{(n-l)\sigma^{2}} \sum_{t=1}^{n-l}(X_{t}-\mu )(X_{t+l}-\mu)\]
where \(n\) is the length of the time series \(X_i\), \(\sigma^2\) its variance and \(\mu\) its mean. \(l\) denotes the lag.
References
[1] https://en.wikipedia.org/wiki/Autocorrelation#Estimation
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
lag (int) – Lag at which to calculate the autocorrelation (default: 1).
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
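The autocorrelation estimator above can be written directly in NumPy. The sketch below follows the formula term by term, using the population variance. The name autocorrelation_sketch is hypothetical and the code is an illustration rather than the xr_fresh implementation.

```python
import numpy as np


def autocorrelation_sketch(array: np.ndarray, lag: int = 1) -> np.ndarray:
    """Lag-l autocorrelation along time (axis 0)."""
    n = array.shape[0]
    mu = array.mean(axis=0)
    var = array.var(axis=0)  # population variance; zero for constant series
    head = array[: n - lag] - mu  # X_t - mu
    tail = array[lag:] - mu  # X_{t+l} - mu
    return (head * tail).sum(axis=0) / ((n - lag) * var)


series = np.array([1.0, 2.0, 3.0, 4.0]).reshape(4, 1, 1, 1)
acf_lag1 = autocorrelation_sketch(series, lag=1)
acf_lag2 = autocorrelation_sketch(series, lag=2)
```

A constant series has zero variance, so a production version would need to guard the division.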
- class xr_fresh.feature_calculator_series.count_above_mean(mean=None)[source]
Bases:
TimeModule
Returns the number of values in x that are higher than the mean of x.
\[N_{\text{above}} = \sum_{i=1}^n (x_i > \bar{x})\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
mean (int) – An integer to use as the “mean” value of the raster
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
- class xr_fresh.feature_calculator_series.count_below_mean(mean=None)[source]
Bases:
TimeModule
Returns the number of values in x that are lower than the mean of x.
\[N_{\text{below}} = \sum_{i=1}^n (x_i < \bar{x})\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
mean (int) – An integer to use as the “mean” value of the raster
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
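The two counters, count_above_mean and count_below_mean, can be sketched as boolean comparisons summed over time. How the optional mean argument is applied here (as a fixed threshold replacing the per-pixel mean) is an assumption, and the function names are hypothetical.

```python
import numpy as np


def count_above_mean_sketch(array: np.ndarray, mean=None) -> np.ndarray:
    """Number of time steps strictly above the threshold."""
    threshold = array.mean(axis=0) if mean is None else mean
    return (array > threshold).sum(axis=0)


def count_below_mean_sketch(array: np.ndarray, mean=None) -> np.ndarray:
    """Number of time steps strictly below the threshold."""
    threshold = array.mean(axis=0) if mean is None else mean
    return (array < threshold).sum(axis=0)


# One pixel whose temporal mean is 4.0.
series = np.array([1.0, 2.0, 3.0, 10.0]).reshape(4, 1, 1, 1)
n_above = count_above_mean_sketch(series)
n_below = count_below_mean_sketch(series)
n_above_fixed = count_above_mean_sketch(series, mean=2)
```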
- class xr_fresh.feature_calculator_series.doy_of_maximum(dates=None)[source]
Bases:
TimeModule
Returns the day of the year (doy) location of the maximum value of the series - treats all years as the same.
- Parameters:
dates (numpy.ndarray) – An array holding the dates of the time series as integers or as datetime objects.
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
The day of the year of the maximum value.
- Return type:
int
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
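One way to picture the day-of-year lookup is to take the argmax along time and use it to index an array of day-of-year values. The sketch below assumes dates are Python date objects and ignores NaN handling. The function name is hypothetical and this is not the library's code.

```python
from datetime import date

import numpy as np


def doy_of_maximum_sketch(array: np.ndarray, dates) -> np.ndarray:
    """Day of year at which each pixel's series reaches its maximum."""
    doys = np.array([d.timetuple().tm_yday for d in dates])
    return doys[np.argmax(array, axis=0)]


dates = [date(2023, 1, 15), date(2023, 3, 1), date(2023, 7, 4)]
pixel_a = [0.2, 0.9, 0.5]  # peaks on 1 March
pixel_b = [0.1, 0.3, 0.8]  # peaks on 4 July
series = np.stack([pixel_a, pixel_b], axis=-1).reshape(3, 1, 1, 2)

peak_doy = doy_of_maximum_sketch(series, dates)
```

Replacing np.argmax with np.argmin gives the corresponding sketch for the minimum.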
- class xr_fresh.feature_calculator_series.doy_of_minimum(dates=None)[source]
Bases:
TimeModule
Returns the day of the year (doy) location of the minimum value of the series - treats all years as the same.
- Parameters:
dates (numpy.ndarray) – An array holding the dates of the time series as integers or as datetime objects.
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
The day of the year of the minimum value.
- Return type:
int
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
- class xr_fresh.feature_calculator_series.kurtosis(fisher=True)[source]
Bases:
TimeModule
Compute the sample kurtosis of a given array along the time axis.
\[G_2 = \frac{\mu_4}{\sigma^4} - 3\]
where \(\mu_4\) is the fourth central moment and \(\sigma\) is the standard deviation.
- Parameters:
array (GeoWombat series object) – An object that contains geospatial and temporal metadata.
fisher (bool, optional) – If True, Fisher’s definition is used (normal ==> 0.0). If False, Pearson’s definition is used (normal ==> 3.0).
- Returns:
Returns the kurtosis of x (calculated with the adjusted Fisher-Pearson standardized moment coefficient G2).
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
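The moment ratio in the formula can be sketched as follows. This version uses plain (biased) central moments, so it illustrates the displayed formula only and does not reproduce any bias adjustment the library may apply. The name kurtosis_sketch is hypothetical.

```python
import numpy as np


def kurtosis_sketch(array: np.ndarray, fisher: bool = True) -> np.ndarray:
    """Fourth central moment over squared variance, along time (axis 0)."""
    centered = array - array.mean(axis=0)
    m2 = np.mean(centered**2, axis=0)  # variance
    m4 = np.mean(centered**4, axis=0)  # fourth central moment
    ratio = m4 / m2**2
    # Fisher's definition subtracts 3 so that a normal distribution scores 0.
    return ratio - 3.0 if fisher else ratio


series = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(5, 1, 1, 1)
excess = kurtosis_sketch(series, fisher=True)
pearson = kurtosis_sketch(series, fisher=False)
```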
- class xr_fresh.feature_calculator_series.kurtosis_excess(Fisher=True)[source]
Bases:
TimeModule
Compute the excess kurtosis of a given array along the time axis.
\[G_2 = \frac{\mu_4}{\sigma^4} - 3\]
where \(\mu_4\) is the fourth central moment and \(\sigma\) is the standard deviation.
- Parameters:
array (GeoWombat series object) – An object that contains geospatial and temporal metadata.
fisher (bool, optional) – If True, Fisher’s definition is used (normal ==> 0.0). If False, Pearson’s definition is used (normal ==> 3.0).
- Returns:
Returns the excess kurtosis of X (calculated with the adjusted Fisher-Pearson standardized moment coefficient G2).
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
- class xr_fresh.feature_calculator_series.large_standard_deviation(r=2)[source]
Bases:
TimeModule
Boolean variable denoting whether the standard deviation of x is higher than r times the range (maximum minus minimum) of x.
- Parameters:
r (float, optional) – The multiple of the range to compare with. Default is 2.0.
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
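The comparison can be sketched per pixel as below, assuming the range means maximum minus minimum and that the population standard deviation is used. Both are assumptions, the function name is hypothetical, and the value r=0.4 is chosen only to make the two pixels differ.

```python
import numpy as np


def large_standard_deviation_sketch(array: np.ndarray, r: float = 2.0) -> np.ndarray:
    """True where std over time exceeds r times (max - min)."""
    spread = array.max(axis=0) - array.min(axis=0)
    return array.std(axis=0) > r * spread


pixel_a = [0.0, 0.0, 10.0, 10.0]  # std = 5.0, range = 10
pixel_b = [0.0, 5.0, 5.0, 10.0]  # std is about 3.54, range = 10
series = np.stack([pixel_a, pixel_b], axis=-1).reshape(4, 1, 1, 2)

flags = large_standard_deviation_sketch(series, r=0.4)
flags_default = large_standard_deviation_sketch(series)
```

Because the population standard deviation never exceeds half the range, this sketch can only return True for r below 0.5.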
- class xr_fresh.feature_calculator_series.longest_strike_above_mean(mean=None)[source]
Bases:
TimeModule
Returns the length of the longest consecutive subsequence in x that is bigger than the mean of x.
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
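A run-length count of this kind can be sketched by stepping through time and resetting a per-pixel counter whenever the value drops to or below the mean. The handling of the optional mean argument is an assumption, the function name is hypothetical, and the code is illustrative only.

```python
import numpy as np


def longest_strike_above_mean_sketch(array: np.ndarray, mean=None) -> np.ndarray:
    """Length of the longest consecutive run above the threshold, per pixel."""
    threshold = array.mean(axis=0) if mean is None else mean
    above = array > threshold
    current = np.zeros(array.shape[1:], dtype=int)
    longest = np.zeros(array.shape[1:], dtype=int)
    for step in above:  # iterate over the time axis
        current = np.where(step, current + 1, 0)  # extend or reset the run
        longest = np.maximum(longest, current)
    return longest


# Mean is 4.75, so the runs above it have lengths 2 and 3.
values = [1.0, 5.0, 6.0, 1.0, 7.0, 8.0, 9.0, 1.0]
series = np.array(values).reshape(8, 1, 1, 1)
longest_run = longest_strike_above_mean_sketch(series)
```

Swapping the comparison to array < threshold gives the matching sketch for runs below the mean.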
- class xr_fresh.feature_calculator_series.longest_strike_below_mean(mean=None)[source]
Bases:
TimeModule
Returns the length of the longest consecutive subsequence in x that is smaller than the mean of x.
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
- class xr_fresh.feature_calculator_series.maximum[source]
Bases:
TimeModule
Returns the maximum value of the time series x.
\[x_{\text{max}}\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
The maximum value.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(x) – Calculates the user function.
- calculate(x)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
- class xr_fresh.feature_calculator_series.mean[source]
Bases:
TimeModule
Returns the mean value of the time series x.
\[\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
The mean value.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(x) – Calculates the user function.
- calculate(x)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
- class xr_fresh.feature_calculator_series.mean_abs_change[source]
Bases:
TimeModule
Returns the mean over the absolute differences between subsequent time series values which is
\[\frac{1}{n-1} \sum_{i=1}^{n-1} | x_{i+1} - x_{i} |\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
The mean absolute change.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(x) – Calculates the user function.
- calculate(x)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
- class xr_fresh.feature_calculator_series.mean_change[source]
Bases:
TimeModule
Returns the mean over the differences between subsequent time series values which is
\[\frac{1}{n-1} \sum_{i=1}^{n-1} ( x_{i+1} - x_{i} )\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
The mean change.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
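The formulas for mean_abs_change and mean_change differ only in the absolute value, which the following NumPy sketch makes explicit. It also shows that the signed version telescopes to (x_n - x_1) / (n - 1). The function names are hypothetical and the code is not taken from xr_fresh.

```python
import numpy as np


def mean_abs_change_sketch(array: np.ndarray) -> np.ndarray:
    """Mean of |x[i+1] - x[i]| over time (axis 0)."""
    return np.abs(np.diff(array, axis=0)).mean(axis=0)


def mean_change_sketch(array: np.ndarray) -> np.ndarray:
    """Mean of (x[i+1] - x[i]) over time (axis 0)."""
    return np.diff(array, axis=0).mean(axis=0)


series = np.array([1.0, 4.0, 2.0, 2.0]).reshape(4, 1, 1, 1)
abs_change = mean_abs_change_sketch(series)
change = mean_change_sketch(series)

# The signed differences cancel pairwise, leaving only the end points.
telescoped = (series[-1] - series[0]) / (series.shape[0] - 1)
```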
- class xr_fresh.feature_calculator_series.mean_second_derivative_central[source]
Bases:
TimeModule
Returns the mean value of a central approximation of the second derivative of the time series.
\[\frac{1}{2(n-2)} \sum_{i=1}^{n-2} \frac{1}{2} (x_{i+2} - 2 \cdot x_{i+1} + x_{i})\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
The mean second derivative.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
- class xr_fresh.feature_calculator_series.median[source]
Bases:
TimeModule
Returns the median of the time series x.
\[\tilde{x}\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
The median value.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(x) – Calculates the user function.
- calculate(x)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
- class xr_fresh.feature_calculator_series.minimum[source]
Bases:
TimeModule
Returns the minimum value of the time series x.
\[x_{\text{min}}\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
The minimum value.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(x) – Calculates the user function.
- calculate(x)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
- class xr_fresh.feature_calculator_series.ols_slope_intercept(returns='slope')[source]
Bases:
TimeModule
Calculate the slope, intercept, and R2 of the time series using ordinary least squares.
- Parameters:
gw (array) – the time series data
returns (str, optional) – What to return, “slope”, “intercept” or “rsquared”. Defaults to “slope”.
- Returns:
Return desired time series property array.
- Return type:
array
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
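A per-pixel ordinary least squares fit can be sketched in closed form. The sketch below regresses each pixel's values on the time index 0, 1, ..., n-1. That choice of regressor is an assumption, the function name is hypothetical, and this is not the xr_fresh implementation.

```python
import numpy as np


def ols_sketch(array: np.ndarray, returns: str = "slope") -> np.ndarray:
    """Closed-form OLS of each pixel series against the time index."""
    n = array.shape[0]
    # Time index shaped (n, 1, 1, ...) so it broadcasts over bands and pixels.
    t = np.arange(n, dtype=float).reshape((n,) + (1,) * (array.ndim - 1))
    t_centered = t - t.mean()
    y_centered = array - array.mean(axis=0)

    slope = (t_centered * y_centered).sum(axis=0) / (t_centered**2).sum()
    intercept = array.mean(axis=0) - slope * t.mean()

    if returns == "slope":
        return slope
    if returns == "intercept":
        return intercept
    if returns == "rsquared":
        residual = array - (intercept + slope * t)
        return 1.0 - (residual**2).sum(axis=0) / (y_centered**2).sum(axis=0)
    raise ValueError(f"unknown option: {returns!r}")


# A perfectly linear pixel: y = 2 * t + 1.
series = np.array([1.0, 3.0, 5.0, 7.0]).reshape(4, 1, 1, 1)
slope = ols_sketch(series, "slope")
intercept = ols_sketch(series, "intercept")
rsquared = ols_sketch(series, "rsquared")
```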
- class xr_fresh.feature_calculator_series.quantile(q=None, method='linear')[source]
Bases:
TimeModule
Calculates the q-th quantile of x. This is the value of x that is greater than a fraction q of the ordered values from x.
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
q (float) – Probability or sequence of probabilities for the quantiles to compute. Values must be between 0 and 1 inclusive.
- Returns:
The q-th quantile of x.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
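Since q is a probability between 0 and 1 rather than a percentage, a short NumPy illustration may help. It uses np.quantile with its default linear interpolation as a stand-in. Whether xr_fresh calls NumPy in the same way is an assumption.

```python
import numpy as np

# One pixel with five observations, shaped [time x bands x rows x columns].
series = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(5, 1, 1, 1)

# q is a probability in [0, 1]; the reduction runs along time (axis 0).
median_value = np.quantile(series, 0.5, axis=0)
lower_quartile = np.quantile(series, 0.25, axis=0)
```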
- class xr_fresh.feature_calculator_series.ratio_beyond_r_sigma(r=2)[source]
Bases:
TimeModule
Returns the ratio of values that are more than r times the standard deviation away from the mean of the time series.
\[P_{r} = \frac{1}{n} \sum_{i=1}^{n} (| x_i - \bar{x} | > r \cdot \sigma)\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
r (float) – The number of standard deviations. Defaults to 2.
- Returns:
The ratio of values beyond r sigma.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
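The ratio P_r is the mean of a boolean mask, which the following sketch spells out. It assumes the population standard deviation. The function name is hypothetical and the code is illustrative only.

```python
import numpy as np


def ratio_beyond_r_sigma_sketch(array: np.ndarray, r: float = 2.0) -> np.ndarray:
    """Fraction of time steps further than r standard deviations from the mean."""
    deviation = np.abs(array - array.mean(axis=0))
    return (deviation > r * array.std(axis=0)).mean(axis=0)


# Nine zeros and one spike: mean = 1, std = 3, so only the spike is beyond 2 sigma.
values = [0.0] * 9 + [10.0]
series = np.array(values).reshape(10, 1, 1, 1)
ratio = ratio_beyond_r_sigma_sketch(series, r=2.0)
```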
- class xr_fresh.feature_calculator_series.skewness[source]
Bases:
TimeModule
Returns the skewness of x.
\[\frac{n}{(n-1)(n-2)} \sum \left( \frac{X_i - \overline{X}}{s} \right)^3\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
axis (int, optional) – Axis along which to compute the skewness. Default is 0.
fisher (bool, optional) – If True, Fisher’s definition is used (normal=0). If False, Pearson’s definition is used (normal=3). Default is False.
- Returns:
The skewness.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
- class xr_fresh.feature_calculator_series.standard_deviation[source]
Bases:
TimeModule
Returns the standard deviation of x.
\[\sqrt{ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 }\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
The standard deviation.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(x) – Calculates the user function.
- calculate(x)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
- class xr_fresh.feature_calculator_series.sum[source]
Bases:
TimeModule
Returns the sum of all values in x.
\[S = \sum_{i=1}^{n} x_i\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
The sum of values.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(x) – Calculates the user function.
- calculate(x)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
- class xr_fresh.feature_calculator_series.symmetry_looking(r=0.1)[source]
Bases:
TimeModule
Measures the similarity of the time series when flipped horizontally. Boolean variable denoting if the distribution of x looks symmetric.
\[| x_{\text{mean}} - x_{\text{median}} | < r \cdot (x_{\text{max}} - x_{\text{min}} )\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
r (float) – A threshold value, the percentage of the range to compare with (default: 0.1)
- Returns:
The symmetry measure.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
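The inequality above compares the gap between mean and median with a fraction of the range. A minimal NumPy sketch, with a hypothetical function name and no claim to match the library's code, could look like this:

```python
import numpy as np


def symmetry_looking_sketch(array: np.ndarray, r: float = 0.1) -> np.ndarray:
    """True where |mean - median| < r * (max - min) along time (axis 0)."""
    gap = np.abs(array.mean(axis=0) - np.median(array, axis=0))
    spread = array.max(axis=0) - array.min(axis=0)
    return gap < r * spread


pixel_a = [1.0, 2.0, 3.0, 4.0, 5.0]  # mean equals median
pixel_b = [0.0, 0.0, 0.0, 0.0, 10.0]  # mean 2, median 0, range 10
series = np.stack([pixel_a, pixel_b], axis=-1).reshape(5, 1, 1, 2)

looks_symmetric = symmetry_looking_sketch(series, r=0.1)
```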
- class xr_fresh.feature_calculator_series.ts_complexity_cid_ce(normalize=True)[source]
Bases:
TimeModule
Returns the time series complexity measure CID CE.
\[\sqrt{ \sum_{i=1}^{n-1} ( x_{i+1} - x_{i})^2 }\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
normalize (bool) – Whether to z-transform the time series before computing the measure. Default is True.
- Returns:
The complexity measure.
- Return type:
float
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
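The complexity estimate is the Euclidean length of the differenced series, optionally after a z-transform. The sketch below assumes the z-transform uses the population standard deviation. The function name is hypothetical and the code is not the xr_fresh implementation.

```python
import numpy as np


def cid_ce_sketch(array: np.ndarray, normalize: bool = True) -> np.ndarray:
    """Square root of the summed squared consecutive differences over time."""
    if normalize:
        # z-transform each pixel series; constant series would divide by zero.
        array = (array - array.mean(axis=0)) / array.std(axis=0)
    return np.sqrt((np.diff(array, axis=0) ** 2).sum(axis=0))


series = np.array([1.0, 2.0, 4.0]).reshape(3, 1, 1, 1)
raw_complexity = cid_ce_sketch(series, normalize=False)
normalized_complexity = cid_ce_sketch(series, normalize=True)
```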
- class xr_fresh.feature_calculator_series.unique_value_number_to_time_series_length[source]
Bases:
TimeModule
Returns a factor which is 1 if all values in the time series occur only once, and below one if this is not the case. In principle, it just returns
# of unique values / # of values
Methods
__call__(w, array, band_dict) – Call self as a function.
calculate(array) – Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
Shaped (time|bands x rows x columns).
- Return type:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor
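Counting unique values per pixel can be sketched by sorting along time and counting the places where neighbouring values differ. This is one possible approach, shown for illustration with a hypothetical function name. It is not necessarily how xr_fresh computes the ratio.

```python
import numpy as np


def unique_ratio_sketch(array: np.ndarray) -> np.ndarray:
    """Number of unique values divided by series length, per pixel."""
    ordered = np.sort(array, axis=0)
    # Each change between sorted neighbours marks one additional unique value.
    n_unique = 1 + (np.diff(ordered, axis=0) != 0).sum(axis=0)
    return n_unique / array.shape[0]


pixel_a = [1.0, 2.0, 3.0, 4.0]  # all values distinct
pixel_b = [1.0, 1.0, 2.0, 2.0]  # two distinct values
series = np.stack([pixel_a, pixel_b], axis=-1).reshape(4, 1, 1, 2)

unique_ratio = unique_ratio_sketch(series)
```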
- class xr_fresh.feature_calculator_series.variance[source]
Bases:
TimeModule
Returns the variance of x.
\[\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
The variance.
- Return type:
float
Methods
__call__
(w, array, band_dict)Call self as a function.
calculate
(x)Calculates the user function.
- calculate(x)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor: Shaped (time|bands x rows x columns)
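The variance formula above divides by the number of observations, i.e. it is the population variance. A minimal NumPy sketch, assuming the time axis comes first (`population_variance` is a hypothetical name, not part of xr_fresh):

```python
import numpy as np


def population_variance(array):
    """Hypothetical stand-in; array is shaped (time, bands, rows, columns)."""
    x = np.asarray(array, dtype=float)
    mean = x.mean(axis=0, keepdims=True)
    # Mean of squared deviations along time (divides by N, not N - 1).
    return ((x - mean) ** 2).mean(axis=0)


cube = np.array([1.0, 2.0, 3.0, 4.0]).reshape(4, 1, 1, 1)
var = population_variance(cube)
```

This matches `np.var(cube, axis=0)` with its default `ddof=0`.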
- class xr_fresh.feature_calculator_series.variance_larger_than_standard_deviation[source]
Bases:
TimeModule
Returns 1 if the variance of x is larger than its standard deviation and 0 otherwise.
\[\sigma^2 > \sigma\]
- Parameters:
x (numpy.ndarray) – Geowombat series object containing a time series of images.
- Returns:
1 if variance is larger than standard deviation, 0 otherwise.
- Return type:
int
Methods
__call__
(w, array, band_dict)Call self as a function.
calculate
(x)Calculates the user function.
- calculate(x)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor: Shaped (time|bands x rows x columns)
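The comparison of variance and standard deviation can be illustrated per pixel as below. This is a sketch under assumptions: `variance_exceeds_std` is a hypothetical name, and the population variance (dividing by N) is assumed.

```python
import numpy as np


def variance_exceeds_std(array):
    """Hypothetical stand-in; array is shaped (time, bands, rows, columns)."""
    x = np.asarray(array, dtype=float)
    var = x.var(axis=0)
    std = x.std(axis=0)
    # 1 where the variance is larger than the standard deviation, else 0.
    return (var > std).astype(np.int8)


# Two pixels side by side: a widely spread series and a narrow one.
wide = [1.0, 2.0, 3.0, 4.0]      # variance 1.25, std about 1.118
narrow = [1.0, 1.0, 2.0, 2.0]    # variance 0.25, std 0.5
pixels = np.array([wide, narrow]).T.reshape(4, 1, 1, 2)
flags = variance_exceeds_std(pixels)
```

Because the variance is the square of the standard deviation, the test is equivalent to checking whether the standard deviation exceeds 1.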
xr_fresh.interpolate_series module
- class xr_fresh.interpolate_series.interpolate_nan(missing_value=None, interp_type='linear', count=1, dates=None)[source]
Bases:
TimeModule
Interpolate missing values in a geospatial time series. Without dates set, this class assumes a regular time interval between observations. With dates set, it can handle irregular time intervals, using the day of year (DOY) as an index.
- Parameters:
missing_value (int or float, optional) – The value to be replaced by NaNs. Default is None.
interp_type (str, optional) – The type of interpolation algorithm to use. Options include “linear”, “nearest”, “zero”, “slinear”, “quadratic”, “cubic”, “previous”, “next”, “cubicspline”, “spline”, and “UnivariateSpline”. Default is “linear”.
dates (list[datetime]) – List of datetime objects corresponding to each time slice.
count (int, optional) – Overrides the default output band count. Default is 1.
Example Usage:
from datetime import datetime
from glob import glob

import geowombat as gw
from xr_fresh.interpolate_series import interpolate_nan

pth = "/home/mmann1123/Dropbox/Africa_data/Temperature/"
files = sorted(glob(f"{pth}*.tif"))[0:10]
strp_glob = f"{pth}RadT_tavg_%Y%m.tif"
dates = sorted(datetime.strptime(string, strp_glob) for string in files)
date_strings = [date.strftime("%Y-%m-%d") for date in dates]

# window_size controls RAM usage; transfer_lib can be "jax" if using a GPU
with gw.series(files, window_size=[640, 640], transfer_lib="numpy") as src:
    src.apply(
        func=interpolate_nan(
            missing_value=0,
            count=len(src.filenames),
            dates=dates,
        ),
        outfile="/home/mmann1123/Downloads/test.tif",
        num_workers=min(12, src.nchunks),
        bands=1,
    )
Methods
__call__
(w, array, band_dict)Call self as a function.
calculate
(array)Calculates the user function.
- calculate(array)[source]
Calculates the user function.
- Parameters:
data (numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor) – The input array, shaped [time x bands x rows x columns].
- Returns:
numpy.ndarray | jax.Array | torch.Tensor | tensorflow.Tensor: Shaped (time|bands x rows x columns)
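To make the difference between regular and irregular time steps concrete, the following sketch linearly interpolates a single pixel's series with `numpy.interp`. It is not the xr_fresh implementation: `interpolate_series` is a hypothetical helper, only the "linear" case is shown, and days elapsed since the first date are used as the time index (an assumption standing in for the DOY index described above).

```python
from datetime import datetime

import numpy as np


def interpolate_series(values, dates=None, missing_value=None):
    """Hypothetical helper: fill gaps in one 1-D series by linear interpolation."""
    y = np.asarray(values, dtype=float).copy()
    if missing_value is not None:
        # Treat the sentinel value as missing.
        y[y == missing_value] = np.nan
    if dates is None:
        # No dates: assume evenly spaced observations.
        t = np.arange(y.size, dtype=float)
    else:
        # Assumption: days since the first observation serve as the index.
        t = np.array([(d - dates[0]).days for d in dates], dtype=float)
    valid = ~np.isnan(y)
    # Requires at least one valid observation in the series.
    y[~valid] = np.interp(t[~valid], t[valid], y[valid])
    return y


regular = interpolate_series([1, 0, 3], missing_value=0)

dates = [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 11)]
irregular = interpolate_series([10.0, np.nan, 20.0], dates=dates)
```

With regular spacing the gap is filled with the midpoint of its neighbours; with dates supplied, the filled value is weighted by how close the missing observation is to each neighbour in time.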
xr_fresh.io module
- xr_fresh.io.WriteStackedArray(src: DataArray, file_path='/tmp/test.parquet')[source]
Writes a stacked array (i.e. flattened by (y, x, time)) to parquet in chunks.
- Parameters:
src (xr.DataArray) – The DataArray to write.
file_path (str, optional) – Path of the output parquet file. Defaults to “/tmp/test.parquet”.
- xr_fresh.io.parquet_append(file_list: list, out_path: str, filters: list)[source]
Read, filter and append a large set of parquet files to a single file. Note: the resulting file must be read with pd.read_parquet(engine='pyarrow').
- Parameters:
file_list (list) – list of file paths to .parquet files
out_path (str) – path and name of output parquet file
filters (list) – list of
xr_fresh.transformers module
Created on Mon Aug 10 13:41:40 2020 adapted from sklearn-xarray/preprocessing @author: mmann1123
- class xr_fresh.transformers.BaseTransformer[source]
Bases:
BaseEstimator, TransformerMixin
Base class for transformers.
Methods
fit
(X[, y])Fit estimator to data.
fit_transform
(X[, y])Fit to data, then transform it.
get_params
([deep])Get parameters for this estimator.
set_output
(*[, transform])Set output container.
set_params
(**params)Set the parameters of this estimator.
transform
(X)Transform input data.
- class xr_fresh.transformers.Stackerizer(stack_dims=None, direction='stack', sample_dim='sample', transposed=True, groupby=None, compute=True)[source]
Bases:
BaseTransformer
Transformer to handle higher dimensional data, for instance data sampled in time and location (‘x’, ’y’, ’time’), that must be stacked before running Featurizer, and unstacked after prediction.
- Parameters:
stack_dims (list or tuple) – The dimensions used to define how the data is sampled. If the sample has multiple dimensions, for instance x, y, time, these can be passed as a list or tuple. Before stacking, a new multiindex ‘sample’ will be created for these dimensions.
direction (str, optional) – “stack” or “unstack”; defines the direction of the transformation. Default is “stack”.
sample_dim (str) – Name of the multiindex used to stack the sample dims. Defaults to “sample”.
transposed (bool) – Should the output be transposed after stacking. Default is True.
- Returns:
Xt – The transformed data.
- Return type:
xarray DataArray or Dataset
Methods
fit
(X[, y])Fit estimator to data.
fit_transform
(X[, y])Fit to data, then transform it.
get_params
([deep])Get parameters for this estimator.
set_output
(*[, transform])Set output container.
set_params
(**params)Set the parameters of this estimator.
transform
(X)Transform input data.
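The stack and unstack directions described for Stackerizer correspond to standard xarray operations. The sketch below uses only `DataArray.stack`, `DataArray.unstack` and `DataArray.transpose`; it illustrates the idea and makes no claim about how Stackerizer is implemented. The dimension names and the toy array are assumptions chosen for the example.

```python
import numpy as np
import xarray as xr

# Toy array: 2 bands, 3 time steps, 2 x 2 pixels.
da = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 2, 2),
    dims=("band", "time", "y", "x"),
    coords={"band": ["b1", "b2"], "time": [0, 1, 2], "y": [0, 1], "x": [0, 1]},
)

# Stack: collapse the sampling dimensions into one MultiIndex dimension,
# then transpose so that samples are rows and bands are columns.
stacked = da.stack(sample=("x", "y", "time")).transpose("sample", "band")

# Unstack: recover the original dimensions from the MultiIndex.
restored = stacked.unstack("sample").transpose(*da.dims)
```

The stacked form is the two-dimensional samples-by-features layout that scikit-learn style estimators expect; unstacking afterwards restores the spatial and temporal layout.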
- xr_fresh.transformers.is_dataarray(X, require_attrs=None)[source]
Check whether an object is a DataArray.
- Parameters:
X (anything) – The object to be checked.
require_attrs (list of str, optional) – The attributes the object has to have in order to pass as a DataArray.
- Returns:
Whether the object is a DataArray or not.
- Return type:
bool
- xr_fresh.transformers.is_dataset(X, require_attrs=None)[source]
Check whether an object is a Dataset.
- Parameters:
X (anything) – The object to be checked.
require_attrs (list of str, optional) – The attributes the object has to have in order to pass as a Dataset.
- Returns:
Whether the object is a Dataset or not.
- Return type:
bool
- xr_fresh.transformers.stackerizer(X, return_estimator=False, **fit_params)[source]
Stacks all dimensions and variables except for sample dimension.
- Parameters:
X (xarray DataArray or Dataset) – The input data.
return_estimator (bool) – Whether to return the fitted estimator along with the transformed data.
- Returns:
Xt – The transformed data.
- Return type:
xarray DataArray or Dataset
xr_fresh.utils module
Created on Tue Jun 30 15:34:47 2020
@author: https://github.com/robintw/XArrayAndRasterio/blob/master/rasterio_to_xarray.py
- xr_fresh.utils.add_categorical(data, labels=None, col=None, variable_name=None, missing_value=-9999)[source]
Adds categorical data to xarray by column name.
Examples
climatecluster = './ClusterEco15_Y5.shp'
with gw.open(vrts, time_names=[str(x) for x in range(len(vrts))]) as ds:
    ds.attrs['filename'] = vrts
    ds = add_categorical(ds, climatecluster, col='ClusterN_2', variable_name='clim_clust')
    print(ds)
- Parameters:
data (xarray.DataArray) – xarray to add categorical data to
labels (path or gpd.geodataframe or path to tif) – path or df to shapefile or raster with categorical data
col (str) – Column to get values from
variable_name (str) – name assigned to categorical data
missing_value (int) – missing value for pixels not overlapping polygon or points
- xr_fresh.utils.add_time_targets(data, target, target_col_list=None, target_name='target', missing_value=-9999, append_to_X=False)[source]
Adds multiple time periods of target data to existing xarray obj.
Examples
with gw.open(vrts, time_names=time_names, chunks=400) as ds:
    ds = add_time_targets(
        data=ds,
        target=loss_poly,
        target_col_list=['w_dam_2010', 'w_dam_2011', 'w_dam_2012',
                         'w_dam_2013', 'w_dam_2014', 'w_dam_2015', 'w_dam_2016'],
        target_name='weather_damage',
        missing_value=np.nan,
        append_to_X=True,
    )
- Parameters:
data (xarray.DataArray) – xarray to add target data to
target (path or gpd.geodataframe) – path or df to shapefile with target data
target_col_list (list) – list of columns holding target data. All column names must be in ascending order, e.g. [‘t_2010’,’t_2011’]
target_name (str) – single name assigned to target data dimension. Default is ‘target’
missing_value (int) – missing value for pixels not overlapping polygon or points
append_to_X (bool) – should the target data be appended to the far right of other X variables. Default is False.
- xr_fresh.utils.check_variable_lengths(variable_list)[source]
Check if a list of variable files are of equal length
- Parameters:
variable_list (list)
- Returns:
Whether the variable file lists are of equal length.
- Return type:
bool
- xr_fresh.utils.convert_to_min_dtype(arr)[source]
Convert a numpy array to the smallest data type possible.
- Parameters:
arr (np.array) – numpy array
- Returns:
numpy array with smallest data type
- Return type:
np.array
Examples
>>> arr = np.array([1, 2, 3, 4, 5])
>>> convert_to_min_dtype(arr)
array([1, 2, 3, 4, 5], dtype=int8)
- xr_fresh.utils.downcast_pandas(data)[source]
Dtype cast to smallest numerical dtype possible for pandas dataframes. Saves considerable space. Objects are cast to categorical, int and float are cast to the smallest dtype
Note: could be problematic with chunks if different dtypes are assigned to same column
- Parameters:
data (DataFrame) – input dataframe
- Returns:
downcast dataframe
- Return type:
DataFrame
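The kind of downcasting described here can be sketched with public pandas functions. This is an illustration only: `downcast_frame` is a hypothetical name, and the per-column rules below are assumptions based on the description above, not the xr_fresh source.

```python
import pandas as pd


def downcast_frame(df):
    """Hypothetical helper: shrink each column to a smaller dtype."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # Smallest signed integer type that holds all values.
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            # Smallest float type pandas will downcast to (float32).
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif pd.api.types.is_object_dtype(out[col]) or pd.api.types.is_string_dtype(out[col]):
            # Text columns become categoricals.
            out[col] = out[col].astype("category")
    return out


frame = pd.DataFrame(
    {"count": [1, 2, 3], "value": [1.5, 2.5, 3.5], "name": ["a", "b", "a"]}
)
small = downcast_frame(frame)
```

As the note above warns, applying this independently to separate chunks can assign different dtypes to the same column, so chunked data may need a common dtype chosen up front.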
- xr_fresh.utils.find_variable_names(path_glob)[source]
Return all unique variable names from path glob, removing trailing date and __
Example:
path_glob = f"{file_path}NDVI_MODIS/Meher_features/ndvi*.tif"
find_variable_names(path_glob)
- Parameters:
path_glob (path) – path with * for file glob
- xr_fresh.utils.find_variable_year(path_glob, digits=4, strp_glob='%Y.tif')[source]
Return all unique 4-digit years from path glob
Example:
path_glob = f"{file_path}NDVI_MODIS/Meher_features/ndvi*.tif"
find_variable_year(path_glob)
- Parameters:
path_glob (path) – path with * for file glob
digits (int) – number of digits used to store year
strp_glob (string) – strptime pattern with year format and file type
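How the digits and strp_glob parameters might combine to pull a year out of a file name is sketched below using only the standard library. This is an assumption about the approach, not the xr_fresh source: `years_from_names` is a hypothetical helper that takes a list of file names rather than a glob so that it runs without any files on disk.

```python
import os
from datetime import datetime


def years_from_names(paths, digits=4, strp_glob="%Y.tif"):
    """Hypothetical helper: parse the trailing year from each file name."""
    # Length of the formatted suffix, e.g. "2010.tif" for "%Y.tif".
    suffix_len = len(strp_glob) - len("%Y") + digits
    years = set()
    for path in paths:
        tail = os.path.basename(path)[-suffix_len:]
        years.add(datetime.strptime(tail, strp_glob).year)
    return sorted(years)


names = ["ndvi_mean_2010.tif", "ndvi_mean_2011.tif", "ndvi_max_2010.tif"]
found = years_from_names(names)
```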
- xr_fresh.utils.to_vrt(data, filename, resampling=None, nodata=None, init_dest_nodata=True, warp_mem_limit=128)[source]
Writes a file to a VRT file.
- Parameters:
data (DataArray) – The xarray.DataArray to write.
filename (str) – The output file name to write to.
resampling (Optional[object]) – The resampling algorithm for rasterio.vrt.WarpedVRT. Default is ‘nearest’.
nodata (Optional[float or int]) – The ‘no data’ value for rasterio.vrt.WarpedVRT.
init_dest_nodata (Optional[bool]) – Whether or not to initialize output to nodata for rasterio.vrt.WarpedVRT.
warp_mem_limit (Optional[int]) – The GDAL memory limit for rasterio.vrt.WarpedVRT.
Example
>>> import geowombat as gw
>>> from rasterio.enums import Resampling
>>>
>>> # Transform a CRS and save to VRT
>>> with gw.config.update(ref_crs=102033):
>>>     with gw.open('image.tif') as src:
>>>         gw.to_vrt(src,
>>>                   'output.vrt',
>>>                   resampling=Resampling.cubic,
>>>                   warp_mem_limit=256)
>>>
>>> # Load multiple files set to a common geographic extent
>>> bounds = (left, bottom, right, top)
>>> with gw.config.update(ref_bounds=bounds):
>>>     with gw.open(['image1.tif', 'image2.tif'], mosaic=True) as src:
>>>         gw.to_vrt(src, 'output.vrt')
- xr_fresh.utils.xarray_to_rasterio(xr_data, path='', postfix='', bands=None)[source]
Writes xarray bands to disk by band
Examples
>>> f_dict = {'maximum': [{}], 'quantile': [{'q': "0.5"}, {'q': '0.95'}]}
>>> features = extract_features(xr_data=ds,
>>>                             feature_dict=f_dict,
>>>                             band='aet',
>>>                             na_rm=True)
>>> xarray_to_rasterio(features, '/home/mmann1123/Desktop/', postfix='test')
- Parameters:
xr_data (xarray.DataArray) – xarray to write
path (str) – file destination path
postfix (str) – text to append to the end of the written image name
bands (list) – list of character strings or locations of band names; if None, all bands are written
xr_fresh.visualizer module
- xr_fresh.visualizer.plot_interpolated_actual(interpolated_stack: str, original_image_list: list, samples: int = 20)[source]
Plots the interpolated and actual values for a given time series.
- Parameters:
interpolated_stack (str) – multiband stack of images representing the interpolated time series.
original_image_list (list) – list of files used in the interpolation.
samples (int, optional) – number of random points to compare time series. Defaults to 20.