pyresample.bucket package

Module contents

Code for resampling using bucket resampling.

class pyresample.bucket.BucketResampler(target_area, source_lons, source_lats)

Bases: object

Bucket resampler.

Bucket resampling is useful for calculating averages and hit-counts when aggregating data to coarser scale grids.

Below are examples how to use the resampler.

Read data using Satpy. The resampling can also be done (apart from fractions) directly from Satpy, but this demonstrates the direct low-level usage.

>>> from pyresample.bucket import BucketResampler
>>> from satpy import Scene
>>> from satpy.resample import get_area_def
>>> fname = "hrpt_noaa19_20170519_1214_42635.l1b"
>>> glbl = Scene(filenames=[fname])
>>> glbl.load(['4'])
>>> data = glbl['4']
>>> lons, lats = data.area.get_lonlats()
>>> target_area = get_area_def('euro4')

Initialize the resampler

>>> resampler = BucketResampler(target_area, lons, lats)

Calculate the sum of all the data in each grid location:

>>> sums = resampler.get_sum(data)

Calculate how many values were collected at each grid location:

>>> counts = resampler.get_count()

The average can be calculated from the above two results, or directly using the helper method:

>>> average = resampler.get_average(data)

Calculate fractions of occurrences of different values in each grid location. The data needs to be categorical (in integers), so we’ll create some categorical data from the brightness temperature data that were read earlier. The data are returned in a dictionary with the categories as keys.

>>> data = da.where(data > 250, 1, 0)
>>> fractions = resampler.get_fractions(data, categories=[0, 1])
>>> import matplotlib.pyplot as plt
>>> plt.imshow(fractions[0]); plt.show()

__init__(target_area, source_lons, source_lats)

_call_bin_statistic(statistic_method, data, fill_value=None, skipna=None): Calculate statistics (min/max) for each bin with drop-in-a-bucket resampling.

static _get_abs_max_from_min_max(min_, max_): From array of min and array of max, get array of abs max.

_get_indices()

Calculate projection indices.

Returns:

x_idxs (Dask array) – X indices of the target grid where the data are put
y_idxs (Dask array) – Y indices of the target grid where the data are put

_get_proj_coordinates(lons, lats)

Calculate projection coordinates.

Parameters:

lons (Numpy or Dask array) – Longitude coordinates
lats (Numpy or Dask array) – Latitude coordinates

_mask_bins_with_nan_if_not_skipna(skipna, data, out_size, statistic, fill_value)

get_abs_max(data, fill_value=nan, skipna=True)

Calculate absolute maximums for each bin with drop-in-a-bucket resampling.

Returns for each bin the original signed value which has the largest absolute value.

Warning

The slow pandas.DataFrame.groupby() method is temporarily used here, as the dask_groupby is still under development.

Parameters:

data (Numpy or Dask array) – Data to be binned.
fill_value (number (optional)) – Value to use for empty buckets or all-NaN buckets.
skipna (boolean (optional)) – If True, skips NaN values for the maximum calculation (similarly to Numpy’s nanmax). Buckets containing only NaN are set to fill value. If False, sets the bucket to NaN if one or more NaN values are present in the bucket (similarly to Numpy’s max). In both cases, empty buckets are set to fill value. Default: True

Returns:

data – Bin-wise maximums in the target grid

Return type:

Numpy or Dask array

get_average(data, fill_value=nan, skipna=True)

Calculate bin-averages using bucket resampling.

Parameters:

data (Numpy or Dask array) – Data to be binned and averaged.
fill_value (float) – Fill value to mark missing/invalid values in the input data, as well as in the binned and averaged output data. Default: np.nan
skipna (bool) – If True, skips missing values (as marked by NaN or fill_value) for the average calculation (similarly to Numpy’s nanmean). Buckets containing only missing values are set to fill_value. If False, sets the bucket to fill_value if one or more missing values are present in the bucket (similarly to Numpy’s mean). In both cases, empty buckets are set to NaN. Default: True

Returns:

average – Binned and averaged data.

Return type:

Dask array

get_count()

Count the number of occurrences for each bin using drop-in-a-bucket resampling.

Returns:: data – Bin-wise count of hits for each target grid location
Return type:: Dask array

get_fractions(data, categories=None, fill_value=nan)

Get fraction of occurrences for each given categorical value.

Parameters:

data (Numpy or Dask array) – Categorical data to be processed
categories (iterable or None) – One dimensional list of categories in the data, or None. If None, categories are determined from the data by fully processing the data and finding the unique category values.
fill_value (float) – Fill value to replace missing values. Default: np.nan

get_max(data, fill_value=nan, skipna=True)

Calculate maximums for each bin with drop-in-a-bucket resampling.

Parameters:

data (Numpy or Dask array) – Data to be binned.
skipna (boolean (optional)) – If True, skips NaN values for the maximum calculation (similarly to Numpy’s nanmax). Buckets containing only NaN are set to zero. If False, sets the bucket to NaN if one or more NaN values are present in the bucket (similarly to Numpy’s max). In both cases, empty buckets are set to 0. Default: True

Returns:

data – Bin-wise maximums in the target grid

Return type:

Numpy or Dask array

get_min(data, fill_value=nan, skipna=True)

Calculate minimums for each bin with drop-in-a-bucket resampling.

Parameters:

data (Numpy or Dask array) – Data to be binned.
skipna (boolean (optional)) – If True, skips NaN values for the minimum calculation (similarly to Numpy’s nanmin). Buckets containing only NaN are set to zero. If False, sets the bucket to NaN if one or more NaN values are present in the bucket (similarly to Numpy’s min). In both cases, empty buckets are set to 0. Default: True

Returns:

data – Bin-wise minimums in the target grid

Return type:

Numpy or Dask array

get_sum(data, fill_value=nan, skipna=True, empty_bucket_value=0)

Calculate sums for each bin with drop-in-a-bucket resampling.

Parameters:

data (Numpy or Dask array) – Data to be binned and summed.
fill_value (float) – Fill value of the input data marking missing/invalid values. Default: np.nan
skipna (boolean (optional)) – If True, skips missing values (as marked by NaN or fill_value) for the sum calculation (similarly to Numpy’s nansum). Buckets containing only missing values are set to empty_bucket_value. If False, sets the bucket to fill_value if one or more missing values are present in the bucket (similarly to Numpy’s sum). In both cases, empty buckets are set to empty_bucket_value. Default: True
empty_bucket_value (float) – Set empty buckets to the given value. Empty buckets are considered as the buckets with value 0. Note that a bucket could become 0 as the result of a sum of positive and negative values. If the user needs to identify these zero-buckets reliably, get_count() can be used for this purpose. Default: 0

Returns:

data – Bin-wise sums in the target grid

Return type:

Numpy or Dask array

pyresample.bucket._expand_bin_statistics(bins, unique_bin, unique_idx, weights_sorted): Expand bin statistics to cover all bins.

pyresample.bucket._find_unique_bins_and_indices(bins, idxs_sorted): Get unique bins and corresponding indices.

pyresample.bucket._get_bin_statistic(bins, idxs_sorted, weights_sorted): Get the statistic of each bin.

pyresample.bucket._get_invalid_mask(data, fill_value): Get a boolean array where values equal to fill_value in data are True.

pyresample.bucket._get_sorted_indices_and_data(statistic_method, data, idxs)

pyresample.bucket._sort_weights(statistic_method, weights)

Sort idxs and weights based on weights.

By default the method for sorting is ‘min’.

pyresample.bucket.round_to_resolution(arr, resolution)

Round the values in arr to closest resolution element.

Parameters:

arr (list, tuple, Numpy or Dask array) – Array to be rounded
resolution (float) – Resolution unit to which data are rounded

Returns:

data – Source data rounded to the closest resolution unit

Return type:

Numpy or Dask array