Resampling

In the general sense, resampling is the process of creating new data points from data we already have. In Pyresample, we use the term “resample” when we use the data from one geometry to create data points in another geometry. With the available geometries in mind we typically use resampling in a few common use cases:

swath to area: Or in another way, resampling ungridded data to a grid. This is useful for many reasons. Most areas are easy to visualize and provide a view of the Earth that is somewhat recognizable. This isn’t always true of swaths. This resampling has all the other benefits of areas too such as being easier to describe the geolocation and easier to compare with other data. This type of resampling also allows getting a subset of the swath or changing the spatial pixel resolution to something more suited for your use case.
area/swath to swath: When combining or comparing data from multiple sources and both of them happen to be ungridded swaths, resampling can be used to bring them on to the same coordinate system.
area to area: Similar to resampling a swath to an area, we may want to get a subset of the input data or change the resolution so that it suits our needs. There is also the opportunity to change projections or to match the area for another dataset for easier comparison.

In all of these cases we need some way to determine what value to set for each output pixel given one or more input pixels. For this there are many algorithms to choose from. The algorithms implemented in Pyresample are described below.

For more information on how to do resampling with Pyresample see some of the How-Tos such as Resampling of swath data for swath data or Resampling of gridded data for areas.

Algorithms

When resampling we have a lot of options for mapping input pixels to an output pixels. Depending on the algorithm we choose we may get better looking results than others, but at the cost of more processing time or larger memory requirements. Below is a basic description of some of the algorithms implemented in Pyresample, but for a more detailed description see the API documentation where the low-level resampler classes and functions are documented.

Note that unless otherwise stated, the algorithms implemented in Pyresample consider all pixels as individual points located at their centers. This means they may generally ignore the shape of the real world footprint and possible overlap of these footprints.

Warning

Pyresample’s resampling interfaces are currently undergoing changes as part of the version 2.0 redesign. As such, the existing algorithms described below may only be available from specific interfaces and may or may not be consistent with the interfaces of other algorithms. Some algorithms, at the time of writing, may only support data types needed to work with Pyresample’s sibling project “Satpy”. For example, some may expect Xarray DataArray classes with dask arrays underneath and will fail in ugly ways if given regular numpy arrays.

Nearest Neighbor

Nearest neighbor is one of the simplest ways of mapping input pixels to output pixels. It sets every output pixel to the value of the input pixel that is geographically closest to the output pixel’s location. In Pyresample this is implemented using a k-d tree via the pykdtree package. This structure is created using the input geolocation and allows for quick querying for the index of the nearest input pixel. Using these indexes we can create an output array the size of the target geometry using a basic numpy indexing operation. This is normally a very fast operation.

Any pixel in the output array that doesn’t have a neighboring input pixel will be assigned some fill value, typically the special floating point NaN. The algorithm decides how far to look by a “radius of influence”. Regardless of the radius of influence, the result will always be the nearest neighboring pixel if there is one within the radius, otherwise the fill value will be used.

One good part about this algorithm and its simplicity is that it works with any pair of input and output geometries (swath, area, etc) whether the pixels are topology preserving or not. This is not necessarily true for all algorithms. This is the oldest algorithm implemented in pyresample and has been adapted in different interfaces to support numpy, dask, and xarray DataArrays.

Caching

Due to the way this algorithm uses the k-d tree and the indexes generated by querying it, the indexes can be easily cached. Since the kd-tree only uses input geolocation for creation and output geolocation for querying, we can reuse the generated indexes for any input data sharing the same geolocation. This can increase performance greatly by not having to regenerate or requery the k-d tree. If our swath or area-based input is the same between executions we can also write these indexes to disk and use them in the next execution without having to generate or query a k-d tree.

Gradient Search

The gradient search algorithm is based on work by Alexander Trishchenko (see DOI:10.1109/TGRS.2008.916633). The algorithm benefits from its assumption that an array representation preserves the topology of the observed data. By only looking at nearby pixels the algorithm can efficiently compute a nearest neighbor or perform a bilinear interpolation.

For the interfaces that exist (see warning below) and with the data that meet the limitations of this algorithm (topology preserving, etc), this algorithm should be faster and more memory efficient for nearest neighbor than the k-d tree approach (see above) and the other bilinear algorithm implemented in Pyresample (see below). The algorithm does not currently support caching, but also doesn’t need to due to its speed.

Warning

This resampling algorithm is still considered experimental. At the time of writing it only supports area to area resampling and requires xarray DataArray objects backed by dask arrays.

Bilinear

Pyresample also offers a standalone bilinear algorithm that existed before gradient search. It is based on the same k-d tree as the nearest neighbor algorithm described above. Due to its use of the k-d tree it is able to handle arrays that do not preserve the geographic topology of the data. It is currently limited to xarray DataArray with dask arrays as inputs. The current implementation currently requires getting multiple nearby neighbors for every output pixel and then doing a bilinear interpolation between the four nearest surrounding pixels. This typically uses a lot of CPU and memory. For topology preserving data, it is recommended to use the gradient search algorithm.

Bucket

The bucket resampling algorithm is actually multiple algorithms following a similar structure. Bucket resampling is used to compute various types of statistics about the input data falling within an output pixel (the “bucket”). These statistics include sum, min, max, count, average, and fraction of each category for integer category data. Due to the possible differences between geometries (ex. projections, pixel resolution, etc), an output pixel might overlap with zero or more input pixels. Instead of only getting the nearest input pixel (nearest neighbor), bucket resampling allows us to get the maximum input value, or the minimum, or the average, or any other implementated calculation. This allows for more control over the final output that may be more useful or accurate depending on the type of data being worked with.

The current implementation is limited to xarray DataArrays and dask arrays.

Elliptical Weighted Averaging

Pyresample makes it possible to resample swath data to a uniform grid using an Elliptical Weighted Averaging algorithm or EWA for short. This algorithm behaves differently than the KDTree based resampling algorithms. The KDTree-based algorithms process each output grid pixel by searching for all “nearby” input pixels and applying a certain interpolation (nearest neighbor, gaussian, etc). The EWA algorithm processes each input pixel mapping it to one or more output pixels. Once each input pixel has been analyzed, the intermediate results are averaged to produce the final gridded result.

The EWA algorithm also has limitations on how the input data are structured compared to the generic KDTree algorithms. EWA assumes that data in the array is organized geographically; adjacent data in the array is adjacent data geographically. The algorithm uses this to configure parameters based on the size and location of the swath pixels. It also assumes that data are scan-based, recorded by a orbiting satellite scan by scan, and the user must provide scan size with the rows_per_scan option.

The EWA algorithm consists of two steps: ll2cr and fornav. The algorithm was originally part of the MODIS Swath to Grid Toolbox (ms2gt) created by the NASA National Snow & Ice Data Center (NSIDC). Its default parameters work best with MODIS L1B data, but it has been proven to produce high quality images from VIIRS and AVHRR data with the right parameters.

There are multiple high-level interfaces to this algorithm in order to support for numpy arrays or xarray DataArrays backed by dask arrays.