Description
xrspatial not working with large dask-backed arrays
I believe there's a bug in the xrspatial.proximity method whereby the coordinates for a large array (even if it's a dask array) are loaded into memory as numpy arrays rather than kept as dask arrays. For context, I'm attempting to compute a distance-to-coast for a 90m global dataset. I had a look into the source code and I believe the issue is here, in the `_process` function of `proximity`:
```python
raster_dims = raster.dims
if raster_dims != (y, x):
    raise ValueError(
        "raster.coords should be named as coordinates:"
        "({0}, {1})".format(y, x)
    )

distance_metric = DISTANCE_METRICS.get(distance_metric, None)
if distance_metric is None:
    distance_metric = DISTANCE_METRICS["EUCLIDEAN"]

target_values = np.asarray(target_values)

# x-y coordinates of each pixel.
# flatten the coords of input raster and reshape to 2d
xs = np.tile(raster[x].data, raster.shape[0]).reshape(raster.shape)
ys = np.repeat(raster[y].data, raster.shape[1]).reshape(raster.shape)
```
Therefore `xs` and `ys` are huge numpy arrays that don't fit into memory. If the input data is a dask array, these should probably be built as dask arrays instead. Later in the processing sequence the proximity calculation is already dispatched to either dask or numpy, and I think a similar dask/numpy dispatch is needed here; a sketch of what I have in mind is below.
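Something like the following is what I mean. This is only a sketch: `_build_coord_grids` is a hypothetical helper, not existing xrspatial code, and I've used `da.meshgrid` to build the grids lazily (it produces the same `(ny, nx)` arrays as the `np.tile`/`np.repeat` construction above):

```python
import numpy as np
import dask.array as da


def _build_coord_grids(raster, x, y):
    # Hypothetical helper: build the 2d pixel-coordinate grids lazily
    # when the raster is dask-backed, eagerly otherwise.
    if isinstance(raster.data, da.Array):
        # The coordinate vectors are usually plain numpy even on a
        # dask-backed DataArray, so wrap them with chunks that line up
        # with the raster's own chunking.
        ychunks, xchunks = raster.data.chunks
        xcoords = da.from_array(raster[x].data, chunks=(xchunks,))
        ycoords = da.from_array(raster[y].data, chunks=(ychunks,))
        # With the default 'xy' indexing, meshgrid returns (ny, nx)
        # grids: each row of xs repeats the x coordinates, each column
        # of ys repeats the y coordinates, exactly like tile/repeat.
        xs, ys = da.meshgrid(xcoords, ycoords)
    else:
        xs = np.tile(raster[x].data, raster.shape[0]).reshape(raster.shape)
        ys = np.repeat(raster[y].data, raster.shape[1]).reshape(raster.shape)
    return xs, ys
```

That way `xs` and `ys` stay lazy and chunk-aligned with the input, and the existing dask code path later in `_process` could consume them directly.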
If I've missed something please let me know; I'm more than happy to share some more code if need be. For reference, a minimal sketch of the kind of call that hits this for me is below. In the meantime I can use gdal_proximity.py directly, but I'd guess it would be slower than a dask-backed xrspatial.
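Here's that sketch (the shape, chunking and coordinate values are illustrative stand-ins, not my actual data):

```python
import dask.array as da
import numpy as np
import xarray as xr
from xrspatial import proximity

# Illustrative stand-in for a 90m global land/sea mask; at that
# resolution the grid is on the order of 200k x 400k pixels.
ny, nx = 216_000, 432_000
raster = xr.DataArray(
    da.zeros((ny, nx), chunks=(4096, 4096), dtype="uint8"),
    dims=("y", "x"),
    coords={
        "y": np.linspace(90.0, -90.0, ny),
        "x": np.linspace(-180.0, 180.0, nx),
    },
)

# Fails in _process before any dask computation starts: np.tile and
# np.repeat materialise two float64 (ny, nx) arrays (roughly 750 GB
# each) for xs and ys. The all-zero data is just for illustration;
# the blow-up happens before any target lookup.
distance = proximity(raster, target_values=[1])
```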
Thanks!