How to work with the Copernicus DEM data on Earth Data Hub¶
This notebook will provide you guidance on how to access and use the https://data.earthdatahub.destine.eu/copernicus-dem/GLO-30-v0.zarr dataset on Earth Data Hub (EDH). The Copernicus DEM is a Digital Surface Model (DSM) representing the surface of the Earth including buildings, infrastructure and vegetation.
Goal of this tutorial¶
The goal of this tutorial is to visualize the copernicus DEM for the European area.
What you will learn:¶
- how to access the data
- select and reduce the data
- plot the results
To access datasets on Earth Data Hub you need to instruct your tools (xarray, Zarr, etc.) to use EDH personal access token when downloading the data.
To obtain a personal access token you first need to register to the Destination Earth platform. Then you can go to Earth Data Hub account settings where you can find your default personal access token or create others. After retrieving your personal access token, please cut and paste it below: ⤵
PAT = "your_personal_access_token"
# e.g. PAT="edh_pat_44bbb7e9192a4c6bb47ddf07d07564eee5d17de8dfc48f7118f88e3bc4a4157f8fe2403f5aa0a2d53441b6922ea9a33a"
Working with EDH data¶
Datasets on EDH are often very large and remotely hosted. Typical use imply a selection of the data followed by one or more reduction steps to be performed in a local or distributed Dask environment.
The structure of a workflow that uses EDH data tipically looks like this:
- data access
- data selection
- (optional) data reduction
- data download
- further operations and visualization
Xarray and Dask work together following a lazy principle. This means that when you access and manipulate a Zarr store the data is in not immediately downloaded and loaded in memory. Instead, Dask constructs a task graph that represents the operations to be performed.
A smart user will first reduce the amount of data that needs to be downloaded and explicitly call compute()
on it. Once the compute()
operation is complete the data is loaded into memory and available for subsequent fast processing.
1. Data access¶
To access the data, only the dataset metadata must be downloaded. Xarray does this automatically when you access a Zarr dataset:
import xarray as xr
ds = xr.open_dataset(
f"https://edh:{PAT}@data.earthdatahub.destine.eu/copernicus-dem/GLO-30-v0.zarr",
chunks={},
engine="zarr",
decode_coords="all",
mask_and_scale=False,
)
ds
<xarray.Dataset> Size: 3TB Dimensions: (lat: 648000, lon: 1296001) Coordinates: * lat (lat) float64 5MB -90.0 -90.0 -90.0 -90.0 ... 90.0 90.0 90.0 * lon (lon) float64 10MB -180.0 -180.0 -180.0 ... 180.0 180.0 180.0 spatial_ref int64 8B ... Data variables: dsm (lat, lon) float32 3TB dask.array<chunksize=(3600, 3600), meta=np.ndarray>
âš At this point, no data has been downloaded yet, nor loaded in memory.
2. Data selection¶
From the Copernicus DEM dataset we select the dsm (digital surface model) variable, and perform a geographical selection corresponding to the European area:
dsm = ds.dsm
europe = dsm.sel(lat=slice(36, 71), lon=slice(-10, 41))
europe
<xarray.DataArray 'dsm' (lat: 126001, lon: 183601)> Size: 93GB dask.array<getitem, shape=(126001, 183601), dtype=float32, chunksize=(3600, 3600), chunktype=numpy.ndarray> Coordinates: * lat (lat) float64 1MB 36.0 36.0 36.0 36.0 ... 71.0 71.0 71.0 71.0 * lon (lon) float64 1MB -10.0 -10.0 -9.999 -9.999 ... 41.0 41.0 41.0 spatial_ref int64 8B ... Attributes: long_name: height above geoid standard_name: altitude units: m _FillValue: 0.0
This greatly reduces the amount of data that will be downloaded from Earth Data Hub. However, due to the chunked structure of the DataArray, xarray must still download every chunk that includes a portion of the selected data.
To estimate the size of the download, we can use the costing.py
module. This estimate must be done before we apply any reduction operation.
import costing
costing.estimate_download_size(dsm, reduced_europe)
estimated_needed_chunks: 1836 estimated_memory_size: 95.178 GB estimated_download_size: 9.518 GB
3. Data reduction¶
Copernicus DEM on Earth Data Hub is a high resoluton regular latitide-longitude grid dataset. For the scope or this tutorial, we do not need its full precision. We can downscale the dataset resolution by selecting only one grid point out of 1000:
reduced_europe = europe[::1000, ::1000]
reduced_europe
<xarray.DataArray 'dsm' (lat: 127, lon: 184)> Size: 93kB dask.array<getitem, shape=(127, 184), dtype=float32, chunksize=(4, 4), chunktype=numpy.ndarray> Coordinates: * lat (lat) float64 1kB 36.0 36.28 36.56 36.83 ... 70.44 70.72 71.0 * lon (lon) float64 1kB -10.0 -9.722 -9.444 ... 40.28 40.56 40.83 spatial_ref int64 8B ... Attributes: long_name: height above geoid standard_name: altitude units: m _FillValue: 0.0
4. Data download¶
This is the phase where we explicitly trigger the download of the data. Remember to assign the return of the compute()
function to a new variable, so that the data is kept in memory.
%%time
reduced_europe_computed = reduced_europe.compute()
CPU times: user 3min 17s, sys: 1min 41s, total: 4min 58s Wall time: 3min 46s
3. Visualization¶
Now that the data has been loaded in memory, we can visualize the height above geoid for continental Europe. We also add borders, coastlines and ocean. The ocean feature will help us to to distinguish zero quote land (or lower) from the sea.
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
lon = reduced_europe_computed.lon
lat = reduced_europe_computed.lat
# Create the figure and axis with Mercator projection
fig, ax = plt.subplots(figsize=(10, 10), subplot_kw={"projection": ccrs.AlbersEqualArea()})
# Add map features (coastlines, borders, etc.)
ax.add_feature(cfeature.COASTLINE, linewidth=0.5)
ax.add_feature(cfeature.BORDERS, linestyle=':', linewidth=0.5)
ax.add_feature(cfeature.OCEAN, facecolor='lightblue', zorder=2)
# Set the extent of the map
ax.set_extent([min(lon), max(lon)-13, min(lat), max(lat)], crs=ccrs.PlateCarree())
contourf = ax.contourf(lon, lat, reduced_europe_computed, levels=100, cmap='terrain', transform=ccrs.PlateCarree(), vmin=-2000, vmax=4000)
# Add colorbar
cbar = plt.colorbar(contourf, ax=ax, orientation='vertical', pad=0.07)
cbar.set_label('Altitude (m)') # Label for the colorbar
# Add gridlines for lat/lon
ax.gridlines(draw_labels=True, zorder=3)
# Show the plot
plt.show()