Climate Adaptation Digital Twin: average temperature in Germany and Heating Degree Days in Darmstadt¶
This notebook will provide you guidance on how to access and use Climate Adaptation Digital Twin (Climate DT) datasets on Earth Data Hub (EDH). Access to these datasets is granted via Destination Earth's Standard API keys, but restricted to users with Upgraded Access.
In this tutorial we will learn how to use Xarray to access the Climate DT standard resolution hourly data on single-level or surface, IFS-NEMO, SSP3-7.0 dataset on EDH. However, you can easily replicate the procedure for any other dataset belonging to the same collection.
If you are running this tutorial in your own environment, you can download the convenience modules costing.py and display.py from DESP-UserWorkflowService-Templates/EarthDataHub
Goal of this tutorial¶
The first goal of this tutorial is to plot the average 2 metre temperature in Germany for each year between 2015 and 2049.
The second goal of this tutorial is to calculate the Heating Degree Days in Darmstadt in the same years.
What you will learn:¶
- how to access the dataset
- select and reduce the data
- plot the results
Working with EDH data¶
Datasets on EDH are typically very large and remotely hosted. Typical use imply a selection of the data followed by one or more reduction steps to be performed in a local or distributed Dask environment.
The structure of a workflow that uses EDH data tipically looks like this:
- data access
- data selection
- (optional) data reduction
- data download
- further operations and visualization
Xarray and Dask work together following a lazy principle. This means that when you access and manipulate a Zarr store the data is in not immediately downloaded and loaded in memory. Instead, Dask constructs a task graph that represents the operations to be performed.
A smart user will first reduce the amount of data that needs to be downloaded and explicitly call compute()
on it. Once the compute()
operation is complete the data is loaded into memory and available for subsequent fast processing.
1. Data access¶
# Optional: ignore warnings
import warnings
warnings.filterwarnings('ignore')
To access any data on EDH, you need to obtain an API key. To obtain an API key you need to register to the DestinE platform. Then, you can go to Earth Data Hub Account Settings - Quota & API keys where you can find out your default API key or create others.
Finally, you need to instruct your tools to use your API key when downloading the data. One way to do it, is to incorporate the API key into the dataset's url. If you wish to follow this method, cut and paste your personal API key below: ⤵
EDH_KEY = "your_EDH_API_key"
#e.g. EDH_KEY="edh_key_MHOQO_Y_h4ywpaMQe4cndtydvbpU6ySBNARWprm5tgz"
url = f"https://edh:{EDH_KEY}@api.earthdatahub.destine.eu/climate-dt-2/IFS-NEMO-SSP3-7.0-sfc-hourly-standard-v0.zarr"
There are more ways in which you can provide your API key to Xarray, especially if you plan to use the system frequently. Please refer to https://dev.earthdatahub.bopen.eu/getting-started for a complete overview.
We are now ready to access the data with Xarray:
import xarray as xr
ds = xr.open_dataset(
url,
chunks={},
engine="zarr",
zarr_format=3,
)
ds
<xarray.Dataset> Size: 22TB
Dimensions: (time: 306816, latitude: 512, longitude: 1025)
Coordinates:
* time (time) datetime64[ns] 2MB 2015-01-01 ... 2049-12-31T23:00:00
* latitude (latitude) float64 4kB -90.0 -89.65 -89.3 ... 89.3 89.65 90.0
* longitude (longitude) float64 8kB -180.0 -179.6 -179.3 ... 179.6 180.0
Data variables: (12/36)
avg_ie (time, latitude, longitude) float32 644GB dask.array<chunksize=(1440, 64, 64), meta=np.ndarray>
avg_iews (time, latitude, longitude) float32 644GB dask.array<chunksize=(1440, 64, 64), meta=np.ndarray>
avg_inss (time, latitude, longitude) float32 644GB dask.array<chunksize=(1440, 64, 64), meta=np.ndarray>
avg_ishf (time, latitude, longitude) float32 644GB dask.array<chunksize=(1440, 64, 64), meta=np.ndarray>
avg_sdlwrf (time, latitude, longitude) float32 644GB dask.array<chunksize=(1440, 64, 64), meta=np.ndarray>
avg_sdswrf (time, latitude, longitude) float32 644GB dask.array<chunksize=(1440, 64, 64), meta=np.ndarray>
... ...
tciw (time, latitude, longitude) float32 644GB dask.array<chunksize=(1440, 64, 64), meta=np.ndarray>
tclw (time, latitude, longitude) float32 644GB dask.array<chunksize=(1440, 64, 64), meta=np.ndarray>
tcw (time, latitude, longitude) float32 644GB dask.array<chunksize=(1440, 64, 64), meta=np.ndarray>
tcwv (time, latitude, longitude) float32 644GB dask.array<chunksize=(1440, 64, 64), meta=np.ndarray>
u10 (time, latitude, longitude) float32 644GB dask.array<chunksize=(1440, 64, 64), meta=np.ndarray>
v10 (time, latitude, longitude) float32 644GB dask.array<chunksize=(1440, 64, 64), meta=np.ndarray>
Attributes:
GRIB_edition: 2
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 1003
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts⚠ At this point, no data has been downloaded yet, nor loaded in memory.
2. Data selection¶
We first select the 2 metres temperature.
xr.set_options(keep_attrs=True)
t2m = ds.t2m - 273.15
t2m.attrs["units"] = "°C"
t2m
<xarray.DataArray 't2m' (time: 306816, latitude: 512, longitude: 1025)> Size: 644GB
dask.array<sub, shape=(306816, 512, 1025), dtype=float32, chunksize=(1440, 64, 64), chunktype=numpy.ndarray>
Coordinates:
* time (time) datetime64[ns] 2MB 2015-01-01 ... 2049-12-31T23:00:00
* latitude (latitude) float64 4kB -90.0 -89.65 -89.3 ... 89.3 89.65 90.0
* longitude (longitude) float64 8kB -180.0 -179.6 -179.3 ... 179.6 180.0
Attributes: (12/20)
GRIB_paramId: 167
GRIB_dataType: fc
GRIB_numberOfPoints: 196608
GRIB_typeOfLevel: heightAboveGround
GRIB_stepUnits: 1
GRIB_stepType: instant
... ...
GRIB_shortName: 2t
GRIB_units: K
long_name: 2 metre temperature
units: °C
standard_name: air_temperature
last_ingested_time_index: 306815As xarray showes, the dimension of the t2m DataArray is around 600GiB (in memory). We will now try to narrow down the selection as much as possible.
We select only the Germany area:
germany = {'latitude': slice(46, 56), 'longitude': slice(5, 16)}
t2m_germany = t2m.sel(**germany)
t2m_germany
<xarray.DataArray 't2m' (time: 306816, latitude: 28, longitude: 31)> Size: 1GB
dask.array<getitem, shape=(306816, 28, 31), dtype=float32, chunksize=(1440, 28, 31), chunktype=numpy.ndarray>
Coordinates:
* time (time) datetime64[ns] 2MB 2015-01-01 ... 2049-12-31T23:00:00
* latitude (latitude) float64 224B 46.32 46.67 47.03 ... 55.13 55.48 55.83
* longitude (longitude) float64 248B 5.273 5.625 5.977 ... 15.12 15.47 15.82
Attributes: (12/20)
GRIB_paramId: 167
GRIB_dataType: fc
GRIB_numberOfPoints: 196608
GRIB_typeOfLevel: heightAboveGround
GRIB_stepUnits: 1
GRIB_stepType: instant
... ...
GRIB_shortName: 2t
GRIB_units: K
long_name: 2 metre temperature
units: °C
standard_name: air_temperature
last_ingested_time_index: 306815Notice that the size of the array dropped down to around 1GiB. However, due to the chunked structure of the DataArray, xarray must download every chunk that includes a portion of the selected data.
To estimate the size of the download, we can use the costing.py module. This estimate must be done before we apply any reduction operation.
import costing
costing.estimate_download_size(t2m, t2m_germany)
estimated_needed_chunks: 214 estimated_download_size: 0.47 GiB (482.0 MiB)
3. Data reduction¶
We average the 2 metres temperature by year. This also is a lazy operation.
t2m_germany_yearly_mean = t2m_germany.resample(time='YE').mean(dim="time")
t2m_germany_yearly_mean
<xarray.DataArray 't2m' (time: 35, latitude: 28, longitude: 31)> Size: 122kB
dask.array<transpose, shape=(35, 28, 31), dtype=float32, chunksize=(1, 28, 31), chunktype=numpy.ndarray>
Coordinates:
* time (time) datetime64[ns] 280B 2015-12-31 2016-12-31 ... 2049-12-31
* latitude (latitude) float64 224B 46.32 46.67 47.03 ... 55.13 55.48 55.83
* longitude (longitude) float64 248B 5.273 5.625 5.977 ... 15.12 15.47 15.82
Attributes: (12/20)
GRIB_paramId: 167
GRIB_dataType: fc
GRIB_numberOfPoints: 196608
GRIB_typeOfLevel: heightAboveGround
GRIB_stepUnits: 1
GRIB_stepType: instant
... ...
GRIB_shortName: 2t
GRIB_units: K
long_name: 2 metre temperature
units: °C
standard_name: air_temperature
last_ingested_time_index: 3068154. Data download¶
This is the phase where we explicitly trigger the download of the data. Remember to assign the return of the compute() function to a new variable, so that the data is kept in memory.
%%time
t2m_germany_yearly_mean_computed = t2m_germany_yearly_mean.compute()
CPU times: user 27.8 s, sys: 6.88 s, total: 34.6 s Wall time: 1min 15s
5. Visualization¶
We will now create and display an animation of the average winter 2 metres temperature in Germany, for the years 2020-2039.
import pandas as pd
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy import crs
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
from mpl_toolkits.axes_grid1 import make_axes_locatable
fig, ax = plt.subplots(subplot_kw={'projection': ccrs.PlateCarree()})
ax.add_feature(cfeature.COASTLINE, linewidth=0.5)
ax.add_feature(cfeature.BORDERS, linestyle=':', linewidth=0.5)
ax.add_feature(cfeature.OCEAN, facecolor='lightblue', zorder=2)
ax.gridlines(draw_labels=True, zorder=3, color="white", alpha=0.5)
t2m_germany_yearly_mean_computed.isel(time=0).plot(ax=ax, transform=ccrs.PlateCarree(), cmap='RdBu_r', add_colorbar=True, cbar_kwargs={'orientation': 'vertical', 'shrink':0.9,'pad': 0.15})
# function to update the plot for each frame (each timestep)
def update(frame):
data = t2m_germany_yearly_mean_computed.isel(time=frame)
plot = data.plot(
ax=ax,
transform=ccrs.PlateCarree(),
cmap='RdBu_r',
vmin=-25,
vmax=25,
add_colorbar=False
)
ax.set_title(f"Time: {pd.Timestamp(data['time'].values).strftime('%Y-%m-%d')}")
return plot
anim = FuncAnimation(fig, update, frames=len(t2m_germany_yearly_mean_computed['time']), repeat=True) # Create the animation
plt.close() # close the static plot to avoid duplicate display
HTML(anim.to_jshtml()) # display the animation in the notebook
Heating Degree Days (HDD) in Darmstadt¶
Let us now investigate the Heating Degree Days in Darmstadt. We start again from the 2 metres temperature.
t2m
<xarray.DataArray 't2m' (time: 306816, latitude: 512, longitude: 1025)> Size: 644GB
dask.array<sub, shape=(306816, 512, 1025), dtype=float32, chunksize=(1440, 64, 64), chunktype=numpy.ndarray>
Coordinates:
* time (time) datetime64[ns] 2MB 2015-01-01 ... 2049-12-31T23:00:00
* latitude (latitude) float64 4kB -90.0 -89.65 -89.3 ... 89.3 89.65 90.0
* longitude (longitude) float64 8kB -180.0 -179.6 -179.3 ... 179.6 180.0
Attributes: (12/20)
GRIB_paramId: 167
GRIB_dataType: fc
GRIB_numberOfPoints: 196608
GRIB_typeOfLevel: heightAboveGround
GRIB_stepUnits: 1
GRIB_stepType: instant
... ...
GRIB_shortName: 2t
GRIB_units: K
long_name: 2 metre temperature
units: °C
standard_name: air_temperature
last_ingested_time_index: 306815We narrow down the selection to the data that is closer to Darmstadt.
darmstadt = {"latitude": 49.88, "longitude": 8.65}
base_temperature = 15 #[°C]
t2m_darmstadt = t2m.sel(darmstadt, method="nearest")
t2m_darmstadt
<xarray.DataArray 't2m' (time: 306816)> Size: 1MB
dask.array<getitem, shape=(306816,), dtype=float32, chunksize=(1440,), chunktype=numpy.ndarray>
Coordinates:
* time (time) datetime64[ns] 2MB 2015-01-01 ... 2049-12-31T23:00:00
latitude float64 8B 49.84
longitude float64 8B 8.789
Attributes: (12/20)
GRIB_paramId: 167
GRIB_dataType: fc
GRIB_numberOfPoints: 196608
GRIB_typeOfLevel: heightAboveGround
GRIB_stepUnits: 1
GRIB_stepType: instant
... ...
GRIB_shortName: 2t
GRIB_units: K
long_name: 2 metre temperature
units: °C
standard_name: air_temperature
last_ingested_time_index: 306815We estimate the cost of the download with the costing.py module.
costing.estimate_download_size(t2m, t2m_darmstadt)
estimated_needed_chunks: 214 estimated_download_size: 0.47 GiB (482.0 MiB)
We compute the HDD in Darmstadt with a very simplyfied formula:
t2m_darmstadt_daily_mean = t2m_darmstadt.resample(time='1D').mean(dim='time')
diff = (base_temperature - t2m_darmstadt_daily_mean)
hdd = diff.where(diff > 0).groupby("time.year").sum()
We explicitly trigger the download of the data. Remember to assign the return of the compute() function to a new variable, so that the data is kept in memory.
%%time
hdd_computed = hdd.compute()
CPU times: user 18.6 s, sys: 4.14 s, total: 22.8 s Wall time: 41.1 s
We can finally visualize the HDD in Darmstadt.
hdd
<xarray.DataArray 't2m' (year: 35)> Size: 140B
dask.array<groupby_nansum, shape=(35,), dtype=float32, chunksize=(1,), chunktype=numpy.ndarray>
Coordinates:
* year (year) int64 280B 2015 2016 2017 2018 ... 2046 2047 2048 2049
latitude float64 8B 49.84
longitude float64 8B 8.789
Attributes: (12/20)
GRIB_paramId: 167
GRIB_dataType: fc
GRIB_numberOfPoints: 196608
GRIB_typeOfLevel: heightAboveGround
GRIB_stepUnits: 1
GRIB_stepType: instant
... ...
GRIB_shortName: 2t
GRIB_units: K
long_name: 2 metre temperature
units: °C
standard_name: air_temperature
last_ingested_time_index: 306815plt.style.use("seaborn-v0_8-darkgrid")
fig, ax = plt.subplots(figsize=(12, 5))
plt.bar(hdd_computed.year, hdd.values, color='#ff0000', alpha=0.7)
plt.xlabel('time')
plt.ylabel('HDD [°C]')
plt.grid(axis='y', alpha=0.75)
plt.title('Heating Degrees Days in Darmstadt')
plt.xticks(hdd.year[::2], rotation=45);