Climate Adaptation Digital Twin: winter temperature in Germany and Heating Degree Days in Darmstadt
This notebook will guide you through accessing and using the https://cacheb.dcms.destine.eu/d1-climate-dt/ScenarioMIP-SSP3-7.0-IFS-NEMO-0001-high-sfc-v0.zarr dataset. Access to this dataset is restricted to authorized users only, via the Data Cache Management service.
To access the Data Cache Management service you need to instruct your tools (e.g. Xarray, Zarr...) to authenticate with the appropriate access token. This can be done by running the code snippets below.
First, make sure you have an account on the Destination Earth platform. Then run the following cell, filling in your Destination Earth username and password when asked:
%%capture cap
%run ../cacheb/cacheb-authentication.py
from pathlib import Path
with open(Path.home() / ".netrc", "a") as fp:
    fp.write(cap.stdout)
⚠ NOTE: the generated password is valid for a limited period of time, and needs to be regenerated and reconfigured periodically by running the cells above.
Goal of this tutorial
The first goal of this tutorial is to plot the average winter 2 metre temperature in Germany for the years 2020-2039.
The second goal is to calculate the Heating Degree Days (HDD) in Darmstadt over the same period.
What you will learn:
- how to access the dataset
- how to select and reduce the data
- how to plot the results
Working with EDH data
Datasets on EDH are typically very large and remotely hosted. Typical use implies a selection of the data, followed by one or more reduction steps performed in a local or distributed Dask environment.
The structure of a workflow that uses EDH data typically looks like this:
- data access
- data selection
- (optional) data reduction
- data download
- further operations and visualization
Xarray and Dask work together following a lazy principle. This means that when you access and manipulate a Zarr store, the data is not immediately downloaded and loaded into memory. Instead, Dask constructs a task graph that represents the operations to be performed.
A smart user will first reduce the amount of data that needs to be downloaded, and only then explicitly call compute() on it. Once the compute() operation is complete, the data is loaded into memory and available for subsequent fast processing.
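To see the lazy principle in action, here is a minimal sketch using a small synthetic Dask array (not EDH data):
import dask.array as da

arr = da.ones((1000, 1000), chunks=(100, 100))  # no data in memory yet
result = (arr * 2).mean()  # only builds a task graph, nothing is computed
print(result)            # still a lazy dask.array
print(result.compute())  # 2.0 -- now the graph runs and the value is in memory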
1. Data access
import xarray as xr
url = "https://cacheb.dcms.destine.eu/d1-climate-dt/ScenarioMIP-SSP3-7.0-IFS-NEMO-0001-high-sfc-v0.zarr"
ds = xr.open_dataset(
    url,
    chunks={},  # lazy loading: use the dataset's own (Zarr) chunking via Dask
    engine="zarr",
    storage_options={"client_kwargs": {"trust_env": True}},  # pick up the access token from ~/.netrc
)
ds
<xarray.Dataset> Size: 188TB
Dimensions:    (time: 175320, latitude: 4096, longitude: 8193)
Coordinates:
  * latitude   (latitude) float64 33kB -90.0 -89.96 -89.91 ... 89.91 89.96 90.0
  * longitude  (longitude) float64 66kB -180.0 -180.0 -179.9 ... 180.0 180.0
    step       timedelta64[ns] 8B ...
    surface    float64 8B ...
  * time       (time) datetime64[ns] 1MB 2020-01-01 ... 2039-12-31T23:00:00
Data variables:
    d2m        (time, latitude, longitude) float32 24TB dask.array<chunksize=(48, 512, 512), meta=np.ndarray>
    sd         (time, latitude, longitude) float32 24TB dask.array<chunksize=(48, 512, 512), meta=np.ndarray>
    ssr        (time, latitude, longitude) float32 24TB dask.array<chunksize=(48, 512, 512), meta=np.ndarray>
    str        (time, latitude, longitude) float32 24TB dask.array<chunksize=(48, 512, 512), meta=np.ndarray>
    t2m        (time, latitude, longitude) float32 24TB dask.array<chunksize=(48, 512, 512), meta=np.ndarray>
    tprate     (time, latitude, longitude) float32 24TB dask.array<chunksize=(48, 512, 512), meta=np.ndarray>
    u10        (time, latitude, longitude) float32 24TB dask.array<chunksize=(48, 512, 512), meta=np.ndarray>
    v10        (time, latitude, longitude) float32 24TB dask.array<chunksize=(48, 512, 512), meta=np.ndarray>
Attributes:
    Conventions:             CF-1.7
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_edition:            2
    GRIB_subCentre:          1003
    history:                 2024-06-06T16:50 GRIB to CDM+CF via cfgrib-0.9.1...
    institution:             European Centre for Medium-Range Weather Forecasts
⚠ At this point, no data has been downloaded yet, nor loaded into memory.
2. Data selection
We first select the 2 metre temperature and convert it from kelvin to degrees Celsius.
xr.set_options(keep_attrs=True)  # keep variable attributes through arithmetic operations

t2m = ds.t2m - 273.15  # K -> °C
t2m.attrs["units"] = "°C"
t2m
<xarray.DataArray 't2m' (time: 175320, latitude: 4096, longitude: 8193)> Size: 24TB
dask.array<sub, shape=(175320, 4096, 8193), dtype=float32, chunksize=(48, 512, 512), chunktype=numpy.ndarray>
Coordinates:
  * latitude   (latitude) float64 33kB -90.0 -89.96 -89.91 ... 89.91 89.96 90.0
  * longitude  (longitude) float64 66kB -180.0 -180.0 -179.9 ... 180.0 180.0
    step       timedelta64[ns] 8B ...
    surface    float64 8B ...
  * time       (time) datetime64[ns] 1MB 2020-01-01 ... 2039-12-31T23:00:00
Attributes: (12/19)
    GRIB_NV:                         0
    GRIB_cfName:                     air_temperature
    GRIB_cfVarName:                  t2m
    GRIB_dataType:                   fc
    GRIB_gridDefinitionDescription:  150
    GRIB_gridType:                   healpix
    ...                              ...
    GRIB_typeOfLevel:                heightAboveGround
    GRIB_units:                      K
    last_restart_dim_updated:        175320
    long_name:                       2 metre temperature
    standard_name:                   air_temperature
    units:                           °C
As xarray shows, the size of the t2m DataArray is around 24 TB. We will now narrow down the selection as much as possible.
We select only the area of Germany and the meteorological winter months (December, January, February).
germany = {'latitude': slice(46, 56), 'longitude': slice(5, 16)}
t2m_germany = t2m.sel(**germany)
t2m_germany_winter = t2m_germany[t2m_germany.time.dt.month.isin([12, 1, 2])]
t2m_germany_winter
<xarray.DataArray 't2m' (time: 43320, latitude: 228, longitude: 251)> Size: 10GB
dask.array<getitem, shape=(43320, 228, 251), dtype=float32, chunksize=(47, 228, 251), chunktype=numpy.ndarray>
Coordinates:
  * latitude   (latitude) float64 2kB 46.0 46.04 46.09 ... 55.89 55.93 55.98
  * longitude  (longitude) float64 2kB 5.01 5.054 5.098 ... 15.91 15.95 16.0
    step       timedelta64[ns] 8B ...
    surface    float64 8B ...
  * time       (time) datetime64[ns] 347kB 2020-01-01 ... 2039-12-31T23:00:00
Attributes: (12/19)
    GRIB_NV:                         0
    GRIB_cfName:                     air_temperature
    GRIB_cfVarName:                  t2m
    GRIB_dataType:                   fc
    GRIB_gridDefinitionDescription:  150
    GRIB_gridType:                   healpix
    ...                              ...
    GRIB_typeOfLevel:                heightAboveGround
    GRIB_units:                      K
    last_restart_dim_updated:        175320
    long_name:                       2 metre temperature
    standard_name:                   air_temperature
    units:                           °C
Notice that the size of the array dropped to around 10 GB (in memory). However, due to the chunked structure of the DataArray, xarray must download every chunk that contains a portion of the selected data.
To estimate the size of the download, we can use the costing.py module. This estimate must be made before we apply any reduction operation.
import costing
costing.estimate_download_size(t2m, t2m_germany_winter)
estimated_needed_chunks: 922
estimated_memory_size: 46.406 GB
estimated_download_size: 4.641 GB
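For intuition only, the sketch below shows a back-of-the-envelope version of such an estimate. It assumes a contiguous, chunk-aligned selection and a fixed compression ratio of 10; the actual costing.py implementation may differ:
import math

def rough_download_estimate(full, subset, compression_ratio=10):
    # every store chunk that overlaps the selection must be downloaded in full
    chunk_shape = dict(zip(full.dims, (c[0] for c in full.data.chunks)))
    n_chunks = 1
    for dim, size in subset.sizes.items():
        n_chunks *= math.ceil(size / chunk_shape[dim])
    chunk_bytes = math.prod(chunk_shape.values()) * full.dtype.itemsize
    memory_size = n_chunks * chunk_bytes  # uncompressed bytes to fetch
    print(f"chunks: {n_chunks}, memory: {memory_size / 1e9:.3f} GB, "
          f"download: {memory_size / compression_ratio / 1e9:.3f} GB")

rough_download_estimate(t2m, t2m_germany_winter)
With the selection above this yields roughly 900 chunks and about 45 GB in memory, in the same ballpark as the costing.py figures.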
3. Data reduction
We average the 2 metre temperature quarterly, starting on December 1st: resampling with QS-DEC puts each December-January-February winter into a single quarter labelled by its December 1st. This is also a lazy operation.
t2m_germany_winter_mean = t2m_germany_winter.resample(time='QS-DEC').mean(dim="time")
t2m_germany_winter_mean
<xarray.DataArray 't2m' (time: 81, latitude: 228, longitude: 251)> Size: 19MB
dask.array<transpose, shape=(81, 228, 251), dtype=float32, chunksize=(1, 228, 251), chunktype=numpy.ndarray>
Coordinates:
  * latitude   (latitude) float64 2kB 46.0 46.04 46.09 ... 55.89 55.93 55.98
  * longitude  (longitude) float64 2kB 5.01 5.054 5.098 ... 15.91 15.95 16.0
    step       timedelta64[ns] 8B ...
    surface    float64 8B ...
  * time       (time) datetime64[ns] 648B 2019-12-01 2020-03-01 ... 2039-12-01
Attributes: (12/19)
    GRIB_NV:                         0
    GRIB_cfName:                     air_temperature
    GRIB_cfVarName:                  t2m
    GRIB_dataType:                   fc
    GRIB_gridDefinitionDescription:  150
    GRIB_gridType:                   healpix
    ...                              ...
    GRIB_typeOfLevel:                heightAboveGround
    GRIB_units:                      K
    last_restart_dim_updated:        175320
    long_name:                       2 metre temperature
    standard_name:                   air_temperature
    units:                           °C
4. Data download
This is the phase where we explicitly trigger the download of the data. Remember to assign the return value of compute() to a new variable, so that the data is kept in memory.
%%time
t2m_germany_winter_mean_computed = t2m_germany_winter_mean.compute()
CPU times: user 3min 53s, sys: 1min 21s, total: 5min 14s
Wall time: 1min 48s
The QS-DEC resampling also creates quarters that contain no winter months; their means are all-NaN, so we drop them.
t2m_germany_winter_mean_computed = t2m_germany_winter_mean_computed.dropna("time")
5. Visualization
We will now create and display an animation of the average winter 2 metre temperature in Germany for the years 2020-2039.
import pandas as pd
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
fig, ax = plt.subplots(subplot_kw={'projection': ccrs.PlateCarree()})
ax.add_feature(cfeature.COASTLINE, linewidth=0.5)
ax.add_feature(cfeature.BORDERS, linestyle=':', linewidth=0.5)
ax.add_feature(cfeature.OCEAN, facecolor='lightblue', zorder=2)
ax.gridlines(draw_labels=True, zorder=3, color="white", alpha=0.5)
# initial frame; fix vmin/vmax so the colorbar stays consistent across animation frames
t2m_germany_winter_mean_computed.isel(time=0).plot(
    ax=ax,
    transform=ccrs.PlateCarree(),
    cmap='RdBu_r',
    vmin=-25,
    vmax=25,
    add_colorbar=True,
    cbar_kwargs={'orientation': 'vertical', 'shrink': 0.9, 'pad': 0.15},
)
# function to update the plot for each frame (each timestep)
def update(frame):
    data = t2m_germany_winter_mean_computed.isel(time=frame)
    plot = data.plot(
        ax=ax,
        transform=ccrs.PlateCarree(),
        cmap='RdBu_r',
        vmin=-25,
        vmax=25,
        add_colorbar=False
    )
    ax.set_title(f"Time: {pd.Timestamp(data['time'].values).strftime('%Y-%m-%d')}")
    return plot
anim = FuncAnimation(fig, update, frames=len(t2m_germany_winter_mean_computed['time']), repeat=True) # Create the animation
plt.close() # close the static plot to avoid duplicate display
HTML(anim.to_jshtml()) # display the animation in the notebook
Heating Degree Days (HDD) in Darmstadt
Let us now investigate the Heating Degree Days in Darmstadt. We start again from the 2 metre temperature.
t2m
<xarray.DataArray 't2m' (time: 175320, latitude: 4096, longitude: 8193)> Size: 24TB
dask.array<sub, shape=(175320, 4096, 8193), dtype=float32, chunksize=(48, 512, 512), chunktype=numpy.ndarray>
Coordinates:
  * latitude   (latitude) float64 33kB -90.0 -89.96 -89.91 ... 89.91 89.96 90.0
  * longitude  (longitude) float64 66kB -180.0 -180.0 -179.9 ... 180.0 180.0
    step       timedelta64[ns] 8B ...
    surface    float64 8B ...
  * time       (time) datetime64[ns] 1MB 2020-01-01 ... 2039-12-31T23:00:00
Attributes: (12/19)
    GRIB_NV:                         0
    GRIB_cfName:                     air_temperature
    GRIB_cfVarName:                  t2m
    GRIB_dataType:                   fc
    GRIB_gridDefinitionDescription:  150
    GRIB_gridType:                   healpix
    ...                              ...
    GRIB_typeOfLevel:                heightAboveGround
    GRIB_units:                      K
    last_restart_dim_updated:        175320
    long_name:                       2 metre temperature
    standard_name:                   air_temperature
    units:                           °C
We narrow down the selection to the grid point closest to Darmstadt.
darmstadt = {"latitude": 49.88, "longitude": 8.65}
base_temperature = 15 #[°C]
t2m_darmstadt = t2m.sel(darmstadt, method="nearest")
t2m_darmstadt
<xarray.DataArray 't2m' (time: 175320)> Size: 701kB
dask.array<getitem, shape=(175320,), dtype=float32, chunksize=(48,), chunktype=numpy.ndarray>
Coordinates:
    latitude   float64 8B 49.87
    longitude  float64 8B 8.657
    step       timedelta64[ns] 8B ...
    surface    float64 8B ...
  * time       (time) datetime64[ns] 1MB 2020-01-01 ... 2039-12-31T23:00:00
Attributes: (12/19)
    GRIB_NV:                         0
    GRIB_cfName:                     air_temperature
    GRIB_cfVarName:                  t2m
    GRIB_dataType:                   fc
    GRIB_gridDefinitionDescription:  150
    GRIB_gridType:                   healpix
    ...                              ...
    GRIB_typeOfLevel:                heightAboveGround
    GRIB_units:                      K
    last_restart_dim_updated:        175320
    long_name:                       2 metre temperature
    standard_name:                   air_temperature
    units:                           °C
We estimate the cost of the download with the costing.py module.
costing.estimate_download_size(t2m, t2m_darmstadt)
estimated_needed_chunks: 3653
estimated_memory_size: 183.862 GB
estimated_download_size: 18.386 GB
We compute the HDD in Darmstadt with a very simplified formula: each day contributes the amount (in °C) by which its daily mean temperature falls below the base temperature:
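$$\mathrm{HDD}_{\mathrm{year}} = \sum_{d \,\in\, \mathrm{year}} \max\left(0,\ T_{\mathrm{base}} - \bar{T}_d\right)$$
where $\bar{T}_d$ is the daily mean 2 metre temperature and $T_{\mathrm{base}} = 15$ °C, matching the code below.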
t2m_darmstadt_daily_mean = t2m_darmstadt.resample(time='1D').mean(dim='time')  # daily mean temperature
diff = base_temperature - t2m_darmstadt_daily_mean
hdd = diff.where(diff > 0).groupby("time.year").sum()  # only days below the base temperature contribute
We explicitly trigger the download of the data. Remember to assign the return value of compute() to a new variable, so that the data is kept in memory.
%%time
hdd_computed = hdd.compute()
CPU times: user 8min 48s, sys: 4min 53s, total: 13min 41s
Wall time: 3min 43s
We can finally visualize the HDD in Darmstadt.
plt.style.use("seaborn-v0_8-darkgrid")

fig, ax = plt.subplots()
plt.bar(hdd_computed.year, hdd_computed.values, color='#ff0000', alpha=0.7)  # use the computed (in-memory) data
plt.xlabel('time')
plt.ylabel('HDD [°C]')
plt.grid(axis='y', alpha=0.75)
plt.title('Heating Degree Days in Darmstadt')
plt.xticks(hdd_computed.year[::2]);