How to work with the 'Essential Climate Variables for assessment of climate variability from 1979 to present' dataset on Earth Data Hub: analysis of worldwide temperature change since the pre-industrial period¶
Earth Data Hub (EDH) provides innovative access to Earth-related data. This notebook will guide you through accessing and using the https://data.earthdatahub.destine.eu/more-era5/ecv-for-climate-change-1981-2010-anomaly-v0.zarr dataset.
Goal of this tutorial¶
The goal of this tutorial is to answer the following question:
Is the global average temperature changing compared to the pre-industrial period?
To answer this question we will compute the global temperature anomaly with respect to the 1981-2010 reference period and adjust it by the estimated temperature increase since the pre-industrial period (1850-1900).
What you will learn:¶
- how to access the data
- how to select and average the data
- how to plot the results
To access datasets on Earth Data Hub you need to instruct your tools (xarray, Zarr, etc.) to use your EDH personal access token when downloading the data.
To obtain a personal access token you first need to register on the Destination Earth platform. Then, go to the Earth Data Hub account settings, where you can find your default personal access token or create new ones. After retrieving your personal access token, paste it below: ⤵
PAT = "your_personal_access_token"
# e.g. PAT="edh_pat_44bbb7e9192a4c6bb47ddf07d07564eee5d17de8dfc48f7118f88e3bc4a4157f8fe2403f5aa0a2d53441b6922ea9a33a"
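If you prefer not to hard-code the token in the notebook, a minimal alternative sketch is to read it from an environment variable (EDH_PAT is an assumed name, use whatever variable you exported):
import os

# Read the personal access token from an environment variable
# (EDH_PAT is a hypothetical name; adjust it to the one you exported).
PAT = os.environ["EDH_PAT"]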
Working with EDH data¶
Datasets on EDH are often very large and remotely hosted. Typical use implies selecting a portion of the data, followed by one or more reduction steps performed in a local or distributed Dask environment.
The structure of a workflow that uses EDH data typically looks like this:
- data access
- data selection
- (optional) data reduction
- data download
- further operations and visualization
Xarray and Dask work together following a lazy principle. This means that when you access and manipulate a Zarr store, the data is not immediately downloaded and loaded into memory. Instead, Dask constructs a task graph that represents the operations to be performed.
A smart user will first reduce the amount of data that needs to be downloaded and only then explicitly call compute() on it. Once the compute() operation is complete, the data is loaded into memory and available for fast subsequent processing.
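As a minimal illustration of this lazy behaviour (using a small random Dask array rather than EDH data, purely as a sketch):
import dask.array as da

lazy_mean = da.random.random((1000, 1000), chunks=(250, 250)).mean()  # only builds a task graph
value = lazy_mean.compute()  # the work runs here and the result is loaded into memory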
1. Data access¶
To preview the data, only the dataset metadata needs to be downloaded. Xarray does this automatically when you open a Zarr dataset:
import xarray as xr
ds = xr.open_dataset(
f"https://edh:{PAT}@data.earthdatahub.destine.eu/more-era5/ecv-for-climate-change-1981-2010-anomaly-v0",
chunks={},
engine="zarr",
)
ds
<xarray.Dataset> Size: 9GB Dimensions: (latitude: 721, longitude: 1440, valid_time: 548) Coordinates: * latitude (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0 * longitude (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8 number int64 8B ... surface float64 8B ... * valid_time (valid_time) datetime64[ns] 4kB 1979-01-01 ... 2024-08-01 Data variables: r (valid_time, latitude, longitude) float32 2GB dask.array<chunksize=(120, 256, 256), meta=np.ndarray> siconc (valid_time, latitude, longitude) float32 2GB dask.array<chunksize=(120, 256, 256), meta=np.ndarray> t2m (valid_time, latitude, longitude) float32 2GB dask.array<chunksize=(120, 256, 256), meta=np.ndarray> tp (valid_time, latitude, longitude) float32 2GB dask.array<chunksize=(120, 256, 256), meta=np.ndarray> Attributes: Conventions: CF-1.7 GRIB_centre: ecmf GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts GRIB_edition: 1 GRIB_subCentre: 0 history: 2024-10-03T17:07 GRIB to CDM+CF via cfgrib-0.9.1... institution: European Centre for Medium-Range Weather Forecasts
At this point, no data has been downloaded yet, nor loaded into memory.
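Before selecting anything, you can get a feel for how large the variables are and how they are chunked by inspecting the lazy arrays directly (a quick sketch):
print(ds.t2m.nbytes / 1e9, "GB")  # in-memory size of the full 2 metre temperature variable
print(ds.t2m.chunks)              # Dask chunk layout along (valid_time, latitude, longitude)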
2. Data selection¶
First, we select the 2 metre temperature anomaly from the dataset:
xr.set_options(keep_attrs=True)
t2m_anomaly = ds.t2m
t2m_anomaly.attrs["units"] = "°C"
t2m_anomaly.attrs["long_name"] = "Temperature variation"
t2m_anomaly
<xarray.DataArray 't2m' (valid_time: 548, latitude: 721, longitude: 1440)> Size: 2GB dask.array<open_dataset-t2m, shape=(548, 721, 1440), dtype=float32, chunksize=(120, 256, 256), chunktype=numpy.ndarray> Coordinates: * latitude (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0 * longitude (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8 number int64 8B ... surface float64 8B ... * valid_time (valid_time) datetime64[ns] 4kB 1979-01-01 ... 2024-08-01 Attributes: (12/31) GRIB_NV: 0 GRIB_Nx: 1440 GRIB_Ny: 721 GRIB_cfName: unknown GRIB_cfVarName: t2m GRIB_dataType: an ... ... GRIB_typeOfLevel: surface GRIB_units: K GRIB_uvRelativeToGrid: 0 long_name: Temperature variation standard_name: unknown units: °C
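If you only need part of the dataset, you can narrow the selection before any reduction. As an illustrative sketch (the time and space bounds below are arbitrary and not required by this tutorial), restricting the anomaly to the 1981-2010 reference period and to a European window would look like:
t2m_anomaly_subset = t2m_anomaly.sel(
    valid_time=slice("1981-01-01", "2010-12-31"),
    latitude=slice(72, 30),   # latitude is stored in decreasing order
    longitude=slice(0, 45),
)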
3. Data reduction¶
We want to compute the worldwide average of the 2 metre temperature anomaly (referenced to 1981-2010). This is easy in Xarray. However, because the data lies on a regular latitude-longitude grid, grid points near the poles cover much smaller areas than points near the equator, so we use an appropriately weighted average based on the cosine of the latitude:
import numpy as np
xr.set_options(keep_attrs=True)
weight = np.pi / 2 * np.cos(np.deg2rad(t2m_anomaly.latitude))  # cosine-of-latitude weights, scaled so their mean is ~1
t2m_anomaly_global_1981_2010 = (t2m_anomaly * weight).mean(["latitude", "longitude"])
t2m_anomaly_global_1981_2010
<xarray.DataArray (valid_time: 548)> Size: 4kB dask.array<mean_agg-aggregate, shape=(548,), dtype=float64, chunksize=(120,), chunktype=numpy.ndarray> Coordinates: number int64 8B 0 surface float64 8B 0.0 * valid_time (valid_time) datetime64[ns] 4kB 1979-01-01 ... 2024-08-01 Attributes: (12/31) GRIB_NV: 0 GRIB_Nx: 1440 GRIB_Ny: 721 GRIB_cfName: unknown GRIB_cfVarName: t2m GRIB_dataType: an ... ... GRIB_typeOfLevel: surface GRIB_units: K GRIB_uvRelativeToGrid: 0 long_name: Temperature variation standard_name: unknown units: °C
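The π/2 factor rescales the cosine weights so that their average over the latitude grid is approximately 1, which lets the plain mean above approximate an area-weighted mean. An equivalent and arguably more explicit alternative is xarray's built-in weighted aggregation (a sketch; the variable names are illustrative):
weights = np.cos(np.deg2rad(t2m_anomaly.latitude))
t2m_anomaly_global_weighted = t2m_anomaly.weighted(weights).mean(["latitude", "longitude"])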
4. Data download¶
This is the phase where we explicitly trigger the download of the data. To do so, we call compute() on the previously averaged temperature. The result will be small enough to easily fit into memory. Remember to assign the return value of compute() to a new variable, so that the data is kept in memory.
We can measure the time it takes:
%%time
t2m_anomaly_global_1981_2010_computed = t2m_anomaly_global_1981_2010.compute()
CPU times: user 16.6 s, sys: 6.28 s, total: 22.9 s Wall time: 14 s
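If you would like feedback while the data is being fetched, you can run the same compute() inside Dask's local progress bar (a sketch using dask.diagnostics, shipped with Dask):
from dask.diagnostics import ProgressBar

with ProgressBar():
    t2m_anomaly_global_1981_2010_computed = t2m_anomaly_global_1981_2010.compute()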
5. Visualization¶
This is the 2 metre temperature anomaly with respect to the 1981-2010 reference period:
import matplotlib.pyplot as plt
t2m_anomaly_global_1981_2010_computed.plot()
plt.title("2 meters temperature anomaly with respect to 1981-2010 reference period")
Text(0.5, 1.0, '2 meters temperature anomaly with respect to 1981-2010 reference period')
To obtain the 2 metre temperature anomaly with respect to the 1850-1900 reference period, we must add the global temperature increase of the 1981-2010 period over the pre-industrial era. This is estimated to be 0.63°C, as explained in the Copernicus Climate Bulletin.
t2m_anomaly_global_pre_industrial = 0.63 + t2m_anomaly_global_1981_2010_computed
Finally, we can visualize the 2 metre temperature anomaly with respect to the pre-industrial period:
t2m_anomaly_global_pre_industrial.plot()
plt.title("2 meters temperature anomaly with respect to the pre-industrial period")
Text(0.5, 1.0, '2 meters temperature anomaly with respect to the pre-industrial period')
We can also take a look at the 12-month running average:
t2m_anomaly_global_pre_industrial_12_month = t2m_anomaly_global_pre_industrial.rolling(valid_time=12).mean()
t2m_anomaly_global_pre_industrial_12_month.plot()
plt.title("2 meters temperature anomaly with respect to the pre-industrial period, 12 months rolling average")
Text(0.5, 1.0, '2 meters temperature anomaly with respect to the pre-industrial period, 12 months rolling average')
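Another common way to smooth the monthly signal is to aggregate it to annual means. As a short complementary sketch using xarray's resample (the variable name is illustrative; the last, incomplete year is averaged over the available months only):
t2m_anomaly_global_pre_industrial_annual = t2m_anomaly_global_pre_industrial.resample(valid_time="1YS").mean()
t2m_anomaly_global_pre_industrial_annual.plot()
plt.title("2 meters temperature anomaly with respect to the pre-industrial period, annual mean")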