This is a script designed to obtain snow data from the ERA5 reanalysis product. We will be using the Copernicus API to get global, daily snow cover and snow depth information.
This code is adapted from Tasha Snow’s ERA5 downloading script: ERADownload.ipynb
The Copernicus Climate Data Store (CDS) API client is not installed on CryoCloud by default, so it needs to be pip-installed first, followed by restarting the kernel.
To use the CDS API, the user needs credentials for the Copernicus Climate Data Store (CDS). After obtaining a user ID (`uid`) and an API key (`api-key`), run the following cell (skip it if a `.cdsapirc` file already exists in the `/home/jovyan/` directory).
# !echo url: https://cds.climate.copernicus.eu/api/v2 >> /home/jovyan/.cdsapirc
# !echo key: {uid}:{api-key} >> /home/jovyan/.cdsapirc
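As a quick sanity check, the credentials file can be parsed before initializing the client. This is a minimal sketch assuming the two-line `url:`/`key:` layout written by the cell above; the default path is the CryoCloud home directory used in this tutorial, and `read_cdsapirc` is a helper written here for illustration, not part of the `cdsapi` package.

```python
from pathlib import Path

def read_cdsapirc(path="/home/jovyan/.cdsapirc"):
    """Parse a .cdsapirc file into a {'url': ..., 'key': ...} dict."""
    creds = {}
    for line in Path(path).read_text().splitlines():
        if ":" in line:
            # Split only on the first colon: the url and key values
            # themselves contain colons
            field, value = line.split(":", 1)
            creds[field.strip()] = value.strip()
    return creds

# Both entries should be present before calling cdsapi.Client():
# creds = read_cdsapirc()
# assert {"url", "key"} <= creds.keys()
```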
# !pip install cdsapi  # install first if not already available, then restart the kernel
import cdsapi
The CDS API can be a bit picky about ERA5 inputs, so first-time users are encouraged to build their request with the online request form on the CDS website, which generates a valid API call to copy into a script.
The functions below retrieve ERA5 snow depth and snow cover and download them to a `tmp/` folder. Additional parameters to consider:

- `yearStart` and `yearEnd`: start and end year.
- `monthStart` and `monthEnd`: start and end month.
- `dayStart` and `dayEnd`: start and end day.

The function currently grabs daily data from March 1, 2020 - April 30, 2020 at 12:00 UTC each day, and downloads it as daily netCDF files. Because ERA5 is generated hourly, users can expand the `time` entry to include more hours per day.
import calendar
from pathlib import Path

# Initialize the CDS API
c = cdsapi.Client()

def retrieve_era5():
    """
    A function to demonstrate how to iterate efficiently over several years,
    months, and days for a particular ERA5 request.
    """
    yearStart = 2020
    yearEnd = 2020
    monthStart = 3
    monthEnd = 4
    dayStart = 1
    dayEnd = 31

    tmp_dir = Path.cwd() / "tmp"
    tmp_dir.mkdir(exist_ok=True)

    for year in range(yearStart, yearEnd + 1):
        for month in range(monthStart, monthEnd + 1):
            # Clamp the day range to the actual length of this month,
            # so we never request an invalid date like April 31
            lastDay = min(dayEnd, calendar.monthrange(year, month)[1])
            for day in range(dayStart, lastDay + 1):
                startDy = f"{day:02d}"
                startMo = f"{month:02d}"
                startYr = f"{year:04d}"
                target = f"{tmp_dir}/era5_SWE_{startYr}{startMo}{startDy}.nc"
                era5_request(startYr, startMo, startDy, target)
def era5_request(startYr, startMo, startDy, target):
    """
    Helper function for retrieve_era5. An ERA5 request for snow
    depth and snow cover data for the given year/month/day.

    Inputs
    ------
    startYr: str
        Starting year of data query, in YYYY format.
    startMo: str
        Starting month of data query, in MM format.
    startDy: str
        Starting day of data query, in DD format.
    target: str
        Path and name of the netCDF file to be saved.
    """
    c.retrieve(
        'reanalysis-era5-land',
        {
            'product_type': ['reanalysis'],
            'data_format': 'netcdf',
            'variable': ['snow_depth', 'snow_cover'],
            'year': [startYr],
            'month': [startMo],
            'day': [startDy],
            'time': ['12:00'],
        },
        target)

if __name__ == '__main__':
    retrieve_era5()
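Nested year/month/day loops are easy to get wrong when a range spans several months of different lengths. An alternative sketch uses `datetime.date` to iterate only over real calendar days; the loop at the bottom shows how it would feed the `era5_request` function defined above (shown commented out so nothing is downloaded here).

```python
from datetime import date, timedelta

def daily_date_strings(start, end):
    """Yield (YYYY, MM, DD) string tuples for every real day in [start, end]."""
    day = start
    while day <= end:
        yield f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}"
        day += timedelta(days=1)

# March 1 - April 30, 2020 contains exactly 61 valid days (no "April 31")
dates = list(daily_date_strings(date(2020, 3, 1), date(2020, 4, 30)))

# for startYr, startMo, startDy in dates:
#     era5_request(startYr, startMo, startDy,
#                  f"tmp/era5_SWE_{startYr}{startMo}{startDy}.nc")
```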
Depending on the number of files requested (one per day in the above example), it can take a while to download everything.
When it finishes, there should now be daily ERA5 data in netCDF format! To efficiently load all of this data, we are going to use Xarray and its `open_mfdataset()` function.
import os
import re
import zipfile
import xarray as xr
from os import listdir
from os.path import join

def process_era5_data(tmp_path):
    # Find ERA5 zip files in the download directory
    era5_files = [join(tmp_path, f) for f in listdir(tmp_path) if "era5_" in f]
    # Iteratively unzip each file and collect the results into a list
    tmp_files = era5_extract(era5_files, tmp_path)
    print('------------')
    # Open all ERA5 files as a single Xarray dataset
    ds = xr.open_mfdataset(tmp_files)
    print("All data has been lazy-loaded into Xarray.")
    # Remove extracted files, for cleanliness
    for file in tmp_files:
        os.remove(file)
    print("Extracted ERA5 files deleted.")
    return ds
def era5_extract(era5_files, tmp_dir):
    for file in era5_files:
        with zipfile.ZipFile(file, 'r') as zfile:
            print(f'Now extracting data for file: {file}')
            # Extract all files from the current zip archive
            zfile.extractall(tmp_dir)
        # Rename the output file to prevent overwriting
        outfile = join(tmp_dir, "data_0.nc")
        date_pattern = re.search(r'\d{8}', file).group(0)
        newfile = join(tmp_dir, f'data_{date_pattern}.nc')
        os.rename(outfile, newfile)
        print(f'Data extracted and saved to file: data_{date_pattern}.nc')
        print(' ')
    # List of extracted output files
    tmp_files = [join(tmp_dir, f) for f in os.listdir(tmp_dir) if "data_" in f]
    return tmp_files
tmp_dir = Path.cwd() / "tmp"
ds = process_era5_data(tmp_dir)
ds
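The unzip-and-rename step in `era5_extract` can be exercised without downloading anything. This sketch builds a tiny zip archive named like the downloaded files and applies the same rename logic; the member name `data_0.nc` mirrors the assumption the function above makes about how the CDS packages its netCDF output. Searching only the basename for the date is a small hardening over the version above, so digits elsewhere in the path cannot match.

```python
import os
import re
import zipfile
from pathlib import Path

def extract_and_rename(zip_path, out_dir, member="data_0.nc"):
    """Extract `member` from the archive, then rename it using the
    8-digit date found in the archive's own filename."""
    with zipfile.ZipFile(zip_path, "r") as zfile:
        zfile.extract(member, out_dir)
    date_str = re.search(r"\d{8}", os.path.basename(zip_path)).group(0)
    newfile = Path(out_dir) / f"data_{date_str}.nc"
    os.rename(Path(out_dir) / member, newfile)
    return newfile
```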
Thanks to the above function, loading all of that data is pretty easy! However, it is important to note that the data is currently “lazy-loaded” - we can easily subset and resample the data for our needs, but we will need to load it into memory if we wish to make figures.
Fully loading the data as-is can be time-consuming, so let's reduce it first by computing monthly means.
# Calculate monthly mean snow depth and snow cover
era5_monthly = ds.resample(valid_time='1ME').mean()
Resampling to monthly means reduces the data volume by quite a bit, so let's now look at global snow cover for the month of March. We will go ahead and load the result into memory using the `compute()` function.
# Load March snow cover into memory (the first monthly mean is March)
era5_sc_march = era5_monthly['snowc'].isel(valid_time=0).compute()
Finally, we can make a map figure showing global, monthly-averaged snow cover from ERA5.
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
era5_sc_march.plot.imshow(ax=ax, cmap='Blues')
ax.set_xlabel("Longitude", fontsize=12)
ax.set_ylabel("Latitude", fontsize=12)
ax.set_title("ERA5 Snow Cover, March 2020", fontsize=12)
fig.tight_layout()
Now for a different example. Here, we will examine snow cover over Alaska only, and generate a state-wide time series spanning March and April.
# Making bounds for Alaska (ERA5 longitudes run 0-360, hence the +360 shift)
mask_lon = (ds.longitude >= -168.75 + 360) & (ds.longitude <= -136.01 + 360)
mask_lat = (ds.latitude >= 52.64) & (ds.latitude <= 71.59)

# Subset ERA5 data to Alaska lats/lons only
era5_alaska = ds.where(mask_lon & mask_lat, drop=True)
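ERA5's 0-360 longitude convention is easy to get wrong, so it can help to make the conversion from the more familiar -180 to 180 range explicit. `to_era5_lon` is a small helper written here for illustration; it is not part of the original notebook.

```python
def to_era5_lon(lon):
    """Convert a longitude in [-180, 180] to ERA5's [0, 360) convention."""
    return lon % 360

# The Alaska bounds used above, converted explicitly
west, east = to_era5_lon(-168.75), to_era5_lon(-136.01)
# west is 191.25; east is approximately 223.99
```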
As before, we need to load the Alaska data into memory. Because we are looking over a much smaller spatial domain, compute()
will be much faster.
# Load Alaska data into memory
era5_alaska = era5_alaska.compute().squeeze()
Again, we can make a map figure, this time showing snow cover over the state of Alaska on March 1, 2020 (ERA5-Land reports snow cover as a percentage, so the color scale runs 0-100):
# Map plot of Alaska snow cover
era5_alaska['snowc'].isel(valid_time=0).plot.imshow(vmin=0, vmax=100, cmap="Blues")
We can also create a spatially averaged time series of snow cover over the state of Alaska for the entire period March 1 - April 30:
# Calculate spatial average of snow cover over Alaska
era5_sc_alaska = era5_alaska['snowc'].mean(('longitude', 'latitude'))

# Time series plot of Alaska snow cover
fig, ax = plt.subplots()
era5_sc_alaska.plot(ax=ax)
ax.set_xlabel("Day", fontsize=12)
ax.set_ylabel("Snow cover [%]", fontsize=12)
ax.set_title("March 1 - April 30, 2020", fontsize=12)
fig.tight_layout()
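One caveat: a plain mean over latitude and longitude weights every grid cell equally, even though cells shrink toward the poles. The usual fix is to weight each latitude band by cos(latitude); on a DataArray, Xarray's `weighted()` method applies such weights directly. Here is a minimal pure-Python sketch of the idea, with one value per latitude band, written for illustration only:

```python
import math

def cos_lat_weighted_mean(values, lats):
    """Area-weighted mean of one value per latitude band,
    using cos(latitude) as the weight for each band."""
    weights = [math.cos(math.radians(lat)) for lat in lats]
    total = sum(v * w for v, w in zip(values, weights))
    return total / sum(weights)

# Equal values are unchanged by the weighting; otherwise the result
# is pulled toward the values at lower (larger-area) latitudes.
```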