
This is a script designed to obtain snow data from the ERA5 reanalysis product. We will be using the Copernicus API to get global, daily snow cover and snow depth information.

This code is adapted from Tasha Snow’s ERA5 downloading script: ERADownload.ipynb

The Copernicus Climate Data Store (CDS) API is not on CryoCloud by default, so the following cell needs to be run, followed by restarting the kernel.
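A minimal sketch of that install step (assuming pip is available on the image; in a notebook cell, prefix with ! or use the %pip magic):

```shell
pip install cdsapi
```

Remember to restart the kernel afterwards so the newly installed package is picked up.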

To use the CDS API, the user needs credentials for the Copernicus Climate Data Store (CDS). After obtaining a user ID (uid) and an API key (api-key), run the following cell (skip if you already have a .cdsapirc file in the /home/jovyan/ directory).

# !echo url: https://cds.climate.copernicus.eu/api/v2 >> /home/jovyan/.cdsapirc
# !echo key: {uid}:{api-key} >> /home/jovyan/.cdsapirc
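Equivalently, the credentials file can be written from Python. This is a sketch: the uid and api_key values below are placeholders that you must replace with your own credentials.

```python
from pathlib import Path

uid = "123456"            # placeholder: your CDS user ID
api_key = "your-api-key"  # placeholder: your CDS API key

# Write the two-line credentials file the cdsapi client expects
rc_path = Path.home() / ".cdsapirc"
rc_path.write_text(
    "url: https://cds.climate.copernicus.eu/api/v2\n"
    f"key: {uid}:{api_key}\n"
)
```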
import cdsapi

The CDS API can be a bit picky with inputs from ERA5, so first-time users are encouraged to use the online request form (https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=download) to automatically generate a code for their API request, to ensure that the syntax is correct.

The functions below retrieve ERA5 snow depth and snow cover and download them to a tmp/ folder. Additional parameters to consider:

  • yearStart and yearEnd: Start and end year.
  • monthStart and monthEnd: Start and end month.
  • dayStart and dayEnd: Start and end day.

The function currently grabs daily data from March 1 - 31, 2020 at 12:00 UTC each day, and downloads it as daily netCDF files. Because ERA5 is generated hourly, users can expand the time entry to include more hours per day.
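For example, a full 24-hour 'time' list for the request could be built like this (a small sketch; only '12:00' is used in the code below):

```python
# Build a list of all 24 hourly timestamps in the HH:MM form the CDS API accepts
times = [f"{hour:02d}:00" for hour in range(24)]
print(times[:3])  # → ['00:00', '01:00', '02:00']
```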

from pathlib import Path

# Initialize the CDS API
c = cdsapi.Client()

def retrieve_era5():
    """
    Iterate over the requested years, months, and days and submit
    one ERA5 request per day.
    """
    yearStart = 2020
    yearEnd = 2020
    monthStart = 3
    monthEnd = 3
    dayStart = 1
    dayEnd = 31
    tmp_dir = Path.cwd() / "tmp"
    tmp_dir.mkdir(exist_ok=True)
    for year in range(yearStart, yearEnd + 1):
        for month in range(monthStart, monthEnd + 1):
            for day in range(dayStart, dayEnd + 1):
                startDy = f"{day:02d}"
                startMo = f"{month:02d}"
                startYr = f"{year:04d}"
                target = f"{tmp_dir}/era5_SWE_{startYr}{startMo}{startDy}.nc"
                era5_request(startYr, startMo, startDy, target)

def era5_request(startYr, startMo, startDy, target):
    """      
        Helper function for era5_retrieve. An ERA-5 request for snow
        depth and snow cover data for the given years/months/days.

        Inputs
        ------------
        startYr: str
            Starting year of data query, in YYYY format.
        startMo: str
            Starting month of data query, in MM format.
        startDy: str
            Starting day of data query, in DD format.
        target: str
            Path and name of netCDF file to be saved.
    """
    c.retrieve(
        'reanalysis-era5-land',
        {
            'product_type': ['reanalysis'],
            'data_format': 'netcdf',
            'variable': ['snow_depth', 'snow_cover'],
            'year': [startYr],
            'month': [startMo],
            'day': [startDy],
            'time': ['12:00']
        },
        target)
        
if __name__ == '__main__':
    retrieve_era5()

Depending on the number of files downloaded (31 in the case of the above example), it can take a while to download everything.

When it finishes, there should now be daily ERA5 data in netCDF format! To efficiently load all of this data, we are going to use Xarray and its open_mfdataset() function.
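Under the hood, open_mfdataset() lazy-loads each file and concatenates the per-file datasets along their shared dimension. A minimal in-memory illustration of that concatenation step, using small synthetic datasets in place of the downloaded files:

```python
import numpy as np
import xarray as xr

# Two single-day datasets standing in for two daily ERA5 files
day1 = xr.Dataset(
    {"snowc": (("valid_time", "latitude"), np.array([[50.0, 75.0]]))},
    coords={"valid_time": [np.datetime64("2020-03-01T12")], "latitude": [60.0, 61.0]},
)
day2 = xr.Dataset(
    {"snowc": (("valid_time", "latitude"), np.array([[55.0, 80.0]]))},
    coords={"valid_time": [np.datetime64("2020-03-02T12")], "latitude": [60.0, 61.0]},
)

# open_mfdataset() performs an analogous concatenation across files
combined = xr.concat([day1, day2], dim="valid_time")
print(combined.sizes)  # two time steps, two latitudes
```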

import os
import re
import zipfile
import xarray as xr

from os.path import join

def process_era5_data(tmp_path):
    # Find ERA5 Zip files in downloaded directory
    era5_files = [join(tmp_path, f) for f in os.listdir(tmp_path) if "era5_" in f]

    # Iteratively unzip each file and collect into a list
    tmp_files = era5_extract(era5_files, tmp_path)

    print('------------')
    # Open all ERA5 files into single Xarray
    ds = xr.open_mfdataset(tmp_files)
    print("All data has been lazy-loaded into Xarray.")

    # Remove extracted files, for cleanliness
    for file in tmp_files:
        os.remove(file)
    print("Extracted ERA-5 files deleted.")

    return ds

def era5_extract(era5_files, tmp_dir):
    for file in era5_files:
        with zipfile.ZipFile(file, 'r') as zfile:
            print(f'Now extracting data for file: {file}')
            # Extract all files from current Zip file
            zfile.extractall(tmp_dir)

            # Rename output file to prevent overwriting
            outfile = join(tmp_dir, "data_0.nc")
            date_pattern = re.search(r'\d{8}', file).group(0)
            newfile = join(tmp_dir, f'data_{date_pattern}.nc')
            os.rename(outfile, newfile)
            print(f'Data extracted and saved to file: data_{date_pattern}.nc')
            print(' ')

    # List of output files
    tmp_files = [join(tmp_dir, f) for f in os.listdir(tmp_dir) if "data_" in f]

    return tmp_files
tmp_dir = Path.cwd() / "tmp"
ds = process_era5_data(tmp_dir)
ds

Thanks to the above function, loading all of that data is pretty easy! However, it is important to note that the data is currently “lazy-loaded” - we can easily subset and resample the data for our needs, but we will need to load it into memory if we wish to make figures.

Fully loading the data as is can be time-consuming, so let’s reduce the data first, starting with making monthly means of snow depth.

# Calculate monthly mean snow depth and snow cover
era5_monthly = ds.resample(valid_time='1ME').mean()

Resampling to monthly means reduces the data volume considerably, so let's now look at global snow cover for the month of March. We will go ahead and load the result into memory using the compute() function.

# Load March snow cover into memory
era5_sc_march = era5_monthly['snowc'].compute().squeeze()

Finally, we can make a map figure showing global, monthly-averaged snow cover from ERA5.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
era5_sc_march.plot.imshow(ax=ax, cmap='Blues')
ax.set_xlabel("Longitude", fontsize=12)
ax.set_ylabel("Latitude", fontsize=12)
ax.set_title("ERA5 Snow Cover, March 2020", fontsize=12)
fig.tight_layout()

Now for a different example. Here, we will examine snow cover over Alaska only, and generate a state-wide time series for the month of March.

# Making bounds for Alaska
mask_lon = (ds.longitude >= -168.75+360) & (ds.longitude <= -136.01+360)
mask_lat = (ds.latitude >= 52.64) & (ds.latitude <= 71.59)

# Subset ERA5 data to Alaska lats/lons only
era5_alaska = ds.where(mask_lon & mask_lat, drop=True)

As before, we need to load the Alaska data into memory. Because we are looking over a much smaller spatial domain, compute() will be much faster.

# Load Alaska data into memory
era5_alaska = era5_alaska.compute().squeeze()

Again, we can make a map figure showing snow cover over the state of Alaska, this time for March 1, 2020:

# Map plot of Alaska snow cover
era5_alaska['snowc'].isel(valid_time=0).plot.imshow(vmin=0, vmax=1, cmap="Blues")

We can also create a spatially-averaged time series of snow cover over the state of Alaska for the entire time period March 1 - 31:

# Calculate spatial average of snow cover over Alaska
era5_sc_alaska = era5_alaska['snowc'].mean(('longitude', 'latitude'))
# Time series plot of Alaska snow cover
fig, ax = plt.subplots()
era5_sc_alaska.plot(ax=ax)
ax.set_xlabel("Day", fontsize=12)
ax.set_ylabel("Snow cover [%]", fontsize=12)
ax.set_title("March 1 - 31, 2020", fontsize=12)
fig.tight_layout()