Working with NetCDF files in R
18 February 2014
View the source of this post
NetCDF is an open file format commonly used to store oceanographic (and other) data such as sea surface temperature (SST), sea level pressure (SLP), and much more. I recently needed to work with SST data from the NCEP Reanalysis and found that I didn’t know how to work with NetCDF files. This post should serve as a short introduction working with NetCDF files using the R package
Step 1: Acquire the NetCDF library
Before we can open a NetCDF file in R, we need to install the NetCDF library on our system. I’m using a Mac running OS 10.9 and I use
homebrew as my package manager.
If you’re using
homebrew, install the
netcdf library with:
brew install netcdf
If you’re on Windows, you can find pre-built binaries at the developer’s download page. For other systems, consult the documentation.
Step 2: Install and load the
ncdf package in R
And load it:
Step 3: Load your NetCDF file
For this tutorial, I’ll be working with the NOAA Optimum Interpolation (OI) Sea Surface Temperature (SST) V2 data series of monthly means from December 1981 – current. These data are produced on a 1° grid for the entire globe.
I am interested in the following variables: latitude, longitude, time, and SST:
lon variables are just vectors containing the range of latitudes (89.5S to 89.5N) and longitudes (0.5°E to 359.5°E, starting from the prime meridian).
time variable is a little more complex. Let’s take a look:
Those don’t look like times at all. If you’re used to working with dates and times on computers, you may already be way ahead of me. If, however, if you are not, I’ll explain. Dates and times are usually stored by computers as a number of time units since we started counting time. The time units may be milliseconds, seconds, or even months – whatever suits the purpose. Computers commonly store dates and times as the number of seconds since January 1, 1970 (See UNIX time). In the case of our data, time is being counted as days since January 1, 1800. This is a little weird but it makes sense for our data. If you’re using a different NetCDF file than me, you’ll want to consult the documentation that goes along with the data to figure out how they’re counting time.
Now that we know how
time is being stored, we can make it human-readable:
Here, I set the origin to January 1, 1800 and then convert the original variable
time into a vector of R
Date objects, passing the parameters
time variable is now much more easier to understand:
The other variables I created,
time_year_months are for added utility. I can now reference SST data for just a set of years
or all SST values from June
Our last, but most important variable is SST.
sst is three dimensional, and is indexed by longitude, latitude, and time (respectively). To extract a single SST value from it, we’ll need to specify an index or range of indices for all three dimensions:
We can use our utility variables to, for example, extract all the observations for June from this same grid cell:
Depending on your goals, this may be as far as you need to get. But maybe you want to display these data visually. Let’s plot the SSTs for a range of grid cells onto a map.
Step 4: Convert the SST data to a
Our NetCDF file has a lot of observations:
To reduce the amount of computation, let’s subset the data to a range of latitudes and longitudes and also focus in on a particular month in the data set so we can plot this in two dimensions:
Notice how I constructed the
_indices variables. They are vectors of the same length as their corresponding variable but they contain TRUEs and FALSEs where the corresponding variable is equal to the desired ranges. This allows instant subsetting:
Which gives us the SST values for the 5x5 grid for June of 1990. To save them in a more convenient format, we’ll convert it to a
Now each row of
cdf_df will correspond to one SST value, one row for every grid cell combination (25 in total). Let’s populate it with SST values:
Looks good! Let’s make a map to display these data on. Luckily, this is pretty straightforward in R using the
The above map is a good start. I’ve used the
world2Hires map from
mapdata which lets me create a map centered on the Eastern Bering Sea. I’ve specific the x- and y-limits to show just the area where we have SST values. Let’s change the points to rectangles and make the background color of the rectangles correlate with the corresponding SST value for that grid.
We’ll first make the color scale:
To plot rectangles instead of points, we can use the rect() function instead of
rect() draws rectangles on the graphics device and needs the user to specify the coordinates of each corner (xleft, ybottom, xright, ytop).
Looks great! Hopefully this post was a useful introduction to working with and displaying NetCDF data.