How to Read NEX DCP30 NetCDF Files with Python on AWS

This wiki explains the basic steps for setting up Python and the software packages it depends on to read NEX-DCP30 data, which are stored in netCDF format, on AWS.

1. This wiki assumes that the user has basic knowledge of how to launch an Amazon EC2 instance and mount OpenNEX data from the Amazon public S3 buckets, as described in detail in the wiki page "Mount OpenNEX Amazon Public S3 Buckets on an Amazon EC2 Instance".
2. This wiki uses an EC2 instance built from an Amazon Linux AMI as an example. Users of other Linux distributions (e.g., Ubuntu) should be able to follow the same steps, although the specific commands may differ.

In this wiki, it is assumed that the DCP30 data is mounted at "/mnt/s3-nexdcp30".

$ ls /mnt/s3-nexdcp30/
BCSD  doi.txt  NEX-quartile
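Before going further, it can help to confirm from Python that the bucket is actually mounted. The following is a minimal sketch, assuming the mount point /mnt/s3-nexdcp30 used in this wiki; the helper name list_mount is hypothetical.

```python
import os


def list_mount(path="/mnt/s3-nexdcp30"):
    """Return the sorted top-level entries of a mounted directory.

    Raises an error if the path is missing or empty, which usually means
    the S3 bucket has not been mounted.
    """
    if not os.path.isdir(path):
        raise FileNotFoundError("%s does not exist or is not a directory" % path)
    entries = sorted(os.listdir(path))
    if not entries:
        raise RuntimeError("%s is empty; is the bucket mounted?" % path)
    return entries
```

On the instance, list_mount() should return the same entries that ls shows above.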

[Task 1: Install HDF5 and NetCDF4 libraries]
Step 1: Install the HDF5 library

$ sudo -i
$ cd /usr/local/src/
$ wget
$ tar -xzvf hdf5-1.8.12.tar.gz
$ cd hdf5-1.8.12
$ ./configure --prefix=/usr/local
$ make
$ make install
Note that many configuration options (e.g., --enable-fortran, --enable-cxx) are omitted in this wiki. To view the full list of options, run "./configure --help".
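After "make install", the headers and libraries should sit under the chosen prefix. A rough sketch of such a check, assuming the /usr/local prefix from the configure line and the usual file names hdf5.h and libhdf5.*; the helper name find_installed is hypothetical, and it only checks that the files exist, not that they work.

```python
import glob
import os


def find_installed(prefix="/usr/local", header="hdf5.h", lib_stem="libhdf5"):
    """Report whether a library's header and library files exist under prefix.

    Returns a dict with the header path (or None if absent) and the sorted
    list of files in <prefix>/lib whose names start with "<lib_stem>.".
    """
    header_path = os.path.join(prefix, "include", header)
    # "libhdf5.*" matches libhdf5.so, libhdf5.a, ... but not libhdf5_hl.*
    libs = sorted(glob.glob(os.path.join(prefix, "lib", lib_stem + ".*")))
    return {
        "header": header_path if os.path.isfile(header_path) else None,
        "libs": libs,
    }
```

The same helper can be reused after Step 2, e.g. find_installed(header="netcdf.h", lib_stem="libnetcdf").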

Step 2: Install the NetCDF4 library
The NetCDF4 library can be installed in a similar way.
$ sudo -i
$ cd /usr/local/src/
$ wget
$ tar -xzvf netcdf-4.3.1.tar.gz
$ cd netcdf-4.3.1
$ ./configure  --prefix=/usr/local
$ make
$ make install
Again, more advanced compilation options are omitted here.
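A netCDF-C installation normally also provides an nc-config script in <prefix>/bin, which reports how the library was built. A minimal sketch that queries it from Python, assuming nc-config is on the PATH; the helper name nc_config is hypothetical, and the options shown (--version, --has-nc4) should be double-checked against "nc-config --help" for your version.

```python
import shutil
import subprocess


def nc_config(option, exe="nc-config"):
    """Run `<exe> <option>` and return its trimmed standard output.

    Returns None if the executable is not found on the PATH (for example,
    if /usr/local/bin is missing from PATH) or if the command fails.
    """
    path = shutil.which(exe)
    if path is None:
        return None
    result = subprocess.run([path, option], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

For example, nc_config("--has-nc4") should return "yes" if netCDF-4/HDF5 support was compiled in, and nc_config("--version") should mention the installed release.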

Step 3: Install the NetCDF4-Python Interface
$ sudo -i
$ cd /usr/local/src/
$ wget
$ tar -xzvf netCDF4-1.0.7.tar.gz
$ cd netCDF4-1.0.7
$ python setup.py build
$ python setup.py install
$ cd test
$ python
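Once the module is installed, a quick import check confirms that Python can find it and which libraries it was linked against. A minimal sketch; the helper name netcdf4_versions is hypothetical, and the attributes __netcdf4libversion__ and __hdf5libversion__ are assumed from newer netCDF4-python releases (older ones such as 1.0.7 may lack them, so getattr falls back to None).

```python
import importlib


def netcdf4_versions(module_name="netCDF4"):
    """Import the netCDF4 module and report the versions it was built against.

    Returns None if the module cannot be imported. Attributes that a given
    release does not define are reported as None instead of raising.
    """
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return None
    return {
        "module": getattr(mod, "__version__", None),
        "netcdf": getattr(mod, "__netcdf4libversion__", None),
        "hdf5": getattr(mod, "__hdf5libversion__", None),
    }
```

A None result usually means the module was installed for a different Python interpreter, or that the shared libraries in /usr/local/lib cannot be found at import time.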

[Task 2: Write the first script to read the NEX-DCP30 data]
An example Python script for basic netCDF reading is as follows:

import os
import sys

import numpy
import netCDF4 as netCDF

# Some control variables
firstyear = 2006
lastyear = 2099
year_step = 5  # each file contains 5 years of data

# Path settings
dir_in = '/mnt/s3-nexdcp30/NEX-quartile/rcp85/mon/atmos/tasmax/r1i1p1/v1.0/CONUS'
dir_out = './data'
# Input file-name template: fill in the actual NEX-DCP30 file-name pattern.
# It must contain %s (variable name) and two %04d (start and end year).
fin_tmpl = ''
fout_tmpl = 'ts_%s_ens-avg_amon_rcp85_CONUS_%04d01-%04d12.txt'

# Variable names
var_in = 'tasmax'
var_out = 'tasmax'

if not os.path.exists(dir_out):
    os.mkdir(dir_out)

file_out = os.path.join(dir_out, fout_tmpl % (var_in, firstyear, lastyear))
fout = open(file_out, 'w')

for year in range(firstyear, lastyear, year_step):
    year_s = year
    year_e = year + year_step - 1
    if year_e > lastyear:
        year_e = lastyear
    nyear = year_e - year_s + 1

    # Access the netCDF file
    file_in = os.path.join(dir_in, fin_tmpl % (var_in, year_s, year_e))
    print("Processing", file_in)

    # Open the file
    nfl = netCDF.Dataset(file_in, 'r')

    # Read the length of the time dimension
    ndim = len(nfl.dimensions['time'])

    # Check that ndim is as expected (12 monthly records per year)
    if ndim != 12 * nyear:
        print('Error: The dimension of the data file does not match expectation. Exiting...')
        sys.exit(-1)

    # Loop over the monthly time steps
    for i in range(ndim):
        # Data are returned as a masked array
        data = nfl.variables[var_in][i, :, :]
        # Compress the unmasked values into a one-dimensional array
        data = data.compressed()
        # IMPORTANT: there are a lot of data, and summing them in 4-byte floats
        # (as in calculating the mean) can lose precision.
        data = data.astype('f8')  # use double precision instead!
        data = data.mean(axis=0)  # here we just calculate the mean

        yr = year + i // 12
        mn = i % 12 + 1
        # Save the data
        fout.write("%04d, %02d, %.4e\n" % (yr, mn, data))
    fout.flush()  # force flush

    # Clean up
    nfl.close()

fout.close()
The script does a simple job: for each five-year file, it opens the file, reads each monthly field as a masked array, calculates the mean of the valid values, and writes it to a text file. As the comments show, using the netCDF4 module is straightforward. More information about the netCDF-4 data model and APIs can be found in the NetCDF Tutorials.
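The inner loop of the script can also be written as a small reusable generator, which makes the masked-array handling and the time-index-to-(year, month) mapping easy to test on synthetic data without the S3 mount. This is a sketch under the script's own assumptions (monthly records starting in January of the file's first year); the name monthly_means is hypothetical.

```python
import numpy as np


def monthly_means(data, first_year):
    """Yield (year, month, mean) for each time step of a (time, lat, lon) array.

    `data` may be a numpy (masked) array or anything indexable the same way.
    Masked (missing) cells are ignored, and values are promoted to float64
    before averaging, as in the script above. A fully masked field gives nan.
    """
    for i in range(data.shape[0]):
        field = np.ma.asarray(data[i, :, :])
        valid = field.compressed().astype("f8")
        year = first_year + i // 12
        month = i % 12 + 1
        yield year, month, valid.mean()
```

With an open Dataset, passing nfl.variables[var_in] and year_s should read one monthly slice at a time, since netCDF4 variables expose .shape and support the same [i, :, :] indexing.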