General Data Description And Access

Data Abstract

The NASA Earth Exchange (NEX) Downscaled Climate Projections (NEX-DCP30) dataset comprises downscaled climate scenarios for the conterminous United States that are derived from General Circulation Model (GCM) runs conducted under the Coupled Model Intercomparison Project Phase 5 (CMIP5) [Taylor et al. 2012] and across the four greenhouse gas emissions scenarios known as Representative Concentration Pathways (RCPs) [Meinshausen et al. 2011] developed for the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR5). The dataset includes downscaled projections from 33 models, as well as ensemble statistics calculated for each RCP from all available model runs. The purpose of these datasets is to provide a set of high-resolution, bias-corrected climate change projections that can be used to evaluate climate change impacts on processes that are sensitive to finer-scale climate gradients and the effects of local topography on climate conditions.

Each of the climate projections includes monthly averaged maximum temperature, minimum temperature, and precipitation for the periods from 1950 through 2005 (Retrospective Run) and from 2006 to 2099 (Prospective Run).

Digital Object Identifier (DOI)

Alternate Access Overview

The Space Apps Challenge 2014 focus is the creation of a visualization interface (a mobile app or a web app accessible on a mobile platform) that would allow location-specific access to high-resolution climate data sets for the U.S. using coordinates specified by the user, or from GPS coordinates available from mobile devices. Applications may combine the climate datasets with other datasets (for example, maps of sea level rise) or models (for example, relationships between daily maximum temperature and electricity demand and pricing) to provide localized information on climate impacts.

Given that contestants only have a very short time to conceive, design and develop a visualization interface, the NEX team has provided a provisional API to access some of the NEX-DCP30 data (see Data Availability section) as point-based time series data formatted as JSON strings. The provisional API was constructed for developer convenience and to potentially reduce redundant work amongst contestants. We expect that extracting time series data at a location, or a small group of locations, to gain insight into localized climate impacts will be very common for this challenge. By simplifying the data access, contestants may be able to focus on data presentation and visualization as opposed to complicated, and potentially expensive, data reformatting and extraction.

The provisional API presented below looks, at a glance, RESTful in nature, but we do not rigorously adhere to all REST constraints. The API is constructed with simplicity in mind, and the cost to design, implement and maintain it had to be minimal. We expect a diverse contestant base that may not be core web or web services developers, so the method of documenting this API may diverge from the common practices used to document more rigorously developed RESTful APIs.

The intent of the information provided here is only to give a quick look at how to access the NEX-DCP30 data via this convenience API. Contestants interested in other aspects of the data or other access methods should see the publications attached to the NEX Space Apps Challenge 2014 Page and the DOI location.

Quick Start (Draft)

The material below is a draft and may be updated just prior to the date of the Challenge.

Scenarios/Experiments

The API provides contestants access to two NEX-DCP30 climate projection simulations. These two projection simulations are forced with specified concentrations consistent with a high emissions scenario (rcp8.5) and a medium mitigation scenario (rcp4.5). The scenarios available through this API are the rcp8.5 experiment r1i1p1 and the rcp4.5 experiment r1i1p1, as well as the historical data (retrospective runs, r1i1p1). For more information on the details of the experiments or scenarios, please see the publication resources posted on the NEX Space Apps Challenge 2014 project page. The relevant API abbreviations, or keys, associated with the two projection simulations are listed here for convenience: rcp85r1i1p1, rcp45r1i1p1. These API abbreviations, or keys, are minimally described via the API itself (see appendix).

Models

There are many models, as well as model statistics, available in the NEX-DCP30 dataset. For our purposes here, each model is associated with the two scenarios noted above. Below is a list of model abbreviations, or keys, relevant to the API. These abbreviations, or keys, are minimally described via the API but are listed here for your convenience: access1-0, bcc-csm1-1, bcc-csm1-1-m, bnu-esm, canesm2, ccsm4, cesm1-bgc, cesm1-cam5, cmcc-cm, cnrm-cm5, csiro-mk3-6-0, fgoals-g2, fio-esm, gfdl-cm3, gfdl-esm2g, gfdl-esm2m, giss-e2-r, hadgem2-ao, inmcm4, ipsl-cm5a-lr, ipsl-cm5a-mr, ipsl-cm5b-lr, miroc-esm, miroc-esm-chem, miroc5, mpi-esm-lr, mpi-esm-mr, mri-cgcm3, noresm1-m, models-quartile-25, models-quartile-75, models-average.

Note that the model statistics "models-quartile-25", "models-quartile-75", and "models-average" provide a model ensemble view (summaries of all models) for each of the scenarios noted above. These API abbreviations, or keys, are minimally described via the API itself (see appendix).

Variables

Each of the climate projections includes monthly averaged maximum temperature (C), minimum temperature (C), and precipitation (mm). The relevant API abbreviations, or keys, associated with the three variables are listed here: tmax, tmin, prcp. These API abbreviations, or keys, are minimally described via the API itself (see appendix).

Time

There are retrospective runs (1950-2005) and prospective runs (2006-2099) for each model, scenario and variable. Note that the retrospective and prospective runs are concatenated together for simplicity: when a developer requests data through this API for some variable of a given model and scenario at some location, the developer receives a single time series from 1950 to 2099 on monthly time steps. Note that all variables of a given model and scenario at some location share the same time index; see the examples below.
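Because the concatenated series advances in whole months from the start of the retrospective run, a position in the series maps to a calendar date by simple arithmetic. Below is a minimal sketch; the index_to_year_month helper is illustrative (not part of the API) and assumes the series begins at January 1950 and ends at December 2099, per the periods stated above:

```python
def index_to_year_month(i):
    """Map a position in the concatenated monthly time series to a
    (year, month) pair, assuming the series begins at 1950-01."""
    return 1950 + i // 12, i % 12 + 1

# A series spanning 1950-2099 inclusive holds 150 years x 12 months:
n_steps = (2099 - 1950 + 1) * 12  # 1800 monthly time steps
```

The /time endpoint described later returns the authoritative time index, so this arithmetic is only a convenience for indexing into the returned "data" array.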

Space

All of the NEX-DCP30 data (all variables of a given model and scenario) are formatted to exactly the same "spatial grid". The "spatial grid" is fixed on a geographic coordinate system with each cell, or pixel, of size 30 arc-seconds x 30 arc-seconds (0.0083333333 x 0.0083333333 in decimal degrees). The west-bounding coordinate in decimal degrees is -125.02083333 and the north-bounding coordinate is 49.9375. For simplicity, let's set these bounds as the upper-left point (ulx, uly) and note it as (-125.02083333, 49.9375). The south-bounding coordinate in decimal degrees is 24.0625 and the east-bounding coordinate is -66.47916667. Let's set these bounds as the lower-right point (lrx, lry) and note it as (-66.47916667, 24.0625). A rough view of the space the data is mapped to is:

(-125.02083333, 49.9375) 
    +-------+------------------------------+
    | (0,0) |  . . .                       |
    +-------+                              |
    |     .                                |
    |     .                  +-------------+
    |     .   .  .  .        | (7024,3104) |
    +------------------------+-------------+
                             (-66.47916667, 24.0625)

The upper-left corner of pixel (0,0) is (-125.02083333, 49.9375) and the lower-right corner of pixel (7024, 3104) is (-66.47916667, 24.0625).

All raw data at a location may be retrieved by the cell or pixel coordinates, with the top-left corner index at (0,0); the lower-right corner is at (maximum columns - 1, maximum rows - 1), i.e. (7024, 3104). Example routines to translate a latitude value to a cell or pixel Y index and a longitude value to a cell or pixel X index are given in the appendix below. Note that for locations where no valid data point exists within the time series, no JSON data will be posted for that cell/pixel.
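For mapping retrieved cells back to geographic space (e.g. for plotting), the inverse of the index lookup is straightforward. Below is a minimal sketch using the grid origin and 30 arc-second cell size given above; the pixel_center helper is illustrative, not part of the API:

```python
ULX, ULY = -125.02083333, 49.9375   # upper-left corner of pixel (0, 0)
RES = 0.0083333333                  # 30 arc-seconds in decimal degrees

def pixel_center(x, y):
    """Return the (lon, lat) of the center of cell (x, y): offset the
    grid origin by (index + 0.5) cells east and south."""
    return ULX + (x + 0.5) * RES, ULY - (y + 0.5) * RES
```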

High Level Metadata

This provisional convenience API provides limited access to metadata. If you require more detailed metadata, then we recommend you access the netCDF data files directly via a netCDF data viewer or netCDF API; see the DOI, or the original data available on Amazon Web Services' (AWS) S3 storage engine, s3://nasanex/NEX-DCP30/ . For many contestants, the metadata available via the provisional API may be sufficient to get going and build out an app. See the examples on how one may access the limited metadata.

Putting it all together and examples

Here, we show how to directly access the data in JSON format. Recall the abbreviations, or keys, above (e.g. rcp85r1i1p1, tmax, miroc-esm, etc.) and the cell or pixel locations noted above (x, y). Using the abbreviations, or keys, and grid coordinates, one can access the time series data in JSON format with basic URLs of the form:

http://opennexapp.s3.amazonaws.com/<model>/<scenario>/<variable>/<x>/<y>

All elements of the form "<*>" in this link would be filled in with the appropriate abbreviations or keys. Note that there is no time index parameter, as the full time series of both the retrospective and prospective runs is always concatenated together and returned. The above API call is the minimum addressable unit, or block of data, for a given model, scenario and variable that a developer can access. By setting this minimum addressable unit, we reduce the complexity of the infrastructure needed and the costs to provide this data via this API.
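Assembling such a URL is simple string joining. Below is a minimal sketch; the build_url helper is illustrative, not part of the API:

```python
BASE = 'http://opennexapp.s3.amazonaws.com'

def build_url(model, scenario, variable, x, y):
    """Assemble the data-call URL for one model/scenario/variable
    at cell (x, y), following the URL template above."""
    return '/'.join([BASE, model, scenario, variable, str(x), str(y)])
```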

Here is an example call, which retrieves the maximum temperature from the model cesm1-cam5 for the rcp8.5 scenario at cell 1000, 1000:

http://opennexapp.s3.amazonaws.com/cesm1-cam5/rcp85r1i1p1/tmax/1000/1000

And minimum temperature:
http://opennexapp.s3.amazonaws.com/cesm1-cam5/rcp85r1i1p1/tmin/1000/1000

And precipitation:
http://opennexapp.s3.amazonaws.com/cesm1-cam5/rcp85r1i1p1/prcp/1000/1000

Here is an example of maximum temperature for the "models-average" for the rcp8.5 scenario at cell 2000, 2000:
http://opennexapp.s3.amazonaws.com/models-average/rcp85r1i1p1/tmax/2000/2000

Note that the appropriate abbreviations, or keys, from the sections above and the cell indices were substituted in for "<*>". The JSON string should look like: { "data" : [ <value>, … ] }. As mentioned in the Time section above, all variables of a given model and scenario at some location share the same time index.
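The returned body can be parsed with any JSON library. Below is a small sketch using Python's standard json module on a hypothetical truncated response body in the documented shape (the values are illustrative only):

```python
import json

# Hypothetical truncated response body in the shape { "data" : [ ... ] }
body = '{ "data" : [10.1293, 12.5656, 14.8908] }'
series = json.loads(body)['data']   # list of monthly values
```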

As a convenience, some common calls of interest are listed below:

To retrieve a listing of all available models: http://opennexapp.s3.amazonaws.com/models

To retrieve basic information about a specific model: http://opennexapp.s3.amazonaws.com/models/<model>

Example call: http://opennexapp.s3.amazonaws.com/models/cesm1-cam5

To retrieve a listing of all available variables: http://opennexapp.s3.amazonaws.com/variables

To retrieve basic information about a variable: http://opennexapp.s3.amazonaws.com/variables/<variable>

Example call: http://opennexapp.s3.amazonaws.com/variables/tmax

To retrieve basic information about the spatial bounds: http://opennexapp.s3.amazonaws.com/bounds

To retrieve information about the pixel or cell size: http://opennexapp.s3.amazonaws.com/resolution

To retrieve basic temporal information: http://opennexapp.s3.amazonaws.com/time

Alternatively, the temporal information formatted as milliseconds since the Unix epoch is available: http://opennexapp.s3.amazonaws.com/epmtime
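Epoch-millisecond values can be converted to calendar dates with standard date libraries; note that dates before 1970 (the retrospective run) yield negative values. Below is a sketch in Python; the ms_to_date helper and the example millisecond value are illustrative, and UTC is assumed:

```python
from datetime import datetime, timezone

def ms_to_date(ms):
    """Convert milliseconds since the Unix epoch to a UTC datetime.
    Negative values denote dates before 1970-01-01."""
    return datetime.fromtimestamp(ms / 1000.0, tz=timezone.utc)

# e.g. -629942400000 ms corresponds to 1950-01-15 00:00 UTC
```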

To retrieve a listing of all available scenarios: http://opennexapp.s3.amazonaws.com/scenarios

To retrieve information about a scenario: http://opennexapp.s3.amazonaws.com/scenarios/<scenario>

Example call: http://opennexapp.s3.amazonaws.com/scenarios/rcp85r1i1p1

Data Locality

Pointing out the obvious: we use the AWS S3 storage engine to serve the data for this challenge. This dataset is physically located in the "us-west-2" region. Developers or contestants should take heed of the data's physical location for performance reasons.

Data Availability

Due to resource constraints, we are unable to provide all the data via this API prior to the challenge date. The data is in the process of being loaded, so over time apps will have access to the entire set. Note that a permission-denied message may be returned when no data has yet been uploaded or made accessible via this access method for a particular location.

Appendix:

For clarity, the data call is described in approximate Backus–Naur Form (BNF) below.

<data> ::= <url>
<url> ::= <hostpart> "/" <model> "/" <scenario> "/" <variable> "/" <x> "/" <y>
<hostpart> ::= "http://opennexapp.s3.amazonaws.com" | "https://opennexapp.s3.amazonaws.com"
<model> ::= <one of the model abbreviations noted above> | <one of the statistical abbreviations noted above>
<scenario> ::= "rcp85r1i1p1" | "rcp45r1i1p1"
<variable> ::= "tmax" | "tmin" | "prcp"
<x> ::= 0 | 1 | ... | <max columns - 1>
<y> ::= 0 | 1 | ... | <max rows - 1>

Note that the data call strips the expected "/models/" portion of the URL for performance reasons.
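The grammar above can be approximated with a regular expression for client-side validation before issuing a request. Below is a sketch; the pattern and the DATA_CALL name are illustrative, not part of the API:

```python
import re

# host / model / scenario / variable / x / y, per the grammar above
DATA_CALL = re.compile(
    r'^https?://opennexapp\.s3\.amazonaws\.com'
    r'/[a-z0-9][a-z0-9.-]*'            # model or statistic key
    r'/(rcp85r1i1p1|rcp45r1i1p1)'      # scenario
    r'/(tmax|tmin|prcp)'               # variable
    r'/\d+/\d+$'                       # x and y cell indices
)
```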

Finding a pixel index based on a latitude and longitude

Here, we show one trivial way to find the offsets, or indices, for this particular dataset. Let the latitude of interest be lat = yy.yy and the longitude of interest be lon = xx.xx. Recall from the Space section above that (ulx, uly) = (-125.02083333, 49.9375), and the cell resolution in both the x and y directions is ~0.0083333333. Then:

cell_index_y = floor( (49.9375 - (yy.yy)) / 0.0083333333 )

cell_index_x = floor( (xx.xx - (-125.02083333)) / 0.0083333333 )
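The two formulas above translate directly to code. Below is a minimal sketch; the cell_index helper is illustrative, not part of the API:

```python
import math

ULX, ULY = -125.02083333, 49.9375   # upper-left corner of the grid
RES = 0.0083333333                  # cell size in decimal degrees

def cell_index(lat, lon):
    """Return the (x, y) cell indices containing the given lat/lon,
    using the floor formulas above."""
    y = int(math.floor((ULY - lat) / RES))
    x = int(math.floor((lon - ULX) / RES))
    return x, y
```

For example, cell_index(40.0, -100.0) yields the cell whose URL path ends in /3002/1192.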

Rudimentary Python Examples

Here, we show a very basic way to access the time index and data at a location using the Python programming language:

import json, urllib2

# Retrieve the shared monthly time index
datlink = urllib2.urlopen('http://opennexapp.s3.amazonaws.com/time')
tms = json.loads(datlink.read())

# Retrieve the maximum-temperature series for cell (1000, 1000)
datlink = urllib2.urlopen('http://opennexapp.s3.amazonaws.com/cesm1-cam5/rcp85r1i1p1/tmax/1000/1000')
tmax = json.loads(datlink.read())

# Print each date alongside its tmax value
for i, t in enumerate(tms['time']):
    print t, tmax['data'][i]

Another example showing how to iterate over all models, variables, scenarios, etc. for a location:

import json, urllib2, os

# Cell/pixel coordinates of the location of interest
x, y = '1575', '1101'

# Retrieve the available models, scenarios, and variables
models = json.loads(urllib2.urlopen('http://opennexapp.s3.amazonaws.com/models').read())
scenarios = json.loads(urllib2.urlopen('http://opennexapp.s3.amazonaws.com/scenarios').read())
variables = json.loads(urllib2.urlopen('http://opennexapp.s3.amazonaws.com/variables').read())

for m in models['models']:
    for s in scenarios['scenarios']:
        for v in variables['variables']:
            # Build the data-call URL for this model/scenario/variable
            link = os.path.join('http://opennexapp.s3.amazonaws.com', m, s, v, x, y)
            try:
                datlink = urllib2.urlopen(link)
                print link
                print datlink.read()
            except urllib2.HTTPError, e:
                print str(e), " : data may not be available for this location yet..."

A Rudimentary R Example

Here, we show a very basic way to access the time index and data at a location using the R language:

> library(rjson)
> library(RCurl)
> tmaxjson <- fromJSON(getURL('http://opennexapp.s3.amazonaws.com/models-average/rcp85r1i1p1/tmax/2000/2000'))
> tmjson <- fromJSON(getURL('http://opennexapp.s3.amazonaws.com/time'))
> tmaxjson
$data
   [1] 10.1293 12.5656 14.8908 20.1560 25.2173 30.0557 30.5390 28.7185 26.5432
  [10] 21.5208 14.9003 10.2422 10.9209 12.8353 15.4549 20.1122 24.6436 29.7864
  [19] 29.6381 28.5500 26.1740 20.9531 15.0010 11.6408 10.0112 11.7159 15.6254
  [28] 19.5867 24.8526 29.7285 30.0486 28.5330 26.4650 21.6668 15.0258 11.3451
  [37] 10.7038 12.8143 15.6452 19.8457 25.2900 30.3740 29.7458 28.6548 26.4027
  [46] 21.7890 15.2553 10.9860 10.1951 12.1484 14.4593 19.8128 24.4576 29.8256
  [55] 29.9366 28.5387 26.4323 21.4241 14.7691 10.5006  9.8161 12.2651 15.1018
  [64] 20.0899 24.7434 29.6495 29.9034 28.4824 26.3233 21.6089 15.6500 10.3110
  [73] 10.5648 12.7009 15.5305 19.5879 24.8507 29.5032 29.8619 28.6151 26.1611
  [82] 21.7330 14.9829 10.5692 10.1602 12.5058 15.0319 19.7353 24.6014 29.9692

.
.
.

> tmjson                                                               
$calendar
[1] "gregorian calendar"

$time
   [1] "1950-01-15" "1950-02-15" "1950-03-15" "1950-04-15" "1950-05-15"
   [6] "1950-06-15" "1950-07-15" "1950-08-15" "1950-09-15" "1950-10-15"
  [11] "1950-11-15" "1950-12-15" "1951-01-15" "1951-02-15" "1951-03-15"
  [16] "1951-04-15" "1951-05-15" "1951-06-15" "1951-07-15" "1951-08-15"
  [21] "1951-09-15" "1951-10-15" "1951-11-15" "1951-12-15" "1952-01-15"
  [26] "1952-02-15" "1952-03-15" "1952-04-15" "1952-05-15" "1952-06-15"
  [31] "1952-07-15" "1952-08-15" "1952-09-15" "1952-10-15" "1952-11-15"
  [36] "1952-12-15" "1953-01-15" "1953-02-15" "1953-03-15" "1953-04-15"
.
.
.

Another Python example, showing how to iterate over all models, variables, scenarios, etc. for a set of locations:

import json, urllib2, os

# Retrieve the available models, scenarios, and variables
models = json.loads(urllib2.urlopen('http://opennexapp.s3.amazonaws.com/models').read())
scenarios = json.loads(urllib2.urlopen('http://opennexapp.s3.amazonaws.com/scenarios').read())
variables = json.loads(urllib2.urlopen('http://opennexapp.s3.amazonaws.com/variables').read())

# Cell/pixel coordinates of the locations of interest
locs = [ ('363','1253'), ('1563','2033'), ('543','1523'), ('861','1950'), \
         ('1251','2081'), ('372','1607'), ('395','1616'), ('375','1601'), \
         ('405','1592') ]

for x, y in locs:
    for m in models['models']:
        for s in scenarios['scenarios']:
            for v in variables['variables']:
                link = os.path.join('http://opennexapp.s3.amazonaws.com', m, s, v, x, y)
                print link
                try:
                    datlink = urllib2.urlopen(link)
                    print datlink.read()
                except urllib2.HTTPError, e:
                    print str(e), " : data may not be available for this location yet..."