How to Access and Visualize Landsat GLS Data on AWS with Python

This wiki explains the basic steps to set up Python and dependent software packages in order to read Landsat GLS data, which are in the GeoTiff format, on AWS.

[Prerequisites]
This wiki assumes that the user has basic knowledge on how to launch an Amazon EC2 instance and mount OpenNEX Landsat data from the Amazon Public S3 Buckets, which are described in detail by Mount OpenNEX Amazon Public S3 Buckets on an Amazon EC2 Instance
This wiki uses an EC2 instance built with an Amazon Linux AMI as an example. Users using other Linux systems (e.g., Ubuntu) should be able to follow the steps though the specific commands may be different.

[Task 1: Install the GDAL library]
GDAL is a software library that provides I/O APIs to multiple geospatial raster data formats, including GeoTiff. It also provides Python bindings. Follow the following steps to install GDAL:

Step 1: log on the EC2 instance and change to the root user with “sudo -i

Step 2: obtain the source distribution of GDAL

$ cd /usr/local/src
$ wget http://download.osgeo.org/gdal/1.10.1/gdal-1.10.1.tar.gz
$ tar –xzvf gdal-1.10.1.tar.gz
$ cd gdal-1.10.1

Step 3: Configure, build, and install the package
$ ./configure –-with-python –-prefix=/usr
The “--with-python” option indicates that the GDAL-Python binding needs to be built.
$ make
It may take quite a while (>1hr) to build the package. Go grabbing a cup of coffee or tea while you are waiting.
$ make install
This copies the library files to "/usr/lib64" and setup other information (e.g., the .pc files for pkg-config). It is much faster than the above step.

Step 4: By default, the GDAL-Python binding was built as an "egg". To make sure it is actually installed on your system, do the following:

$ cd swig/python/dist
$ easy_install GDAL-1.10.1-py2.6-linux-x86_64.egg

Step 5: Set the "LD_LIBRARY_PATH" if it hasn't been done:
$ export LD_LIBRARY_PATH=/usr/lib:/usr/lib64:/usr/local/lib:/usr/local/lib64:$LD_LIBRARY_PATH

Step 6: Do a quick test by launching Python and import the "gdal" module. If everything went well, you should see something like this:

[root@ip-172-31-4-34 ~]# python
Python 2.6.9 (unknown, Oct 29 2013, 19:58:13) 
[GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import osgeo.gdal as gdal
>>> 

[Task 2: Writing the first script to read Landsat data]
Step 1: Landsat GLS data are large and thus are compressed as .gz files. So the first step is to create a scratch space for this task:

$ cd /home/
$ mkdir scratch
$ cd scratch
$ mkdir data
$ cd data
$ tar -xzvf /mnt/s3-landsat/gls/2000/044/034/p044r034_7x19990707.tar.gz 
Here it is assumed that the Landsat data is mounted at "/mnt/s3-landsat". Firing an "ls data" should have outputs like:
[root@ip-172-31-4-34 scratch]# ls data
p044r034_7dk19990707_z10_61.tif  p044r034_7dp19990707_z10_80.tif  p044r034_7dt19990707_z10_20.tif  p044r034_7dt19990707_z10_40.tif  p044r034_7dt19990707_z10_70.tif
p044r034_7dk19990707_z10_62.tif  p044r034_7dt19990707_z10_10.tif  p044r034_7dt19990707_z10_30.tif  p044r034_7dt19990707_z10_50.tif  p044r034_7x19990707.met
The .tif files are data for each individual band and the .met file has all the metadata.

Step 2: Create a python script for basic Geotiff reading.

from numpy import *
import osgeo.gdal as gdal
from osgeo.gdalconst import *
#---------------------------------------------------------------
def read_geotiff(filename, band_id=1):
   # Open the file
   dataset = gdal.Open( filename, GA_ReadOnly )
   if dataset is None:
      print 'Error: GDAL cannot open the file correct.'
      return None
   #fi 
   # Fetch a band
   if (band_id < 1 | band_id > dataset.RasterCount):
      print 'Error: The specified band number is out of the range.' 
      return None
   # fi
   band = dataset.GetRasterBand(band_id)
   # Get the Raster Data
   data = band.ReadAsArray(0, 0, dataset.RasterXSize, dataset.RasterYSize)
   # Close the file
   dataset = None  # the file will be closed when the dataset object is destructed
   return data
# end_def
#---------------------------------------------------------------
# the main routine 
if __name__ == "__main__":
   filename = './data/p044r034_7dt19990707_z10_10.tif'  # band 1 for TM and ETM+
   data = read_geotiff(filename)
   print data[:1000]
The script does a very easy job: open a file, read the data as an array, and dump the first 1000 values (which are all zeros if you try). The use of GDAL is also very straightforward. More information about the data model and APIs can be found at the GDAL Tutorials.

[Task 3: Visualize the data]
For most users it is more exciting to visualize the data as maps rather than look at the decimal values. Unfortunately this task is also more complicated and time consuming because it can take several rounds to get things right.

Step 1: Setting up X11-forwarding
Assume you are still logged in as the root user, launch the following commands:
$ yum install xorg-x11-xauth.x86_64 xorg-x11-server-utils.x86_64
This will change the settings in your /etc/ssh/sshd_config file to allow X11 forwarding.
Install a test tool like "xclock":
$ yum install xclock
Log out from the EC2 virtual machine and re-connect with the "-X" option:
$ ssh -X -i your_pem_file ec2-user@your_EC2_Instance_Name
and see if you can launch the xclock by $ xclock. If things went through, you will see a GUI clock being launched from the terminal.
[Note] do not change to the root user for X-window based applications.

Step 2: Install the Python Matplotlib package
The Amazon Linux AMI comes with a Matplotlib package, which can be installed using "$ sudo yum install Matplotlib". Unfortunately, this package use a pretty old version of Cairo as the backend and may induce some problem later. Therefore, this wiki describes how to install matplotlib from the source distribution (the version 1.2.0 is used as an example).
First obtain the source distribution:

$ sudo -i
$ cd /urs/local/src
$ wget https://github.com/downloads/matplotlib/matplotlib/matplotlib-1.2.0.tar.gz
$ tar -xzvf matplotlib-1.2.0.tar.gz
$ cd matplotlib-1.2.0
At this point, we need to install some required packages:
$ yum install python-devel
$ yum install numpy
$ yum install freetype-devel
$ yum install libpng libpng-devel
$ yum install libtiff libtiff-devel
$ yum install libX11 libX11-devel
The "libtiff" and "libX11" are not required by Matplotlib but are commonly used anyway.
Now build and install the package as:
$ python setup.py build
$ python setup.py install
By default, the Matplotlib uses AGG (Anti-Grain Geometry) as the backend renderer. However, AGG is a non-interactive backend, meaning that it writes to a file but not a GUI screen. It is possible to install other interactive backends such as TkAgg and GTK, but will require a lot more work (due to software version dependency issues for this particular AMI, in particular). Therefore, they are not described here. Instead, an independent image utility, ImageMagick Display, will be used view the generated figures (see below).
If the default Matplotlib backend is not AGG, it can be reset in the "matplotlibrc" file under the "/usr/lib64/python2.6/site-packages/matplotlib/mpl-data" directory.

Step 3: Revise the Python script to plot Landsat images

from numpy import *
from pylab import *
import osgeo.gdal as gdal
from osgeo.gdalconst import *
#---------------------------------------------------------------
def read_geotiff(filename, band_id=1):
    # the same function as above
     ...
# end_def
#---------------------------------------------------------------
if __name__ == "__main__":
   resample_int = 4           # parameter for resampling the data because they are too large
   filename = './data/p044r034_7dt19990707_z10_10.tif'  # band 1 for TM and ETM+
   data_blue = read_geotiff(filename)
   data_blue = data_blue[::resample_int, ::resample_int]
   filename = './data/p044r034_7dt19990707_z10_20.tif'  # band 2
   data_green = read_geotiff(filename)
   data_green = data_green[::resample_int, ::resample_int]
   filename = './data/p044r034_7dt19990707_z10_30.tif'  # band 3
   data_red = read_geotiff(filename)
   data_red = data_red[::resample_int, ::resample_int]
   (N, M) = data_blue.shape
   data_rgb = zeros((N, M, 3), 'u1')
   data_rgb[:, :, 0] = data_red.astype('u1')
   data_rgb[:, :, 1] = data_green.astype('u1')
   data_rgb[:, :, 2] = data_blue.astype('u1')
   imshow(data_rgb,)
   show()      # Does nothing for non-interactive backends
   savefig('test_landsat_truecolor.png', dpi=100)
This piece of code should generate a image file named as "test_landsat_truecolor.png" under the current directory (e.g., /home/scratch).

Step 4: Install and Use Imagemagick Tools
Obtain the source distribution from the website of Imagemagick:

$ sudo -i
$ cd /usr/local/src
$ wget http://www.imagemagick.org/download/ImageMagick.tar.gz
$ tar -xzvf ImageMagick.tar.gz
$ cd ImageMagick-6.8.8-4
$ ./configure
$ make
$ make install
$ ldconfig /usr/local/lib
The last command is to configure the dynamic linker run-time bindings.

If everything went well, change back to the ordinary user account ("$ exit"), change directory to "/home/scratch", and fire a command line like:
$ display test_landsat_truecolor.png
And you should see a landsat image of the San Francisco Bay Area.

[Additional Notes:]
1. If you run into problems complaining about missing libraries, a good place to check first is "$ yum list | grep library-name" because many common libraries are pre-packaged by the Amazon AMIs.
2. Use "pkg-config --list-all" to check what libraries have been installed in your system.
3. Check if the environment variables PATH, PKG_CONFIG_PATH, and LD_LIBRARY_PATH are correctly set.

+Revision History