Landsat SQLite on bridge nodes

Introduction

There is a temporary SQLite database that is accessible from the bridge nodes while we are bringing up MySQL, so that users can have a little better access to the Landsat data holdings.

SQLite provides basic SQL interface and is accessible from number of languages and more extensive documentation can be found here: http://www.sqlite.org/.

Database Schemas

There are currently two main tables in the database, which contain links to the Landsat files on line and some additional metadata information.

The "landsat" table is created by inserting existing files into the database pariodically.

TABLE landsat(
   granule_id text,        #This is the primary key a unique Landsat scene ID
   noloc_id text,          #granule_id with the gsid stripped 
   sensor text,            #Which sensor (T or E) = (TM or ETM)
   satellite int,          #4, 5, or 7
   path int,
   row int,
   url text,               #The path to the file on the filesystem
   acq_date date,          #Acquisition date (by the satellite) = overpass date
   year int,               #Year of data acquisition (overpass)
   doy int,                #Day-of-year of acquisition (overpass) 1-366
   gsid text,              #Ground station id 
   version int,            #Processing version
   is_local int,           #1 = online, 0 = in the backup system
   UNIQUE(granule_id) ) 

The "meta" table is created from parsed from bulk metadata obtained periodically from the USGS. More information on the metadata is in the Landsat Data Dictionary, but since this describes the overall internal metadata structure of the datasets, there is quite a bit more information than we have in the DB right now. The schema of the external metadata that our "meta" table is extracted from can be found here.

TABLE meta(
   scene_id VARCHAR(200),  #Matches the granule_id in the landsat table
   sensor text, 
   satellite int, 
   path int, 
   row int,
   ul_lat float, 
   ul_lon float, 
   ur_lat float, 
   ur_lon float, 
   ll_lat float, 
   ll_lon float, 
   lr_lat float, 
   lr_lon float, 
   center_lat float, 
   center_lon float,
   acq_date date, 
   processed_date date,    #When this scene was last processed at USGS
   year int,               #Year of data acquisition (overpass)
   doy int,                #Day-of-year of acquisition (overpass) 1-366
   cloud_cover int,        #Cloud cover interval 0-9 (0 = 0 to 10%, 1 = 10 to 20%,...)
   cloud_cover_full float, #Cloud cover % for this scene (0.0 - 100.0) - not always available
   l1type VARCHAR(10),     #L1G, L1GT, L1Gt, L1T, PR
   image_quality1 int,     #For L7 - image quality for bands 1-6l (0=bad, 9=good)     
   image_quality2 int,     #For L7 - image quality for bands 6h, 7, 8 (0=bad, 9=good)
   acquisition_quality int,# 0=missing, 1=extremely poor, 9=excellent
   quality_band1 int,      # 0=worst, 9=excellent
   quality_band2 int,      # 0=worst, 9=excellent
   quality_band3 int,      # 0=worst, 9=excellent
   quality_band4 int,      # 0=worst, 9=excellent
   quality_band5 int,      # 0=worst, 9=excellent
   quality_band6 int,      # 0=worst, 9=excellent
   quality_band7 int,      # 0=worst, 9=excellent
   UNIQUE(scene_id) ) 

Accessing the SQLite database

There are two basic ways you can interact with the database. Directly from the command line and from your code through the SQLite API. To do either you need to know the location of the database file which is in:

/nobackupp4/datapool/landsat/rt/DB/SQLite/landsat-sqlite.db

Access from the command-line

So from command line you can just do (assuming you are on bridge4):

bridge4> sqlite3 /nobackupp4/datapool/landsat/rt/DB/SQLite/landsat-sqlite.db

You will get the sqlite prompt and you can just type your queries in there. For more information, see the SQLite documentation.

Access from Python

To access the database from python, you will need to import the sqlite3 module, open the connection to the SQLite db and establish the cursor (a structure that will deal with the results as they are returned from the database).

import sqlite3 as sqlite;

sqlf = '/nobackupp4/datapool/landsat/rt/DB/SQLite/landsat-sqlite.db'; #location of the db file
conn = sqlite.connect(sqlf)
curs = conn.cursor();

... some queries and result manipulation

curs.close();
conn.close();

And you are ready to run some queries.

Example queries from Python

Find all the Landsat7 data for 2009, path 34 and row 40:

import sqlite3 as sqlite;

sqlf = '/nobackupp4/datapool/landsat/rt/DB/SQLite/landsat-sqlite.db';
conn = sqlite.connect(sqlf)
curs = conn.cursor();

results = None;
try:
   curs.execute("SELECT url FROM landsat WHERE path=? AND row=? AND year=? AND satellite=7", 
               (34, 40, 2009));
   results = curs.fetchall();
   for r in results:
      print r;

except sqlite.Error as e:
   print "An error occurred:", e.args[0];

curs.close();
conn.close();

Query across the two tables to find data with less than 30% cloud cover

...
curs.execute("SELECT url FROM landsat INNER JOIN meta ON granule_id = scene_id WHERE \
              meta.cloud_cover < 3 AND landsat.path=? AND landsat.row=? AND landsat.year=? AND \
              landsat.satellite=7", (34, 40, 2009));

results = curs.fetchall();

...

Landsat SQLite on bridge nodes (old revision)

Edited Dec 15, 2012 by NDC-pvotava

Introduction

There is a temporary SQLite database that is accessible from the bridge nodes while we are bringing up MySQL, so that users can have a little better access to the Landsat data holdings.

SQLite provides basic SQL interface and is accessible from number of languages and more extensive documentation can be found here: http://www.sqlite.org/.

Database Schemas

There are currently two main tables in the database, which contain links to the Landsat files on line and some additional metadata information.

The "landsat" table is created by inserting existing files into the database pariodically.

TABLE landsat(
   granule_id text,        #This is the primary key a unique Landsat scene ID
   noloc_id text,          #granule_id with the gsid stripped 
   sensor text,            #Which sensor (T or E) = (TM or ETM)
   satellite int,          #4, 5, or 7
   path int,
   row int,
   url text,               #The path to the file on the filesystem
   acq_date date,          #Acquisition date (by the satellite) = overpass date
   gsid text,              #Ground station id 
   version int,            #Processing version
   is_local int,           #1 = online, 0 = in the backup system
   UNIQUE(granule_id) ) 

The "meta" table is created from parsed from bulk metadata obtained periodically from the USGS. Some more information on some of the metadata is here, but since this describes the overall internal metadata structure of the datasets, there is quite a bit more information than we have in the DB right now. The schema of the external metadata that our "meta" table is extracted from can be found here.

TABLE meta(
   scene_id VARCHAR(200),  #Matches the granule_id in the landsat table
   sensor text, 
   satellite int, 
   path int, 
   row int,
   ul_lat float, 
   ul_lon float, 
   ur_lat float, 
   ur_lon float, 
   ll_lat float, 
   ll_lon float, 
   lr_lat float, 
   lr_lon float, 
   center_lat float, 
   center_lon float,
   acq_date date, 
   processed_date date,    #When this scene was last processed at USGS
   year int,               #Year of data acquisition (overpass)
   doy int,                #Day-of-year of acquisition (overpass) 1-366
   cloud_cover int,        #Cloud cover interval 0-9 (0 = 0 to 10%, 1 = 10 to 20%,...)
   cloud_cover_full float, #Cloud cover % for this scene (0.0 - 100.0) - not always available
   l1type VARCHAR(10),     #L1G, L1GT, L1Gt, L1T, PR
   UNIQUE(scene_id) ) 

Accessing the SQLite database

There are two basic ways you can interact with the database. Directly from the command line and from your code through the SQLite API. To do either you need to know the location of the database file which is in:

/nobackupp4/datapool/landsat/rt/DB/SQLite/landsat-sqlite.db

Access from the command-line

So from command line you can just do (assuming you are on bridge4):

bridge4> sqlite3 /nobackupp4/datapool/landsat/rt/DB/SQLite/landsat-sqlite.db

You will get the sqlite prompt and you can just type your queries in there. For more information, see the SQLite documentation.

Access from Python

To access the database from python, you will need to import the sqlite3 module, open the connection to the SQLite db and establish the cursor (a structure that will deal with the results as they are returned from the database).

import sqlite3 as sqlite;

sqlf = '/nobackupp4/datapool/landsat/rt/DB/SQLite/landsat-sqlite.db'; #location of the db file
conn = sqlite.connect(sqlf)
curs = conn.cursor();

... some queries and result manipulation

curs.close();
conn.close();

And you are ready to run some queries.

Example queries from Python

Find all the Landsat7 data for 2009, path 34 and row 40:

import sqlite3 as sqlite;

sqlf = '/nobackupp4/datapool/landsat/rt/DB/SQLite/landsat-sqlite.db';
conn = sqlite.connect(sqlf)
curs = conn.cursor();

results = None;
try:
   curs.execute("SELECT url FROM landsat WHERE path=? AND row=? AND year=? AND satellite=7", 
               (34, 40, 2009));
   results = curs.fetchall();
   for r in results:
      print r;

except sqlite.Error as e:
   print "An error occurred:", e.args[0];

curs.close();
conn.close();

Query across the two tables to find data with less than 30% cloud cover

...
curs.execute("SELECT url FROM landsat INNER JOIN meta ON granule_id = scene_id WHERE \
              meta.cloud_cover < 3 AND landsat.path=? AND landsat.row=? AND landsat.year=? AND \
              landsat.satellite=7", (34, 40, 2009));

results = curs.fetchall();

...