Mount OpenNEX Amazon Public S3 Buckets on an Amazon EC2 Instance

This wiki shows the basic steps for mounting a public Amazon S3 bucket (in our case the NEX datasets stored on S3 and available at s3://nasanex/) on an Amazon EC2 instance.


This wiki assumes that the user
1) knows how to set up an Amazon Machine Image (AMI) via the Amazon Web Services (AWS) Management Console,
2) knows how to create appropriate security credentials (i.e. Amazon EC2 Key Pairs) in order to establish a secure SSH connection to a created instance, and
3) has on record the AWS Access Key ID and the Secret Access Key (in general not required for this tutorial, but useful for other purposes).

Basic Prerequisites:
1) An Amazon EC2 instance launched from an Ubuntu Server 12.04.3 LTS (Long Term Support) 64-bit AMI with a t1.micro instance type (for starters, please use the Amazon free usage tier configuration when launching the instance).
2) An Amazon S3 bucket (in this case the NEX S3 bucket located at s3://nasanex)
Note: The NEX S3 bucket resides in the US West (Oregon) Region.
3) FUSE - With FUSE, it is possible to implement a fully functional filesystem in a userspace program.
4) S3fs - S3fs is a FUSE-based file system backed by Amazon S3. It can be used to mount a bucket as a local file system read/write and can also be used to store files/folders natively and transparently.
5) An SSH client (e.g. PuTTY on Windows, or the terminal on Mac/Unix) to connect to your EC2 instance.
6) Basic knowledge of Linux commands.

[Additional Notes]
a) AWS provides a step-by-step user guide that walks through the processes from signing up for AWS account to launching an EC2 instance.
b) By default the AWS user guide selects a 64-bit Amazon Linux AMI (Amazon Machine Image) to create the EC2 instance, though one is free to choose from other Linux AMIs (e.g., an Ubuntu AMI). Because different Linux systems sometimes use different software packages and tools, the system configuration commands may differ between Amazon Linux and Ubuntu (or other) AMIs.

Follow the steps below to perform a successful mount.

Step 1: Using your preferred SSH client, log in to your Amazon EC2 instance (use ubuntu as your username) with your private key and the instance's Public DNS address.
e.g. $ ssh -i foobar.pem ubuntu@ec2-XX-YYY-ZZZ-UUU.us-west-2.compute.amazonaws.com
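
The login step can be sketched as a small shell helper. ssh normally refuses a private key file that other users can read, so the sketch tightens the permissions first. The function name ssh_login_command is hypothetical, and the key file and host passed to it are placeholders for your own values.

```shell
# Hypothetical helper: restrict the key file to its owner, then print the
# ssh command to run. ssh rejects private keys readable by other users.
ssh_login_command() {
    key_file=$1     # path to the .pem private key (e.g. foobar.pem)
    public_dns=$2   # Public DNS of the instance, as shown in the EC2 console
    chmod 400 "$key_file" || return 1
    printf 'ssh -i %s ubuntu@%s\n' "$key_file" "$public_dns"
}

# Typical call (placeholder host name):
# ssh_login_command foobar.pem ec2-XX-YYY-ZZZ-UUU.us-west-2.compute.amazonaws.com
```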

Step 2: To keep your sudo privileges active throughout the session, type
$ sudo -i
At this point, you may want to update the local package index with the latest changes made in the repositories by typing the following:
$ apt-get update

Step 3: Install the required packages and their dependencies by typing the following:
$ apt-get install make gcc g++ curl libxml2 libxml2-dev libssl-dev libcurl3 libcurl4-gnutls-dev openssl pkg-config
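
After the install finishes, a quick sanity check is to confirm that the build tools are on the PATH. The sketch below is only a rough check: the function name missing_tools is hypothetical, and it tests for commands only, not for the development libraries (libxml2-dev, libssl-dev, etc.).

```shell
# Print the name of every listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Build tools that Step 4 relies on; no output means all were found.
missing_tools make gcc g++ curl pkg-config
```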

[Additional Notes]
a) For Amazon Linux AMIs, use the following:
$ yum update
$ yum install make gcc-c++ libxml2 libxml2-devel curl libcurl libcurl-devel openssl openssl-devel pkg-config
b) The “-devel” versions of the libraries (e.g., libxml2-devel) are necessary.
c) Use “yum list [options]” to list the libraries/packages that come with the Amazon Linux AMI and choose the right versions. E.g., "$ yum list | grep curl" will give a list of different versions of libcurl. Incorrect versions (e.g., if you are not using the “-devel” version) can cause problems later on.
d) Run “$ pkg-config --list-all” to see whether it finds the just-installed libraries. If not, the PKG_CONFIG_PATH environment variable may not be set properly. The path can be set manually with
$ export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/lib/pkgconfig:/usr/local/lib/pkgconfig
Other common "pkgconfig" paths can also be added to the above command line.
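
A slightly more careful variant of the export above adds a directory only when it exists and is not already listed, so that repeated logins do not keep lengthening the variable. This is a minimal sketch; the function name add_pkgconfig_dir is hypothetical, and the directory list is an assumption to adjust for your system.

```shell
# Append a directory to PKG_CONFIG_PATH only if it exists and is not listed yet.
add_pkgconfig_dir() {
    dir=$1
    [ -d "$dir" ] || return 0
    case ":${PKG_CONFIG_PATH:-}:" in
        *":$dir:"*) ;;   # already present, nothing to do
        *) PKG_CONFIG_PATH="${PKG_CONFIG_PATH:+$PKG_CONFIG_PATH:}$dir" ;;
    esac
    export PKG_CONFIG_PATH
}

# Assumed candidate directories; add or remove entries as needed.
for d in /usr/lib/pkgconfig /usr/lib64/pkgconfig /usr/local/lib/pkgconfig; do
    add_pkgconfig_dir "$d"
done
echo "PKG_CONFIG_PATH=${PKG_CONFIG_PATH:-}"
```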

Step 4: Download, build and install the FUSE and S3fs packages. It is recommended to check for the latest versions before downloading. In our case, we will be downloading FUSE version 2.9.3 and S3fs version 1.74.
Let's build FUSE and S3fs in the /usr/local/src directory
$ cd /usr/local/src
download fuse from source website:
$ wget http://downloads.sourceforge.net/project/fuse/fuse-2.X/2.9.3/fuse-2.9.3.tar.gz
untar the downloaded archive
$ tar -xvzf fuse-2.9.3.tar.gz
change to the extracted directory
$ cd fuse-2.9.3
configure, build and install
$ ./configure --prefix=/usr
$ make
$ make install
load the FUSE kernel module
$ modprobe fuse
to check which version of FUSE is installed, type
$ pkg-config --modversion fuse
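
The version reported above can also be compared against a minimum in a script. The sketch below assumes a minimum of 2.8.4 for S3fs (an assumption; check the README of your S3fs release) and relies on "sort -V" from GNU coreutils. The function name version_ge is hypothetical.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=2.8.4   # assumed minimum for S3fs; verify against your S3fs release
installed=$(pkg-config --modversion fuse 2>/dev/null || echo 0)
if version_ge "$installed" "$required"; then
    echo "FUSE $installed is new enough"
else
    echo "FUSE $installed is older than $required (or was not found)"
fi
```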

To install S3fs, follow similar steps as above:
$ cd /usr/local/src
download s3fs from source website:
$ wget https://s3fs.googlecode.com/files/s3fs-1.74.tar.gz
untar the downloaded archive
$ tar -xvzf s3fs-1.74.tar.gz
change to the extracted directory
$ cd s3fs-1.74
configure, build and install
$ ./configure --prefix=/usr
$ make
$ make install

[Additional Notes]
One can run into problems with FUSE and S3fs if 1) the libraries listed in Step 3 were not correctly installed, or 2) the PKG_CONFIG_PATH was not set correctly. To diagnose the problem (using FUSE as the example):
a) Look for the library files “libfuse.*” in "/usr/lib" or "/usr/lib64". If they cannot be found, FUSE needs to be re-installed.
b) Look for a “fuse.pc” file in the “pkgconfig” subdirectory of the lib/lib64 directories. If it is not there, re-install the correct “-devel” version of the libraries.
c) Query the library with "$ pkg-config --exists libname". If the command fails (non-zero exit status), the PKG_CONFIG_PATH was not set correctly.
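
The three checks above can be run in one go. This is a rough sketch: the function name find_pc_file is hypothetical, and the directories searched are the ones mentioned in these notes, so other layouts may need extra paths.

```shell
# Print the path of <lib>.pc if it exists in one of the given directories.
find_pc_file() {
    lib=$1; shift
    for dir in "$@"; do
        if [ -f "$dir/$lib.pc" ]; then
            printf '%s\n' "$dir/$lib.pc"
            return 0
        fi
    done
    return 1
}

# (a) library files
found=no
for f in /usr/lib/libfuse.* /usr/lib64/libfuse.*; do
    if [ -e "$f" ]; then echo "found $f"; found=yes; fi
done
if [ "$found" = no ]; then echo "libfuse.* not found: re-install FUSE"; fi

# (b) pkg-config metadata file
find_pc_file fuse /usr/lib/pkgconfig /usr/lib64/pkgconfig \
    || echo "fuse.pc not found: re-install FUSE or its -devel package"

# (c) pkg-config lookup
if pkg-config --exists fuse 2>/dev/null; then
    echo "pkg-config can see fuse"
else
    echo "pkg-config cannot see fuse: check PKG_CONFIG_PATH"
fi
```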


If all else fails, i.e. $ pkg-config --modversion fuse does not show the right version and displays warnings about library path settings, the easiest fix on Amazon Linux is to install FUSE from the package repository:
$ yum install fuse
$ yum install fuse-devel
$ pkg-config --modversion fuse
This should now show the installed version of FUSE.

We have now installed all required packages. The next step is to mount the public S3 bucket as a directory in our file system.
Note: To mount private buckets, S3fs needs a passwd-s3fs configuration file containing the access key ID and secret access key of your AWS account. Since we are mounting the public NEX bucket, we do not need to set the credentials for now.

Step 5: Edit /etc/fuse.conf so that non-root users are allowed to access the mounts (Step 6):
$ vim /etc/fuse.conf
Uncomment the line that reads #user_allow_other.
To uncomment it, remove the "#" before user_allow_other, then save and quit the editor by hitting the escape key and typing ":wq".

[Additional Notes]
In case a “/etc/fuse.conf” cannot be found, create one by copying/pasting the following:

# Set the maximum number of FUSE mounts allowed to non-root users.
# The default is 1000.
#
#mount_max = 1000

# Allow non-root users to specify the 'allow_other' or 'allow_root'
# mount options.
#
user_allow_other

Note that on the last line “user_allow_other” is uncommented.
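
The same change can be made without an editor. The sketch below covers both cases described above: it uncomments the line if the file exists and creates a minimal file otherwise. The function name enable_user_allow_other is hypothetical, and "sed -i" assumes GNU sed, as found on Ubuntu and Amazon Linux.

```shell
# Make sure "user_allow_other" is active in the given FUSE config file.
enable_user_allow_other() {
    conf=$1
    if [ ! -f "$conf" ]; then
        # No config file yet: create one holding only the option we need.
        echo "user_allow_other" > "$conf"
    elif grep -q '^[[:space:]]*user_allow_other' "$conf"; then
        :   # already enabled, nothing to do
    elif grep -q '^#[[:space:]]*user_allow_other' "$conf"; then
        # Strip the leading "#" from the commented-out line (GNU sed).
        sed -i 's/^#[[:space:]]*user_allow_other/user_allow_other/' "$conf"
    else
        echo "user_allow_other" >> "$conf"
    fi
}

# Typical call, as root:
# enable_user_allow_other /etc/fuse.conf
```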

Step 6: Mount the opennex MODIS S3 bucket
Let's mount the bucket at /mnt/s3-modis
$ mkdir /mnt/s3-modis
Run s3fs to mount the opennex MODIS bucket on this newly created directory

$ /usr/bin/s3fs -o default_acl='public_read' nasanex:/MODIS /mnt/s3-modis/ -o public_bucket=1

Here nasanex is the bucket name and /MODIS is the path of the MODIS data within the bucket that we want to mount.
To learn more about the s3fs syntax, see the s3fs documentation.
You can check whether the mount was successful by running
$ df -h
The output should include a line like this:
s3fs 256T 0 256T 0% /mnt/s3-modis
You can now access the data:
$ cd /mnt/s3-modis
The directory will have two sub-directories, namely /MOLA and /MOLT. MOLA holds the Aqua MODIS product (MYD13Q1) and MOLT holds the Terra MODIS product (MOD13Q1).
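
For scripts, the df -h check can be replaced by a test against the kernel's mount table. This is a minimal sketch assuming a Linux system with /proc/mounts; the function name is_mounted is hypothetical, and its optional second argument exists mainly so the function can be tried against a sample file.

```shell
# Succeed if something is mounted at the given directory.
# The mount table to read defaults to /proc/mounts (Linux).
is_mounted() {
    target=${1%/}               # drop a trailing slash, if any
    table=${2:-/proc/mounts}
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$table" 2>/dev/null
}

if is_mounted /mnt/s3-modis; then
    ls /mnt/s3-modis            # expect the MOLA and MOLT sub-directories
else
    echo "/mnt/s3-modis is not mounted"
fi
```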

If you want to avoid mounting manually every time you log in, you can have the bucket mounted automatically at boot by placing an entry in /etc/fstab. To do this, type:
$ vi /etc/fstab
Append a line with exactly the following syntax:
s3fs#nasanex:/MODIS /mnt/s3-modis fuse _netdev,allow_other,public_bucket=1 0 0
Hit escape and save by typing :wq
You can now reboot your instance and start using your auto mounts for analysis and science research.
Note: you can always unmount the bucket by typing "umount /mnt/s3-modis"
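
If the fstab entry is added from a setup script rather than by hand, it helps to avoid appending it twice. The sketch below appends the line only when an identical one is not already present; the function name add_fstab_entry is hypothetical.

```shell
# Append an entry to an fstab-style file unless the identical line exists.
add_fstab_entry() {
    fstab=$1
    entry=$2
    if grep -qxF -- "$entry" "$fstab" 2>/dev/null; then
        echo "entry already present in $fstab"
    else
        printf '%s\n' "$entry" >> "$fstab"
        echo "entry added to $fstab"
    fi
}

# Typical call, as root, with the entry given above:
# add_fstab_entry /etc/fstab 's3fs#nasanex:/MODIS /mnt/s3-modis fuse _netdev,allow_other,public_bucket=1 0 0'
```

It is worth keeping a backup copy of /etc/fstab before changing it, since a malformed entry can interfere with booting.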

[Additional Notes]
Remember to run “$ sudo -i” after logging in to access the automatically mounted NEX data volume.
Unsolved Question: Can one access the mounted S3 data as an ordinary user (i.e., ec2-user)?
