Code versioning with Git

References

As this Wiki does not go much beyond the basics, here are some good references and examples for you to dig deeper and try things out:

  1. The online version of the Pro Git book by Scott Chacon.
  2. The online Git hands-on tutorial - a great way to try things out
  3. Nice and quick online videos
  4. Git command reference

What is Git

Git is a version control system that helps you keep track of your code and its versions. It works both locally and in a distributed way, so you can use it to collaborate actively on software development with your team as well as for your own code development. With Git, you do not have to create copies of files and directories just because you want to start working on a new version. Instead, you keep everything in one place, with the ability to move easily between different versions and even restore previous versions if you need to.

There are a number of reasons why Git is a very popular tool for version control:

  1. Almost all operations are local = you can use it all the time without the need to be connected to the Internet.
  2. All files under Git control are check-summed, so Git knows right away if anything changes - great for integrity.
  3. Git mostly only adds data, so almost everything is recoverable - think MS Word with infinite Undo
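
The check-summing in point 2 can be observed directly. The following is a minimal sketch, assuming only that git is on your PATH; the sample strings are made up for illustration. It asks Git for the object ID it would assign to a piece of content, without storing anything:

```shell
# Identical content always gets the same ID; any change produces a different one.
id_a=$(printf 'hello\n' | git hash-object --stdin)
id_b=$(printf 'hello\n' | git hash-object --stdin)
id_c=$(printf 'hello!\n' | git hash-object --stdin)

echo "same content:    $id_a / $id_b"
echo "changed content: $id_c"
```

Because the ID is derived from the content itself, Git can tell that a file changed just by comparing IDs.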

Brief Introduction

Each file under Git version control can be in one of three possible states:

  1. Committed = file is in your local database being tracked by Git
  2. Staged = you have modified the file and marked it for Git to take the file's snapshot during the next commit
  3. Modified = you have modified the file, but have not yet marked it for inclusion in the next commit
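
The three states can be watched with git status --short, which prints a two-letter code in front of each changed file. This is a sketch in a throwaway repository; the file name notes.txt and the identity values are placeholders, not part of any NEX setup:

```shell
# Throwaway repository in a temporary directory.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"               # notes.txt is now committed

echo "second draft" >> notes.txt
modified=$(git status --short)             # " M notes.txt": modified, not staged

git add notes.txt
staged=$(git status --short)               # "M  notes.txt": staged for the next commit

git commit -q -m "Revise notes"
committed=$(git status --short)            # empty: nothing left to record

printf 'modified:  [%s]\nstaged:    [%s]\ncommitted: [%s]\n' "$modified" "$staged" "$committed"
```

An M in the second column means the change is only in your working directory; an M in the first column means it is staged.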

Basic Git Workflow

  1. You work on some files in your project
  2. You mark modified files as "staged" = they will go into the next commit
  3. Commit the files (their current snapshot) to the local database

You should think of a commit as hitting the "Save" button in MS Word rather than as a new version release (although you can, of course, also mark a commit as a version release).
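
As an illustration of the "Save" analogy, the sketch below makes two small commits and then marks the second one as a release. Using a tag for this is one common way to do it, not necessarily the convention of your project; the tag name v1.0, the file name work.txt, and the identity values are examples only:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "step 1" > work.txt
git add work.txt
git commit -q -m "Save: first step"

echo "step 2" >> work.txt
git add work.txt
git commit -q -m "Save: second step"

# Mark the latest commit as a release with a lightweight tag.
git tag v1.0

git log --oneline                          # two commits, newest first
git tag                                    # lists v1.0
```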

Here is a link to a much more detailed description (with pictures) of the Git basics - it is very short and it will give you a great understanding of how Git works.

Getting Started

Git is already installed at NAS and you can use it on both the Pleiades system (pfe and bridge nodes) and on the NEX sandbox. On the bridge/pfe nodes, you need to load Git from the modules. For example:

linux:>module load git/1.7.7.4

There are currently several versions of Git maintained on the system. More information on modules is available on the NEX Loadable Modules wiki and on the HECC Knowledge Base.

Initializing Git repository for your project

Let's say you have a bunch of code files under /nex/src/cool_project. In order to get this under Git control, just do:

linux:>cd /nex/src/cool_project
linux:>git init 

or:

linux:>git init /nex/src/cool_project

(Both of the above do basically the same thing - the second one also creates the directory if it doesn't exist yet)

That's it - done!
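
To confirm that the repository was created, you can look for the hidden .git directory, which holds the Git database. In this sketch a temporary directory stands in for /nex/src, so it can be run anywhere without touching real project files:

```shell
# A temporary directory stands in for /nex/src in this sketch.
base=$(mktemp -d)
git init -q "$base/cool_project"           # creates cool_project/ because it does not exist yet
cd "$base/cool_project" || exit 1

ls -d .git                                 # the repository database lives here
inside=$(git rev-parse --is-inside-work-tree)
echo "inside a Git work tree: $inside"
```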

More details on Creating or Cloning a Git Repository

Registering files with your repository

At this point, you have told Git that this is where the files for the CoolProject live, but you have not asked Git to track them yet.

To do that:

linux:>cd /nex/src/cool_project
linux:>git add . 

Now all files under this directory and its subdirectories will be added to the database on the next commit. You may not always want to track all of your files (such as build artifacts like .o object files and .a libraries, temporary files created by your editor, etc.). Git lets you exclude such files either by setting up a .gitignore file with the appropriate patterns or by adding the patterns to the file your-project-dir/.git/info/exclude.

Example of a .gitignore file that ignores object files (.o) and libraries (.a) as well as emacs backup files, which usually end with ~:

linux:>cat .gitignore
*.[oa]
*~
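
A quick way to check that the patterns work is to stage everything and list what Git actually picked up. This sketch assumes the patterns are stored in a file named .gitignore at the top of the repository; the file names main.c, main.o, libcool.a and main.c~ are hypothetical:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

printf '*.[oa]\n*~\n' > .gitignore         # the two patterns shown above
touch main.c main.o libcool.a main.c~      # source, object, library and backup file

git add .
tracked=$(git ls-files)                    # files that Git will track
echo "$tracked"
```

Only .gitignore and main.c are listed; the object file, the library and the backup file are skipped.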

Additional details on ignoring files

Committing changes into Git database

linux:>git commit

This will open an editor (vi/vim on the bridge/pfe nodes) and ask you to enter a commit message. You can also supply the message directly on the command line without opening the editor:

linux:>git commit -m "Initial commit"
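
Git records an author name and e-mail address with every commit, so it is worth setting them before your first commit. The sketch below does this for a single throwaway repository and then reads the commit back; the identity values and the file name README are placeholders:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder values; use your own
git config user.email "user@example.com"

echo "hello" > README
git add README
git commit -q -m "Initial commit"

subject=$(git log -1 --format=%s)          # message of the latest commit
author=$(git log -1 --format='%an <%ae>')  # who made it
echo "$subject by $author"
```

Adding --global to the two git config commands would store the identity for all of your repositories instead of just this one.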

Recording changes

Let's say you have worked for a while and want to save your changes to the repository. Remember the three states that your files can be in - they are currently in the modified state. To see the status of your repository and what you need to do next, run git status:

linux:>git status
 ...
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   cool/cool1.txt
#       modified:   project/project1.txt
#

Notice how Git tells you what to do next, or how to discard the changes.

Now to put files in the staged state, you can do one of two things (assuming you are in the top level of your project directory):

linux:>git add cool/cool1.txt
or:
linux:>git add .
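
If you stage only some of your changes, it helps to see which files ended up where. As a sketch, git diff lists changes that are not staged yet, and git diff --cached lists changes that are staged. The repository below is a throwaway copy that reuses the file names from the status output above; the identity values are placeholders:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

mkdir cool project
echo v1 > cool/cool1.txt
echo v1 > project/project1.txt
git add .
git commit -q -m "Initial commit"

echo v2 > cool/cool1.txt                   # modify both files
echo v2 > project/project1.txt
git add cool/cool1.txt                     # stage only one of them

unstaged=$(git diff --name-only)           # modified but not staged
staged=$(git diff --cached --name-only)    # staged for the next commit
echo "not staged: $unstaged"
echo "staged:     $staged"
```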

Now your files are staged and you can go ahead and commit them to the database:

linux:>git commit -m "Update v1.0"

While the above gives you detailed control over staging and commits, there is a shortcut that bypasses the staging step and goes directly to the commit:

linux:>git commit -a -m "Update v1.0"

The above will take every modified file that you have instructed Git to track and commit the file snapshots into the database. Think of it as a giant "Save All" button for all the modified files in the repository.
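
One thing to keep in mind is that the shortcut only covers files Git already tracks. A brand-new file still needs git add first. The sketch below shows this in a throwaway repository; the file names and identity values are made up for illustration:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo v1 > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

echo v2 > tracked.txt                      # change a file Git already tracks
echo new > untracked.txt                   # create a file Git has never been told about

git commit -q -a -m "Update v1.0"
remaining=$(git status --short)            # "?? untracked.txt": the new file was not committed
echo "$remaining"
```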

Here are many more details on recording changes in your Git repository.

The above describes the basic commands that can get you started. Other Wiki pages will cover how you can use Git to create different code branches and collaborate on code with others in a distributed way through remote repositories.

If you want to go ahead and explore on your own, the Pro Git online book is a great reference.