Version Control with Git and GitHub

Version Control on the Grid using Git and GitHub

Introduction

We strongly encourage everyone to use some form of version control as part of best practices for research data management, reproducible research, and documenting changes to their scripts and code over the course of a research project (see Other Resources on our Data Practices page). Git and SVN, two different version control systems, are installed on the Grid.

Git is the program that tracks changes to your scripts (source code) over time. Git can also track changes in text documents (HTML pages, text files, Markdown documents, etc.) and binary files (MS Office files, images, etc.), though changes in binary files in particular are tracked less efficiently (a topic beyond the scope of this document). Git should not be confused with GitHub, which is a hosting service for Git repositories. A repository is the bundle of files and folders that represent your work, plus the typically invisible data files that track your changes and history. Other hosting services include GitLab, SourceForge, and Bitbucket.

If you'd like general information on how to use Git, please see the excellent, self-paced materials for Git at the command line from Software Carpentry. If you prefer a GUI client for Git, we recommend GitKraken and our own GitKraken usage tutorial.

We will demonstrate how to create a new repository locally (on the compute grid), to connect your local repository to the remote GitHub hosting service, and to clone an existing remote repository on GitHub to the compute grid (locally).

Copy (clone) an existing Git repository

You've created or found a repository on GitHub and now wish to have a copy of it on the compute grid. Follow the steps below to copy (in Git terms, clone) this repository locally on the compute grid:

  1. Navigate on GitHub to the repository you wish to clone
  2. Click on the Clone or Download button, select the SSH option so the small window says "Clone with SSH", and click on the clipboard icon to the right of the URL.
  3. Back in your account on the compute grid, enter the following command to download the repo as a clone:
    git clone git@github.com:tidyverse/dplyr.git
    
  4. If you wish to give the directory a different name than the repo has on GitHub, add that directory name to the end of the command, similar to:
    git clone git@github.com:tidyverse/dplyr.git my_dplyr
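
To see what a clone gives you, the sketch below goes through the same steps against a throwaway local repository instead of a GitHub URL, so that it runs without network access or an SSH key. All paths and names in it (source_repo, my_copy, the example user) are hypothetical; with a real repository you would use the SSH URL copied in step 2.

```shell
# Stand-in for a GitHub repository: a throwaway local repo with one commit.
# (All paths and names below are hypothetical.)
scratch=$(mktemp -d)
git init --quiet "$scratch/source_repo"
cd "$scratch/source_repo"
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# Clone it into a directory with a different name, as in step 4.
cd "$scratch"
git clone --quiet "$scratch/source_repo" my_copy

# The clone remembers where it came from under the remote name "origin".
cd my_copy
git remote -v
git log --oneline
```

Because a clone already has its remote configured as origin, no additional setup is needed before pulling from it.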
    

Start a new Git repository

Navigate to the location in your home folder or project space where you would like to keep your code, (optionally) create a directory for the repository, move into that directory, and then initialize the repository:

	cd ~/git
	mkdir my_dplyr
	cd my_dplyr
	git init

You are now ready to copy any script files here, or start scripting and committing your changes.
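
As a minimal sketch of that first round of scripting and committing, the commands below create a repository in a scratch location, add one file, and record it. The location, the file name analysis.R, and the user name and email are placeholders chosen for illustration; substitute your own.

```shell
# Hypothetical location and file name, for illustration only.
repo=$(mktemp -d)/my_dplyr
mkdir -p "$repo"
cd "$repo"
git init --quiet

# Identify yourself for this repository (needed before the first commit).
git config user.name "Example User"
git config user.email "user@example.com"

# Create a script, stage it, and record it in the history.
echo 'print("hello")' > analysis.R
git status --short          # shows "?? analysis.R" (untracked)
git add analysis.R
git commit --quiet -m "Add first analysis script"

git log --oneline           # one line per commit
```

The name and email here are set for this one repository only. Adding --global to the two git config commands would apply them to all of your repositories on the grid.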

To connect your existing local repository to a remote on GitHub

Connecting your local repository to a remote gives you an offsite backup of your code. If your GitHub repository is public, then 'pushing' to the remote also lets you share your work with others. You may connect your local repo to GitHub either by username/password or by SSH key. The SSH key method ties the specific local computer to your GitHub account and bypasses the need for usernames and passwords. It requires only a few straightforward additional steps.

  1. Please follow the Generating a new SSH key... directions on GitHub.com. Only the first section needs to be completed.
  2. Next you'll need to add the SSH key that you've just generated to your GitHub account. Please follow the directions at Adding a new SSH key to your GitHub account.
  3. We always recommend that you test your connections. Helpful instructions can be found at Testing your SSH connection.
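
For orientation, the steps above roughly correspond to the commands sketched below. This is not a substitute for GitHub's directions: the key type (Ed25519), the placeholder email, and the scratch directory are assumptions made here, and the empty passphrase is used only so that the sketch runs unattended. For a real key, accept the default location in ~/.ssh and choose a passphrase when prompted.

```shell
# Throwaway location so an existing key in ~/.ssh is never overwritten.
key_dir=$(mktemp -d)

# Generate an Ed25519 key pair. The comment (-C) is just a label, here a
# placeholder address. -N "" means no passphrase (for this sketch only).
ssh-keygen -q -t ed25519 -C "user@example.com" -f "$key_dir/id_ed25519" -N ""

# The .pub file is the half that gets pasted into your GitHub account.
cat "$key_dir/id_ed25519.pub"

# Fingerprint of the key, useful for comparing against what GitHub lists.
ssh-keygen -l -f "$key_dir/id_ed25519.pub"

# With a real key added to your account, this tests the connection
# (requires network access, so it is left commented out here):
# ssh -T git@github.com
```

Only the contents of the .pub file should ever be shared. The file without the extension is the private key and stays on the compute grid.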

With your SSH key set up, you will need a GitHub repository (repo) to connect to. If you need to create one, use the following instructions. Otherwise, skip ahead to the steps for an existing repo further below.

  1. Log in to GitHub.com
  2. In the upper right, next to your account picture, click on the + dropdown
  3. Select New repository
  4. Unselect the option for 'Initialize this repository with a README', and click on the Create Repository button.
  5. On the page that appears, copy the two lines near the bottom in the section "…or push an existing repository from the command line" 
  6. In your terminal window, while in the directory for your repo, enter the copied commands. They should be something like:
    	git remote add origin git@github.com:jharvard/my_dplyr.git
    	git push -u origin master
    

Or, if you already have a repo set up on GitHub:

  1. Follow steps 1 and 2 of 'Copy (clone) an existing Git repository' above to copy the repo's SSH URL to your clipboard.
  2. In your terminal window, while in the directory for your local repo, enter the following commands, pasting in the SSH URL. They should be something like:
    	git remote add origin (paste URL)
    	git push -u origin master
    

These commands link your local repo to the remote repo on GitHub, and then push your commits from your local machine to your GitHub remote.
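
To check that the link and the push worked, you can inspect the remote and the branch tracking afterwards. The sketch below goes through the whole sequence with a local bare repository standing in for GitHub, so that it runs without network access. The paths, file name, and user details are hypothetical; with GitHub, the remote URL would be the SSH URL of your repo.

```shell
# A bare repository plays the role of the GitHub remote (hypothetical paths).
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"

# A local repository with one commit.
git init --quiet "$work/local"
cd "$work/local"
git config user.name "Example User"
git config user.email "user@example.com"
echo "x <- 1" > script.R
git add script.R
git commit --quiet -m "Initial commit"

# The branch may be called master or main, depending on the Git configuration.
branch=$(git rev-parse --abbrev-ref HEAD)

# Link the local repo to the remote, then push and set the upstream branch.
git remote add origin "$work/remote.git"
git push --quiet -u origin "$branch"

git remote -v        # lists origin for fetch and push
git status -sb       # first line shows the branch tracking origin/<branch>
```

The -u option records the upstream branch, so later pushes from this repo can be done with a plain git push.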

Congrats!

Updated 6/21/18