versioning_data_scripts


Previous: Introduction to Version Control


This lesson focuses on the basic aims and principles of version control, which we will explore by working with a plain text document using Git (through GitKraken and GitHub).

Getting Started with Git using a GUI (Graphical User Interface)

Programmers who use Git for version control of their code usually interact with it through the command line (for example, a UNIX/Linux shell). However, several tools make Git easier for novices by providing a Graphical User Interface (GUI). Two examples are GitHub Desktop and GitKraken. Although the command-line version of Git has advantages in the long run, a GUI is a great place to start.

A Note on Terminology

One of the trickier aspects of using Git and GitHub is the new terminology (repository, add, commit, pull, push, remote, detached head). Some of the commands/terms are fairly self-explanatory, others less so, and in this workshop you will encounter some of them. Here is a glossary of associated terms; however, it is best to pick up the terminology while learning how to use Git and GitHub.

Register for a GitHub Account

Since we are going to be using GitHub, we will need to register for an account at GitHub if we don’t already have one. For students and researchers, GitHub offers free private repositories. These are not necessary, but they might be appealing if you want to keep some work private to you or a specified set of users.

Install GitKraken

Most of you should have already installed GitKraken. Open it, and sign in using the credentials you used to sign up for a GitHub account.

Once you sign in, GitKraken will take you to its Welcome screen. At this point, you are ready to start working with a repository.

Version Controlling a directory of files

Creating a Repository

Git tracks the contents of a folder by creating a repository inside that folder, so it is important to organize projects in folders.

Tracking items in a folder (repository) using Git:

Download the folder we have generated for this session from here, and unzip it in a location of your choosing.

Creating a Folder/Repository

There are a number of different ways to add files/folders for GitKraken to track. For this lesson, click on the folder icon at the top left corner. This will allow you to either Open an existing repository, or Clone a repository that you or someone else has created, or Init (initialize/create) a new repository. Today, we will be initializing a repository.

Click on Init, and then GitHub.com, so that we can create a repository that we will keep locally, as well as at a remote location as a backup or perhaps for sharing:

Fill in the fields as appropriate:

Voila! You now have your first Git repo!

Once we have added our folder, we will be able to see it in the list of repositories in the left column.

We’ll point out a few features here:

Since we’ll now want to add more files to this repository, right-click on the README.md file and select Show in Finder (Show in Explorer) from the pop-up menu:

The folder in which we created the repository now contains an extra folder named ‘.git’ (this is a hidden folder). This folder is how Git tracks the changes (adding files/folders, modifying existing ones, deleting files/folders) we make within our version-controlled folder.
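For comparison, the Init step can be sketched on the command line. This is a minimal sketch, assuming Git is installed; it uses a throwaway directory from mktemp as a stand-in for the lesson folder, and it creates only a local repository (nothing is created on GitHub).

```shell
# Work in a throwaway directory (a stand-in for the lesson folder).
repo_dir="$(mktemp -d)"
cd "$repo_dir"

# Turn the folder into a Git repository; this creates the hidden .git folder.
git init --quiet

# List everything, including hidden entries, to confirm that .git exists.
ls -a
```

Deleting the .git folder would remove the repository's recorded history while leaving the working files in place, which is why it is hidden by default.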

Staging and Committing Changes

We need to copy in our sample files that you’ve downloaded. Open up that folder, and copy/move those files here. Your window should look something like this:

When we switch back to GitKraken, you’ll notice the timeline window at the top has changed. GitKraken has noticed that files have changed, and it indicates that this new set of changes is considered Work in Progress (WIP):

Click on the WIP line at the top to show the files it is watching, shown in the bottom pane. You can resize this panel to show all the files if you desire:

A commit tells Git that you made some changes which you want to record. Though a commit seems similar to saving a file, there are different aims behind ‘committing’ changes compared to saving changes. Commits take a snapshot of the file at that point and allow you to document information about the changes made to the document.

We next need to tell Git that we wish to prepare these files for a commit. We call this first one an initial commit: a snapshot of the files at the start of our work, before any of the changes we wish to track. To include these files in a commit, we Stage the changes by clicking on the ‘Stage all changes’ button:

You do have the option of adding only certain files to the Staging area if you wish to make separate commits. Simply click on the word Stage that appears near the files you wish to include.

To commit changes, give a summary of the changes, optionally add a longer description, and click on the Commit button:

After the commit, the timeline changes to reflect the current state and history of our repository. Clicking on the top line, our recent commit, shows in the bottom pane the changes that were included, which is the addition (green plus square) of these files:

A useful way to think about commits is as the ‘history’ of your project. Each commit records a development or change made to the documents in your repository; the history of the project can be traced by looking at all of the commits.
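The stage-then-commit sequence described above can be sketched with command-line Git. This is a minimal sketch under stated assumptions: the file name notes.txt and its contents are hypothetical placeholders for the downloaded sample files, the author identity is a made-up value used only for this example, and the repository lives in a throwaway directory.

```shell
# Throwaway repository standing in for the lesson folder.
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init --quiet

# Placeholder identity for this sketch only (not a real account).
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"

# A hypothetical sample file; Git lists it as untracked until it is staged.
echo "placeholder text" > notes.txt
git status --short

# Stage every change in the folder (comparable to 'Stage all changes').
git add --all

# Record the snapshot with a summary message (comparable to the Commit button).
git commit --quiet -m "Initial commit of sample files"

# Show the project history, one line per commit.
git log --oneline
```

To stage only some files, as in the selective Stage option, name them explicitly (for example, git add notes.txt) instead of using --all.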

Note about Branches:

When you commit you will see ‘commit to master’. This refers to the master branch.

Within a Git repository it is possible to have multiple ‘branches’. These different branches are essentially different places in which to work. Often they are used to test new ideas or work on a particular feature without modifying or “contaminating” the master copy (e.g. production version of a webpage). This feature is very useful when collaborating with others. We do not have time to go into this aspect of Version Control today, but we encourage you to explore it further.

Changing File Contents and Committing Changes

Let’s open the mars.txt document using our favorite text editor (see note below about text editors) and add a couple of lines to it.

Mars is a red planet.
It is cold and dry, but everything is my favorite color.

The two moons may make things interesting

Save the changes to your file and go back to GitKraken. Again, the program creates a new WIP timeline entry because it has detected changes. Click on this line to see that GitKraken has noticed that our file has changed (file icon with an ellipsis inside):

When you click on the filename, you will see that these new lines of text appear; this lets us know that Git is able to see changes in your file but at the moment these changes haven’t been recorded in an official ‘snapshot’ of your repository. To do this we need to add and commit our changes, just as we did before.

Text Editors:

When creating a plain text document, you will want to use a text editor like TextWrangler/Sublime Text (Mac) or Notepad++ (Windows) instead of Microsoft Word or the default text editors. You will also want to make sure that you save it as plain text. There are a large number of free and paid text editors available to choose from.

Staging your changes in GitKraken is equivalent to the git add command on the command line. You can “add” several changes to the staging area, and only commit when you are ready.

As we did with our previous initial commit, include a change message, and click on the Commit button:

Again, you’ll see our timeline has changed to include this commit:
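The same edit, review, stage, and commit cycle can be sketched on the command line. This is a minimal sketch: the starting content of mars.txt is a hypothetical placeholder (the real sample file may differ), the identity values are made up for the example, and the work happens in a throwaway directory.

```shell
# Throwaway repository with one existing commit.
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init --quiet
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"

# Placeholder starting content for mars.txt (an assumption for this sketch).
echo "placeholder starting text" > mars.txt
git add mars.txt
git commit --quiet -m "Add mars.txt"

# Edit the file by appending two lines, as in the lesson.
printf '%s\n' \
  "Mars is a red planet." \
  "It is cold and dry, but everything is my favorite color." >> mars.txt

# Git detects the modification, but it is not yet part of any snapshot.
git status --short
git --no-pager diff

# Stage the change, then commit it with a message.
git add mars.txt
git commit --quiet -m "Add notes about Mars"

# The history now contains two commits.
git log --oneline
```

Here git diff plays the role of clicking on the filename in GitKraken: it shows the added lines before they are recorded.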


Exercise #1

  1. Create a repository “learning_github” in GitKraken. Make sure to create it both locally and remotely on GitHub.com.
  2. Find the folder on your local computer, and add a couple of small text files to it from your computer.
  3. Create a new plain text file called “data-file.txt”, add a line or two of content to it and save it to the “learning_github” folder.
  4. Go to GitKraken, and commit the change with an appropriate message.
  5. Switch back to the class repository.

Pushing Your Changes to Your Remote Repository

At the moment we are only recording our changes locally, but we may want to have these changes be available remotely as well (for collaborating/sharing/backing up). The idea is you keep your local and remote repositories “in sync”.

This is straightforward in GitKraken: clicking the Push button in the toolbar performs a one-way synchronization of your repository to the remote that you linked it to when you first created the repo. This pushes your repository from your computer to the GitHub website and populates the remote repository on GitHub’s servers in the process.
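A push can be sketched on the command line as well. This is a minimal sketch with loud assumptions: a local bare repository (remote.git) stands in for the repository on GitHub so that the example runs without a network connection or an account, and the file name and identity values are hypothetical. With a real GitHub remote, the path passed to git remote add would be the repository's URL instead.

```shell
# Throwaway working area.
work_dir="$(mktemp -d)"
cd "$work_dir"
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"

# A bare repository plays the role of the remote on GitHub in this sketch.
git init --quiet --bare remote.git

# The local repository, with one commit to push.
git init --quiet local
cd local
echo "placeholder text" > notes.txt
git add notes.txt
git commit --quiet -m "Initial commit"

# Link the local repository to the remote under the conventional name 'origin'.
git remote add origin "$work_dir/remote.git"

# Push the current branch and record origin as its upstream.
git push --quiet -u origin HEAD

# List the configured remotes and where they point.
git remote -v
```

The push is one-way: it copies local commits to the remote and does not bring remote changes back.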

We can now view our changes on our remote at GitHub.com. In the left pane, our remote is given the name ‘origin’, which is the default name for the remote repository in Git (note that you can call it whatever you’d like, and you can have more than one remote, but that is beyond the scope of this lesson). If we then right-click on ‘origin’, we can select the pop-up menu option “View origin on GitHub.com”:

Indeed, GitKraken sends us to our web browser and our repository on GitHub.com is displayed:

You can also have a fully local repository, without a remote “synced” one on GitHub. To initialize such a repository, pick the “Local Only” option under “Init”.


Exercise #2

  1. Push the changes to the “learning_github” repo (from the previous exercise) to the remote repo on GitHub.com
  2. Make changes to data-file.txt on GitHub.com
  3. Sync or “Pull” the changes that were made remotely to the local repository
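The push, remote edit, and pull sequence in this exercise can be sketched on the command line. This is a minimal sketch, not a transcript of the exercise: a local bare repository (remote.git) stands in for GitHub, a second clone named elsewhere simulates the edit made in the GitHub web interface, and the identity values are hypothetical.

```shell
# Throwaway working area.
work_dir="$(mktemp -d)"
cd "$work_dir"
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Workshop User" GIT_COMMITTER_EMAIL="user@example.com"

# Stand-in for the repository on GitHub.
git init --quiet --bare remote.git

# Local repository, pushed once so that the remote has content.
git init --quiet local
cd local
echo "first line" > data-file.txt
git add data-file.txt
git commit --quiet -m "Add data-file.txt"
git remote add origin "$work_dir/remote.git"
git push --quiet -u origin HEAD
cd ..

# Simulate an edit made on the remote side by committing from a second clone.
git clone --quiet remote.git elsewhere
cd elsewhere
echo "line added remotely" >> data-file.txt
git commit --quiet -am "Edit data-file.txt remotely"
git push --quiet origin HEAD
cd ../local

# Pull: fetch the new commit from origin and fast-forward the local branch.
git pull --quiet --ff-only

# The local copy now contains the line that was added remotely.
cat data-file.txt
```

The --ff-only flag makes the pull succeed only when the local branch has no commits of its own that the remote lacks, which is the situation in this exercise.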

Next: Remote repositories, managing conflicts