What is source control? And more importantly, why should you care?
Source control (also known as revision control, or version control) is a way to keep track of files: what changes were made, who made them, and when they were made. But source control can be much more than that – it can also provide a systematic way for you to experiment with files and share them with others. In this post I'll go over the basics of using one type of source control (git) and talk a little about how researchers in particular can benefit from source control.
Git, SVN, and Mercurial are three programs which are commonly used for source control in the tech world right now. I'm going to restrict my post to git because it's what I know and use. To me, the best thing about git is that it has a really shallow learning curve at the beginning, but it's also capable of some very complicated maneuvers if you take the time to learn more. Not only is it free and open source, it's also a useful thing to learn since many companies and professional programmers use it. I like git a lot, and I hope you will, too!
Installation
Anyway, speaking of that shallow learning curve, git is mercifully easy to install. If you are using a package manager, you can install git using a command such as brew install git on a Mac (Homebrew is meant to be run without sudo), or sudo apt-get install git on Debian or Ubuntu Linux. (If the words "package manager" and/or "command line" are unfamiliar to you, you'll probably want to check out my earlier post on how to learn programming from scratch.) If you want something even easier (and probably more up to date, actually), you can download a package here that will install git for you automagically.
As an aside, there is also a GUI (Graphical User Interface, or basically an "App") available through github. As with many other things programming related, there are pros and cons to both the CLI (Command Line Interface) and the GUI. Which one you decide to use is really up to you, dependent on your background and needs, with one caveat: there is more support for the CLI, since that's what most people use. If at some point you search the internet for help with git, you'll probably find that there are a lot more people who can help you find the right command than can help you find the right menu button. I haven't used the GUI very much myself, but based on my limited experience I would suggest using the command line interface instead if you can because it's a useful thing to learn.
After you install git, take a few minutes to customize your git environment. If you haven't already, follow the instructions in my previous post on how to change your default command line text editor. Trust me on this – you do not want to use vi. You can also add your name and email address, especially if you intend to use github to share your code. To change your git config settings, replace the example name and email in the two commands below with your name and email (include the quotation marks in your command, but not the dollar signs):
$ git config --global user.name "Jane Doe"
$ git config --global user.email janedoe@example.com
Another nice customization is adding color to your git environment, which makes git output much easier to read. To do this, type:
$ git config --global color.ui auto
To find out more about git configuration, you can also check out the documentation on the git website.
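If you'd like to check that the settings were stored, git can list them back to you. The sketch below is only an illustration: it runs inside a subshell with a scratch HOME directory (my assumption, so that experimenting doesn't touch your real settings) and uses the same example name and email as above. On your own machine, the last git command alone is the interesting part.

```shell
# Run in a subshell with a scratch HOME so real settings stay untouched.
settings=$(
  export HOME="$(mktemp -d)"
  unset XDG_CONFIG_HOME
  git config --global user.name "Jane Doe"
  git config --global user.email janedoe@example.com
  git config --global color.ui auto
  # List everything stored in the global config file.
  git config --global --list
)
echo "$settings"
```

Each line of the listing has the form key=value, so a typo in your name or email is easy to spot.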
How to create a repository
Ok, now that you have git installed, let's do an example. Suppose you are given someone else's code and you are told to modify it in some way. Since you don't really know how this code works yet, it would be nice to keep track of the original state of the code when it was given to you. If (when) you have to debug the code in the future, you can look back and try to see what changed between your current version and the original working version you started with.
So let's create a git repository: open up the command line and use the "cd" command to change your current directory to where the new code is located. For example, if your new code is in Documents > programming > mycode, you type:
$ cd ~/Documents/programming/mycode
Now, to create a repository we simply type:
$ git init
The next step is to add all the files you want to track to your git repository. If you want specific files (for example, file1.txt and file2.txt) you can type in the names of each of the files manually, separated by spaces like this:
$ git add file1.txt file2.txt
If instead you just want to add all the files in that folder, you can just type:
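$ git add *
(The * notation is a shell wildcard, or "glob", which the shell expands to the names of all the non-hidden files in the folder before git even runs. It resembles the * from regular expressions, which are incredibly useful, but I don't have time to talk about them here.)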
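To see what the * actually turns into, you can hand it to echo instead of git. This is a small sketch in a throwaway folder; the file names, including the hidden .hidden file, are made up for illustration.

```shell
# Throwaway folder with two ordinary files and one hidden file.
demo="$(mktemp -d)"
cd "$demo"
touch file1.txt file2.txt .hidden

# The shell replaces * with the matching names before the command runs.
echo *
```

With default shell settings the hidden file is left out of the expansion, which means git add * would skip it as well.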
Finally, the last step is to save this current state of the code in the repository. To do this, we "commit" the files that we just added. A commit is like a save point that we can access later if needed. When you commit code, you should also add a commit message describing what changes you made since the last commit. Commit messages are the most important and most useful aspect of source control. If you fail to write thorough and informative commit messages, at some point you'll be forced to dig through your code to try to figure out what you did and why, and you probably won't save any time compared to not using source control at all.
Useful commit messages include a verbal description of the specific problems that were fixed. For example, "Fixed bugs, updated parameters" is not a useful commit message. Instead, the message should be much more specific, like this:
"Fixed off-by-one error in file 2, added escape key to for-loop in file 1, changed initial conditions from 3 to 2 to improve performance."
Remember, your most important collaborator is your future self! If you write a thorough commit message now, you'll thank yourself later.
Ok, let's commit this code. It's as simple as:
$ git commit
After you type this command you'll be presented with the commit file in your default command line text editor. At the bottom of the file is a bunch of commented-out text telling you which files were changed. At the top of the file there is a blank section where you can type your commit message. You might write something like "First commit, initial state of the code", then save and exit. That's it! Git is keeping track of your changes from now on.
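Here is the whole sequence in one place, as a minimal sketch you can run safely. It works in a throwaway folder rather than your real code, the file contents are invented, and it sets the name and email for that one repository so it runs even without global settings. It also uses the -m flag, which supplies the commit message on the command line instead of opening the editor.

```shell
# Throwaway folder standing in for ~/Documents/programming/mycode.
repo="$(mktemp -d)"
cd "$repo"
echo "first line" > file1.txt
echo "another file" > file2.txt

git init --quiet
# Identity for this repository only, so the sketch runs without global settings.
git config user.name "Jane Doe"
git config user.email janedoe@example.com

git add file1.txt file2.txt
# -m supplies the commit message inline instead of opening the editor.
git commit --quiet -m "First commit, initial state of the code"

# Show the single commit that now exists.
git --no-pager log --oneline
```

The -m form is handy for short messages; for the longer, more detailed messages described above, the editor is usually more comfortable.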
Viewing your tracked changes
So how do we access this tracking information? First, let's make sure everything is working correctly. To check the status of your git repository, type the command:
$ git status
You should get an output that says something like "nothing to commit, working directory clean". Now let's edit one of the files that we added earlier and see what happens. For example, make a small change to file1.txt, save the file, and then call git status again. You should see something like this:
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        modified:   file1.txt
no changes added to commit (use "git add" and/or "git commit -a")
This tells us that since the last time we committed, changes have only been made in a single file, file1.txt. To find out what was changed, we can type:
$ git diff
We will see some text showing the name of the file and a few lines before and after the spot that was changed. The change itself will be highlighted with plus signs (+) where text was added, and minus signs (-) where text was deleted. (By the way, any time the output for git is longer than a page, you'll see a colon (:) at the bottom of the terminal window instead of the usual dollar sign ($). To scroll, use the up and down arrows. To quit and return to the command line, press the q key.)
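If you want to try the status and diff steps without touching real code, the sketch below rebuilds a one-commit repository in a throwaway folder and then edits a file. The file contents are invented, and two options are my additions rather than part of the walkthrough: --short prints a compact one-line-per-file status, and --no-pager prints the diff directly instead of opening the scrolling view.

```shell
# Throwaway repository with a single commit.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Jane Doe"
git config user.email janedoe@example.com
echo "first line" > file1.txt
git add file1.txt
git commit --quiet -m "First commit, initial state of the code"

# Make a small change, then ask git what it noticed.
echo "second line" >> file1.txt
git status --short      # compact form of git status
git --no-pager diff     # print the diff without the scrolling pager
```

In the diff, the appended line shows up with a plus sign in front of it, just as described above.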
Ok, let's try one more thing. What if we wanted to look at the difference between the current state of the code and an even older version of the code from an earlier commit? For our current example we first need to have more than one commit. Go ahead and stage your changes to file1.txt with git add file1.txt (git only commits changes that have been added), then commit them, including a commit message that describes what you changed.
Next, we need to choose the older commit we want to compare against. To view older commits, we use this command:
$ git log
You'll see a list of two commits, each labeled with a hash (this is basically a code that allows you to uniquely identify that commit). You should also see your name and email (since you made the commits), the date and time of the commit, and the commit message. Highlight and copy the hash of the initial commit we made earlier, then return to the command line and type the following command (replace HASHCODE with the hash that you just copied):
$ git diff HASHCODE
Voila! You can now view all the changes since that first commit.
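The same comparison can be sketched end to end in a throwaway repository. Instead of copying the hash by hand, this version asks git for it: git rev-list --max-parents=0 HEAD prints the commit that has no parent, which is the first one. The file contents and the variable name first are made up for illustration.

```shell
# Throwaway repository with two commits.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Jane Doe"
git config user.email janedoe@example.com

echo "first line" > file1.txt
git add file1.txt
git commit --quiet -m "First commit, initial state of the code"

echo "second line" >> file1.txt
git add file1.txt
git commit --quiet -m "Added a second line to file1.txt"

# The commit with no parent is the initial commit.
first="$(git rev-list --max-parents=0 HEAD)"
git --no-pager log --oneline
git --no-pager diff "$first"
```

Copying the hash out of git log works just as well, and git also accepts a shortened hash as long as it is unambiguous.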
How can source control help me?
Now that you understand the basics of how git works, let's talk a little about what it can do. Even for an individual, git can be very useful. There are many practical benefits, like easy recovery for accidental code and file deletion, and the ability to quickly and easily roll back code to a previous state. (In my next post I'll also talk about branching, which allows you to quickly and easily switch between variations of your code.)
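As one example of recovery, here is a minimal sketch of getting back a file that was deleted by accident. It uses the git checkout -- <file> form that git status itself suggests for discarding changes; the throwaway folder and file contents are assumptions for illustration. Newer versions of git also offer a git restore command for the same job.

```shell
# Throwaway repository with one committed file.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Jane Doe"
git config user.email janedoe@example.com
echo "first line" > file1.txt
git add file1.txt
git commit --quiet -m "First commit, initial state of the code"

# Oops: the file is deleted from the folder.
rm file1.txt

# Bring back the last committed version of the file.
git checkout -- file1.txt
cat file1.txt
```

Note that this only recovers what was committed (or added), so any edits made after the last commit are not brought back.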
Additionally, I've already mentioned that git can be helpful for finding the source of errors when debugging. This is particularly useful if you make notes in your commit messages of which commits have been tested and confirmed to be working. If the code stops working later on, you know that it must be something you did between now and your most recent working commit, so there are fewer edits to sift through to find the error.
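If you do mark tested commits in their messages, git log can search for them. The sketch below assumes a convention of writing the word "tested" in the message, which is my invention rather than anything git requires; the file params.txt and its contents are also hypothetical.

```shell
# Throwaway repository with one tested commit and one untested commit.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Jane Doe"
git config user.email janedoe@example.com

echo "x = 3" > params.txt
git add params.txt
git commit --quiet -m "Set initial conditions to 3 (tested, working)"

echo "x = 2" > params.txt
git add params.txt
git commit --quiet -m "Changed initial conditions from 3 to 2"

# List only the commits whose message mentions "tested".
git --no-pager log --oneline --grep="tested"
```

Running git diff against the hash this prints then shows exactly the edits made since the code was last known to work.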
Finally, one more benefit of tracking changes with git is that you have a record of what you've accomplished over time. Commit logs show how your time was spent and describe in detail exactly how you solved the challenging problems you faced along the way. If you spend a lot of time working on your code alone and without recognition, as many researchers do, using git can be really encouraging!
Stay tuned for part 2: merging and sharing code