At first I had a hard time understanding how git or any other code versioning system worked. I tried several of them (SVN, CVS, Mercurial, etc.): I set up the systems and read the necessary documentation, but using them still did not make sense to me. So I kept versioning the code my own way with folders, backups, etc., which was really clumsy and occasionally caused an error when I forgot to add some changed file. But then, during one project, a friend of mine encouraged me to look at git again, and we agreed that I could ask him any stupid beginner’s questions I might have. And so I did, and now, after getting answers to a few questions and having some terms explained to me, I cannot imagine the code-related part of my work without git.
Here I want to explain how I see the git structure and “mindset” in really simple terms, pointing out the things that made me struggle to understand it. Hopefully this will help people like me.
I will skip the git setup and repository creation, since there is good enough documentation for that already. Instead, I will write down the questions and answers about git that I would have liked to know before I started, and that I could not find simply explained in any of the documentation I looked at. To sum up – here is all you need to know to start working with git.
First lesson – don’t be afraid of the command line!
My first problem was fear of the command line. Even though I use the Linux command line pretty often during development (while generally preferring graphical user interfaces), I was a bit afraid of using it for git, since I had the impression that I would need to learn about twenty new commands and their usage patterns. But when I finally got to it, it was not that scary, and after trying a few git GUI tools, the command line tool seems the most comfortable to me anyway. It is really simple to use, since you need very few commands in your daily work – the best cheat sheet I have found so far is this one here, and there are many more here.
What is a repository?
A repository is the top-level container of code versioning. It usually contains the code of your whole project, but it can also hold just a part of your code base. All other groupings of code, like branches and tags, live within one repository, and a repository can contain several branches and several tags.
There will always be a local code repository, but you can also have a copy of your repository on some remote git server (like GitHub).
What is a branch?
A branch is a line of development: a collection of code that gets updated by committing code to it. The code changes are tracked with each commit. There must be at least one branch in a repository. By default it is called “master” (newer setups may call it “main”).
In my examples I will use a simple setup of 2 branches. The “development” branch is where all the development happens: all the code of the development team members is committed to it. The “production” branch always contains the latest code available on the production host. When the code committed to “development” is ready for release, the “development” branch is merged into the “production” branch and pushed to the production host.
What is a tag?
A tag is a named snapshot of a branch at one particular commit. Unlike a branch, a tag does not move when new commits are made, so the tagged code cannot be altered afterwards – that is the main difference between a branch and a tag. This means that whenever you create a tag, it marks the full committed state of every file of the branch’s code at that moment (changes you have not committed yet are not included). That way you can retrieve that state of the code at any time. Tags are usually created to mark the state of the whole branch at different versions, e.g., a tag for version 0.1, 0.2, etc. Tags can be named correspondingly, like “v0.1”, “0.2” or anything that makes sense. A tag is independent of branches, but contained within one repository.
In fact, calling this feature a “tag” made it a bit confusing to me, since my first impression was that you can mark certain files with a tag to retrieve them later (the way tags are used in many other situations, like tagging an article or an item so that it can be found by that tag). Only thanks to my friend did I realize it was a snapshot of all the files in the branch.
Quick reference: how does code cloning, pulling, status-checking, committing and other git commands work?
“git clone” is used only to retrieve the code from an already existing off-site repository to your machine (or wherever you are retrieving the code to) for the first time. If you create a git repository in a folder which already has the code in it, you do not need to clone anything. In any case, you clone only once. After the cloning is done, you must use “git pull” to retrieve later changes, as described next.
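To make the cloning step concrete, here is a minimal sketch that can be run safely because everything happens in a temporary folder. The "upstream" repository stands in for a real off-site server; its location, the file name and the user identity are made up for the illustration.

```shell
set -e
work=$(mktemp -d)

# Stand-in for an already existing off-site repository (hypothetical content).
git -c init.defaultBranch=master init -q "$work/upstream"
cd "$work/upstream"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > index.htm
git add index.htm
git commit -q -m "Initial commit"

# Clone it once. This creates the folder, the local repository and the files.
git clone -q "$work/upstream" "$work/mycopy"
cd "$work/mycopy"
ls              # the files of the cloned project
git remote -v   # "origin" remembers where the code was cloned from
```

With a real server you would put its address in place of the local path.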
The purpose of “git pull” is to pull changes from a remote git repository branch into a local branch (one that has already been created using “git branch” or cloned from a remote repository using “git clone”). Often, if you have just one branch in your local repository and it was cloned from a remote repository, you can use “git pull” on its own without extra information such as the user and host. However, when you have several branches in your local repository, or when for some other reason it is ambiguous to git which host / repository / branch to pull from, you may need to provide full details.
The usage is simply “git pull”, or “git pull <remote> <branch>” when you need to give the full details.
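Both forms of pulling can be tried in a throwaway setup. This is only a sketch: the "upstream" folder plays the role of the remote server, and all names and file contents are invented for the example.

```shell
set -e
work=$(mktemp -d)

# Stand-in for the remote repository, with one commit in it.
git -c init.defaultBranch=master init -q "$work/upstream"
cd "$work/upstream"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > index.htm
git add index.htm
git commit -q -m "First version"

# My local copy, cloned once.
git clone -q "$work/upstream" "$work/mycopy"

# Somebody changes the remote after I cloned it.
echo "v2" > index.htm
git add index.htm
git commit -q -m "Second version"

# Retrieve the change into the local copy.
cd "$work/mycopy"
git pull -q                 # short form: git knows the remote from the clone
git pull -q origin master   # full form: remote name and branch spelled out
cat index.htm               # the file now has the remote's latest content
```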
The purpose of “git status” is to see which files have been changed in the working directory – in all three groups:
- tracked changed files
- not tracked, but changed or new files
- files added for committing (using “git add” described below)
You will get an output something like this:
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#   modified: .ftpquota
#   modified: includes/func.inc.php
#   modified: index.htm
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
That way you can see which files you have to pay attention to (those that have been changed or created) and choose what to do with them: add them for committing (using “git add”), remove them from tracking (using “git rm”) or just leave them alone for the future.
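The three groups can be reproduced in a scratch repository. The sketch below uses the compact “--short” form of the status output, which is easier to read at a glance; the file names are made up for the example.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Two tracked files, committed once.
echo "one" > index.htm
echo "two" > about.htm
git add index.htm about.htm
git commit -q -m "Initial commit"

echo "changed" >> index.htm   # tracked and changed, not yet added
echo "changed" >> about.htm
git add about.htm             # changed and added for committing
echo "new" > notes.txt        # new file, not tracked at all

# In the short format the first two characters describe each file:
#   " M" = changed but not added, "M " = added for committing, "??" = untracked
git status --short
```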
“git add” is the command I wish I had known about first when I started learning git.
You need to add (or remove, or just ignore) every changed or created file to the list of files to be committed. Only AFTER you add the files to this list can you run “git commit” and have them committed. You can use “git add .” to add ALL new and modified files in the current folder and below at once.
This was another of the things that stood in the way of my understanding the workflow. I had the wrong perception that once you have edited the files in the working directory, you just need to run “git commit” and all the changes will be committed. I did not realize that there is a phase in between where you sort out what to do with each of the changed files.
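The misunderstanding described above is easy to see in a scratch repository. In this sketch (folder, file name and messages are invented), the first commit attempt records nothing because the change was never added.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first" > index.htm
git add index.htm
git commit -q -m "Initial commit"

# Edit a tracked file, then try to commit without adding it.
echo "second" >> index.htm
git commit -q -m "Try without add" || echo "nothing was committed"

# Add the file first, then the commit goes through.
git add index.htm
git commit -q -m "Update index page"
git log --oneline   # two commits: the initial one and the update
```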
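“git rm” is the command to use to remove a file from the list of tracked files. Note that by default it also deletes the file from the working directory; use “git rm --cached <file>” if you want to stop tracking the file but keep it on disk.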
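A small sketch of both ways of removing a file from tracking, run in a temporary folder. The file names are only examples.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "page" > old.htm
echo "quota" > .ftpquota
git add old.htm .ftpquota
git commit -q -m "Initial commit"

git rm -q old.htm              # stop tracking AND delete the file from disk
git rm -q --cached .ftpquota   # stop tracking, but keep the file on disk
git commit -q -m "Stop tracking old.htm and .ftpquota"

ls -a          # .ftpquota is still there, old.htm is gone
git ls-files   # git no longer tracks either of them
```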
Finally, “git commit” is the command to commit the changed and added code to the local repository. Committing the code means that you are making a record of it in some branch of your code repository. That way you have the latest version of your code in your repository and can later revert to a previous version or see the changes that were made between different commits.
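To illustrate what those records are good for, here is a sketch that makes two commits and then looks back at them. The file and the commit messages are invented; “HEAD” means the latest commit and “HEAD~1” the one before it.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > index.htm
git add index.htm
git commit -q -m "First version"

echo "v2" > index.htm
git add index.htm
git commit -q -m "Second version"

git log --oneline             # the list of commits, newest first
git diff HEAD~1 HEAD          # what changed between the two commits
git show HEAD~1:index.htm     # the file as it was in the previous commit
```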
“git push” is the command you must use to push the changes in your local repository to a remote one (if you have one). Again, if you have just one branch in your local repository and have already pulled it from the remote, you may only need to type “git push” without providing the rest of the information.
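Pushing can be tried without a real server. In this sketch a "bare" repository in a temporary folder plays the role of the remote host; that folder, the file and the identity are assumptions made for the example.

```shell
set -e
work=$(mktemp -d)

# Prepare a stand-in remote: a bare copy of a repository with one commit.
git -c init.defaultBranch=master init -q "$work/seed"
cd "$work/seed"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > index.htm
git add index.htm
git commit -q -m "First version"
git clone -q --bare "$work/seed" "$work/remote.git"

# My local repository, cloned from the remote.
git clone -q "$work/remote.git" "$work/local"
cd "$work/local"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v2" > index.htm
git add index.htm
git commit -q -m "Second version"

# Send the new commit to the remote (a plain "git push" would also do here).
git push -q origin master

# The remote now has both commits.
git --git-dir="$work/remote.git" log --oneline master
```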
You can use “git branch” to see which branch you are using now. Typed on its own, it outputs all the branches in the repository, one per line, and marks the one you are using with a star, for example “* dev” followed by “master”.
In this example it shows that the “dev” branch is being used. To switch to a different branch you must use the “git checkout <branch name>” command described below.
OR you can use the “git branch <branch name>” command to create a new git branch. Note that this only creates the branch; it does not switch to it.
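Listing and creating branches, sketched in a scratch repository. The names “dev” and “master” follow the examples in this post; everything else is made up.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > index.htm
git add index.htm
git commit -q -m "Initial commit"

git branch dev        # create the branch; we are still on "master"
git branch            # the star marks "master"
git checkout -q dev   # now switch to the new branch
git branch            # the star marks "dev"
```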
Using “git checkout <branch name>” you can switch between the branches of your repository. Switching between branches means that the files in your working directory change to the state that was last committed to the particular branch. It is also the first step of merging branches, as described next.
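What "changing the state of your files" means can be seen with a single file that differs between two branches. A minimal sketch with invented contents:

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "stable" > index.htm
git add index.htm
git commit -q -m "Stable version"

# Create "dev", switch to it and commit a change there.
git branch dev
git checkout -q dev
echo "half-baked" > index.htm
git add index.htm
git commit -q -m "Work in progress"

# The same file shows different content depending on the branch checked out.
git checkout -q master
cat index.htm   # content last committed to "master"
git checkout -q dev
cat index.htm   # content last committed to "dev"
```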
“git merge” is the command to merge one branch into another. First of all, be careful with merging branches: a wrong merge may cause loss of code. I would recommend backing up both branches of code at least the first few times you merge. However, the merging process itself is simple (unless conflicts arise, but I am not discussing them here, because most merges go through without any, and there is no easy answer when there is a conflict).
The workflow is simple (as written in the cheat sheet mentioned before) – to merge branch B1 into B2:
1. Check out branch B2 using “git checkout B2”
2. Merge the branches using “git merge B1”
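The two steps above can be sketched in a throwaway repository. The branch names B1 and B2 come from the list; the extra “B2-backup” branch is only my illustration of a cheap safety net before a merge, and the file names are made up.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Two branches starting from the same commit.
git branch B1
git branch B2

# Some work is committed to B1.
git checkout -q B1
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Merge B1 into B2: first check out B2, then merge.
git checkout -q B2
git branch B2-backup   # optional: remembers where B2 was before the merge
git merge -q B1
ls                     # B2 now contains feature.txt as well
```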
Similarly to “git branch”, you can list all the tags using “git tag” or create a tag using “git tag <tag name>”, which marks a snapshot of the current committed state of the whole code in the branch.
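A short sketch of tagging a version and getting back to it later. The tag name “v0.1” follows the naming mentioned earlier in this post; the file and its contents are invented.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > index.htm
git add index.htm
git commit -q -m "Version 0.1"
git tag v0.1                # mark the current commit as v0.1

# Development goes on, but the tag stays where it was.
echo "v2" > index.htm
git add index.htm
git commit -q -m "More work"

git tag                     # list all tags
git show v0.1:index.htm     # the file exactly as it was when tagged
```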
Since I don’t use tagging much in my projects, I would suggest reading more about it here. Another topic I would recommend reading about is the “ignore” feature of git. I am not going to write about it, since good documentation is available here.
A simple git workflow example
Now that all the basic features are covered, here is a very simple workflow example that I have been using so far, step by step:
- I have two environments – development and production.
- The development environment has a git repository with two branches – “dev” and “master”. “dev” is the branch for daily development use, where half-baked code can be checked in too. The “master” branch receives the merges from “dev” and is used for pushing the development changes to the production environment.
- The production environment has a git repository with just the “master” branch, which contains the latest production code.
- After every development cycle in the development environment I look at which files have been changed using “git status” and add all the changed files for commit using “git add <file>”.
- When that is done, I commit the code to the “dev” branch using “git commit -m "<commit message>"”
- After the commit, just for the sake of backup, I push the local repository’s “dev” branch changes to a remote GitHub host using “git push”
- When the code in the “dev” branch is tested and found ready for the production environment, I merge the “dev” branch into the “master” branch in the development environment using the following commands: first “git checkout master”, then “git merge dev”. Then I push the changes of the master branch to the remote git server using “git push”
- Then I deploy the altered code by switching to the production environment and using “git pull” there – this pulls the code from the remote git server’s “master” branch that I just pushed to
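The whole workflow above can be rehearsed end to end in a temporary folder before trying it on a real project. This is only a sketch under a few assumptions: a bare repository on the local disk stands in for the remote git server, the "development" and "production" folders stand in for the two environments, and the file and messages are invented.

```shell
set -e
work=$(mktemp -d)

# Development environment with an initial release on "master".
git -c init.defaultBranch=master init -q "$work/development"
cd "$work/development"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > index.htm
git add index.htm
git commit -q -m "Initial release"

# Stand-in for the remote git server, plus the production environment.
git clone -q --bare "$work/development" "$work/remote.git"
git remote add origin "$work/remote.git"
git clone -q "$work/remote.git" "$work/production"

# Daily development on the "dev" branch.
git branch dev
git checkout -q dev
echo "v2" > index.htm
git status --short
git add index.htm
git commit -q -m "Update index page"
git push -q origin dev        # backup of the "dev" branch on the remote

# Release: merge "dev" into "master" and push it.
git checkout -q master
git merge -q dev
git push -q origin master

# Deploy: pull the new "master" in the production environment.
cd "$work/production"
git pull -q origin master
cat index.htm                 # production now has the released code
```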
Hope this was worth reading till the end. And if you have any additions to what I have written here, please let me know!