Shell's TechBlabber

…ShelLuser blogs about stuff ;)

Git: A developer tool not just for developers, you might like it too!

Git

When Linus Torvalds launched Git I read many things about it, but being a rather die-hard FreeBSD sysadmin I didn’t really see much use for it myself at first. After all: FreeBSD uses Subversion, which I really enjoyed using myself as well. And if it isn’t broken, why change it? Well…

Git was one of those things which sat patiently on my todo list until I had the time to check it out. And then it happened: I was trying to help out a Minecraft-related project which hosted its code on GitHub. And that uses Git. So I figured it was about time to scratch Git off my todo list and start a crash course, first and foremost to get my suggestions sent to the project, and then we’d see what might be next.

That was a few weeks ago.

In the meantime I’ve converted all my Subversion repositories to Git, discovered ways to use Git which are totally impossible with Subversion and I even managed to scratch yet another idea off my todo list because of it. Simply put: I think Git is amazing. And not just for developers; sysadmins like myself can definitely benefit from it as well.

So I figured I’d share my findings.

So what is Git?

Let’s start with the basics… (if this is all familiar stuff just skip some headers, I’m also addressing more advanced Git uses later).

Git is a so-called Version Control System, VCS for short, and its primary use is to let you keep a full history of the project you’re working on. Every time you make changes to the project you can commit those changes to the VCS repository and describe what you changed, after which the changes get stored and will be available at any given time. And because Git compresses that data and stores unchanged files only once, you can easily manage years’ worth of development without that costing you too much storage space.

So why would you want to use something like this?

Example of a VCS workflow

The main reason is control; using a VCS you remain in full control over your development during every step in the process. Assuming that you log every change you make, you can look up what you did in the past at any given moment. But not just looking: you can also easily undo changes, check what exactly got changed during one or more specific steps and then apply that elsewhere, and so forth.

And even more: you can also maintain several development cycles running in parallel so that one can be kept fully separate from the other. This may be a bit difficult to grasp so…

An example: let’s say we’re working on a project and we get an idea for a new feature. The only problem is that we’re not 100% sure if this idea will actually work. On top of that we’re also busy working out the project’s documentation. And if we’d start working on 2 different things at once the whole thing could become quite a mess, right? Maybe it’s better to do one step at a time?

Well, this is where a VCS such as Git can truly excel.

We could start by creating a so-called branch. This is basically a separate line of development within your project. So every change you commit here will be recorded in your VCS under this specific section: the new branch. But as soon as you switch back to the main branch, normally called ‘master’, you’ll be taken right back to the situation before the branch was created. Your working directory will look as if nothing had happened, while the changes you committed stay safely recorded on the new branch. If we were to try and visualize the way branching works it could look something like this:

    +--- F
   /
O ------ M

So O is where we started, F is the new branch where we’d try and work out the idea for the new feature and M is the main (‘master’) branch. We can switch back and forth between branches and all the work we do in both will be kept completely separated from each other.

If the new feature turns out to be undesirable then it’s easy: just remove the branch and continue on the master branch as if nothing happened. And if things do work out for the best: great! We can then simply merge the two branches together so that all the changes in the F (‘feature’) branch get applied to the master branch, after which we can continue working on the project with the new feature included.

What makes this workflow so great is that you don’t have to worry about keeping backups or separating your work somehow, and you can keep working on your project at all times. Although you do maintain 2 separate branches they’re both still part of the same project.
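As a rough sketch of the “idea didn’t work out” case: the commands below set up a scratch repository, try something on a branch called feature and then throw that branch away again. The temporary directory, the file name notes.txt and the identity settings are only assumptions for this example. I’m using ‘git branch -D’ (the forced variant of deleting a branch) because the branch was never merged.

```shell
# Scratch repository in a temporary directory (assumption: any empty directory will do).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called master
git config user.name "Example User"        # placeholder identity for the commits
git config user.email "user@example.invalid"

echo "stable work" > notes.txt
git add notes.txt
git commit -q -m "Starting point"

# Try the idea on its own branch.
git checkout -q -b feature
echo "risky idea" >> notes.txt
git commit -q -am "Experiment"

# The idea turned out badly: go back to master and drop the branch.
git checkout -q master
git branch -D feature
cat notes.txt        # only contains "stable work"
git branch --list    # only master is left
```

Had the idea worked out, ‘git merge feature’ on master would have taken the place of the ‘git branch -D’ step.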

A real Git usage example

peter@unicron:/home/peter/git $ ls -l
total 3
-rw-r--r--  1 peter  peter  10 Apr 30 05:08 a.txt
-rw-r--r--  1 peter  peter  10 Apr 30 05:09 b.txt
-rw-r--r--  1 peter  peter  10 Apr 30 05:09 c.txt

Let’s say our project consists of 3 files, as seen above. All files contain special ‘code’ (a line of text stating “this is x”, where x ranges from a to c) and because our project is getting a bit too ‘complex’ we’re going to use a VCS to help us manage it.

The first advantage which Git has over more traditional systems (such as CVS or Subversion) is that it uses a so-called decentralized workflow. In other words: instead of using one repository it can use many.

Traditional VCSs work with one single (“central”) repository which is then used by all project members. So you can’t really tell Subversion “OK, let’s start a repository here” because it would need some specific preparations in order to store and later retrieve data. This is also one of the reasons why working with Subversion as a client is a whole lot easier than using it to administer a repository.

Git does all this differently. In fact: working with Git as a client or to maintain a central repository isn’t really all that much different. But… we’re getting ahead of the story again.
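As a small preview of what “many repositories” can look like, here is a sketch with one shared repository and two clones. The directory names central.git, alice and bob are made up for this example, and the identity settings are placeholders. The “central” repository is nothing special: just a bare Git repository (history only, no working files).

```shell
work=$(mktemp -d) && cd "$work"

# The shared repository: created with the same 'git init', plus --bare.
git init -q --bare central.git
git -C central.git symbolic-ref HEAD refs/heads/master   # its default branch is master

# First clone: a complete repository of its own.
git clone -q central.git alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"
echo "this is a" > a.txt
git add a.txt
git commit -q -m "First commit"
git push -q origin master      # publish the commit to the shared repository

# Second clone: receives the full history from the shared repository.
cd "$work"
git clone -q central.git bob
git -C bob log --oneline
```

Cloning the still empty repository prints a warning about that, which is harmless here.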

A new repository

So the first thing we’re going to do here is create our own (local) repository and then add the 3 files to that repository so that Git will start tracking them:

peter@unicron:/home/peter/git $ git init
Initialized empty Git repository in /home/peter/git/.git/
peter@unicron:/home/peter/git $ git add .
peter@unicron:/home/peter/git $ git commit -m "First commit"
[master (root-commit) 761b7df] First commit
3 files changed, 3 insertions(+)
create mode 100644 a.txt
create mode 100644 b.txt
create mode 100644 c.txt

So first I told Git to initialize (‘init’) the repository. I then added all files in the current directory (the dot) after which I committed all the additions with one single comment: “First commit”. Note: the reason I used the command line parameter -m is because this makes for a better example. If you don’t specify this then Git will open the default editor so that you can write your comments there.

From this point on our project is fully under Git version control. So now I’m going to add d.txt which will symbolize the documentation, and I’ll make a small change to c.txt:

peter@unicron:/home/peter/git $ echo "more testing" > d.txt
peter@unicron:/home/peter/git $ printf '\nWe added documentation\n' >> c.txt
peter@unicron:/home/peter/git $ git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)

modified:   c.txt

Untracked files:
(use "git add <file>..." to include in what will be committed)

d.txt

no changes added to commit (use "git add" and/or "git commit -a")

So here I created d.txt which contains the line “more testing” and I added the line “We added documentation” to c.txt. Then I asked Git to give me a status update and I think the result speaks for itself. Git has noticed that we modified c.txt and also spotted the new file d.txt. By default Git will keep track of an entire directory structure and will notice and mention any files which it doesn’t recognize, as seen above. This behavior is fully customizable though.
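One common way to customize this is a .gitignore file. A minimal sketch, assuming a scratch repository and two made-up files (notes.txt and build.log):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q

echo "keep me" > notes.txt
echo "noise"   > build.log

# Tell Git to stop reporting *.log files as untracked.
echo "*.log" > .gitignore

git status -s                  # lists .gitignore and notes.txt, but not build.log
git check-ignore -v build.log  # shows which ignore rule matched
```

The .gitignore file itself is usually committed along with the project so that everyone gets the same rules.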

Now, this may look pretty common but it’s actually a somewhat special situation, especially in comparison to the more traditional way of working. Because Git gives you full control over your repository additions, even on a per-file basis if need be. If I were to run ‘git commit -a’ in the above situation then Git would only commit the changes to c.txt, because d.txt is still untracked. (A plain ‘git commit’ would commit nothing at all, because nothing has been staged yet.) Then I could add d.txt later and commit that separately.
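To illustrate that per-file control, here is a sketch in a separate scratch repository (not the one used in this post) which commits two changes one at a time. The commit messages and the identity settings are just placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"

echo "this is c" > c.txt
git add c.txt
git commit -q -m "First commit"

# Two changes: an edit to a tracked file and a brand new file.
echo "small fix" >> c.txt
echo "docs" > d.txt

# Stage and commit only c.txt; d.txt stays untracked for now.
git add c.txt
git commit -q -m "Fix c.txt"
git status -s    # shows only: ?? d.txt

# Later on, d.txt gets a commit of its own.
git add d.txt
git commit -q -m "Add documentation"
git log --oneline
```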

In this case both changes are related to the same thing: adding documentation. So I’ll commit both at once:

peter@unicron:/home/peter/git $ git add .
peter@unicron:/home/peter/git $ git status -s
M  c.txt
A  d.txt
peter@unicron:/home/peter/git $ git commit -am "Added documentation"
[master db85864] Added documentation
2 files changed, 3 insertions(+)
create mode 100644 d.txt

Creating a new branch

So now we have our project which contains the start for our documentation. Now for the new feature idea. I’m going to create a branch called “dev” and apply some massive changes to the project:

peter@unicron:/home/peter/git $ git branch dev
peter@unicron:/home/peter/git $ git checkout dev
Switched to branch 'dev'
peter@unicron:/home/peter/git $ mv d.txt df.txt
peter@unicron:/home/peter/git $ echo "enhancement class" > dc.txt
peter@unicron:/home/peter/git $ rm b.txt
peter@unicron:/home/peter/git $ echo "Bribery" > ba.txt
peter@unicron:/home/peter/git $ cp ba.txt bb.txt
peter@unicron:/home/peter/git $ echo "this is a test class" >> bb.txt
peter@unicron:/home/peter/git $ git status -s
 D b.txt
 D d.txt
?? ba.txt
?? bb.txt
?? dc.txt
?? df.txt

So let’s see what has happened here… First I made a mistake by not using Git to rename d.txt, but that is not a problem and you’ll soon learn why. As you can see in the status overview I’ve mostly deleted and added files: 2 files were deleted, and the ?? prefix marks files which Git isn’t aware of yet. So let’s add those to the repository and commit our changes:

peter@unicron:/home/peter/git $ git add . && git commit -m "Feature beta"
[dev ed74ede] Feature beta
5 files changed, 4 insertions(+), 1 deletion(-)
delete mode 100644 b.txt
create mode 100644 ba.txt
create mode 100644 bb.txt
create mode 100644 dc.txt
rename d.txt => df.txt (100%)
peter@unicron:/home/peter/git $ git tag beta
peter@unicron:/home/peter/git $ ls
a.txt   ba.txt  bb.txt  c.txt   dc.txt  df.txt
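For comparison: the rename could also have been done through Git itself with ‘git mv’, which moves the file and stages the rename in one go. A small sketch in a scratch repository (the repository and the identity settings are assumptions for this example, the file names are borrowed from above):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"

echo "more testing" > d.txt
git add d.txt
git commit -q -m "Add d.txt"

# Rename through Git instead of a plain 'mv'.
git mv d.txt df.txt
git status -s    # shows: R  d.txt -> df.txt
git commit -q -m "Rename d.txt to df.txt"
git ls-files     # shows: df.txt
```

Either way the end result in the repository is the same, because Git works out renames by comparing file contents.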

So here I committed all my changes with the description “Feature beta” and I also tagged the commit. I’ll explain more about tagging in a few moments. And you’ll also notice that Git easily caught on to the rename. If we now look at the project log we’re going to see something really interesting:

peter@unicron:/home/peter/git $ git log
commit ed74ede49ae1a2f84e5ecbdc12c1058ff1419ad6 (HEAD -> dev, tag: beta)
Author: ShelLuser <pl@intranet.lan>
Date:   Mon Apr 30 19:17:54 2018 +0200

Feature beta

commit db8586419f3fa02e2834fc4e3919d56a4d5e3b22 (master)
Author: ShelLuser <pl@intranet.lan>
Date:   Mon Apr 30 19:03:30 2018 +0200

Added documentation

commit 761b7dfa3aff64283c7706691383afff1b080be1
Author: ShelLuser <pl@intranet.lan>
Date:   Mon Apr 30 05:24:35 2018 +0200

First commit

The long hexadecimal strings behind ‘commit’ are the so-called commit IDs. These are used to identify every commit. HEAD is the name for the commit you currently have checked out, normally the latest commit of the branch you’re working on, and as you can see HEAD is pointing to dev, which is the name of the branch we’re currently working with. You can also see the tag ‘beta’. Then below that we can see the previous commit we’ve made: added documentation. And we can also see the name “master” behind the commit ID, which is the name of the main (default) branch.
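A quick sketch of how these names relate to each other: a tag, HEAD and an abbreviated ID can all resolve to the same full commit ID. This uses a scratch repository with placeholder identity settings; only the tag name beta is taken from above.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"

echo "this is a" > a.txt
git add a.txt
git commit -q -m "First commit"
git tag beta                          # lightweight tag on the current commit

full=$(git rev-parse HEAD)            # the full commit ID
short=$(git rev-parse --short HEAD)   # abbreviated form, as seen in the commit output
echo "$full"
echo "$short"

# All of these name the same commit:
git rev-parse beta
git rev-parse "$short"
git log --oneline --decorate          # one line per commit, with branch and tag names
```

In practice the abbreviated ID is usually enough for commands which expect a commit.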

This tells us that the master branch is behind the developments of our current dev branch. It has no knowledge of what we just did. So if we switch back to the master branch something very useful is going to happen:

peter@unicron:/home/peter/git $ ls
a.txt   ba.txt  bb.txt  c.txt   dc.txt  df.txt
peter@unicron:/home/peter/git $ git checkout master
Switched to branch 'master'
peter@unicron:/home/peter/git $ ls
a.txt   b.txt   c.txt   d.txt
peter@unicron:/home/peter/git $ git log
commit db8586419f3fa02e2834fc4e3919d56a4d5e3b22 (HEAD -> master)
Author: ShelLuser <pl@intranet.lan>
Date:   Mon Apr 30 19:03:30 2018 +0200

Added documentation

commit 761b7dfa3aff64283c7706691383afff1b080be1
Author: ShelLuser <pl@intranet.lan>
Date:   Mon Apr 30 05:24:35 2018 +0200

First commit

See what I mean? b.txt is back, ba.txt, bb.txt, dc.txt and df.txt are gone and we’re back at the situation in which we were right after we added the first changes to accommodate our documentation. So basically I’m now working on 2 different situations within the same project. All I have to do is select (“checkout”) the appropriate branch and Git will do the rest for me.

This is a way of working which is simply impossible with Subversion. Sure, Subversion also supports branches and you can maintain different versions of your project next to each other. But not like this. Most of all: those branches would always be shared with the entire team. Remember: Subversion can only use 1 central repository. So after you created such a branch you’d better make sure that the rest of your team knows what you’re up to before they accidentally treat this as the new default development tree. With Git you don’t have to worry about any of that because all of this is happening in your own private repository.

And the possibilities are endless.

Tracking changes in branches

How about we set up another branch which we’ll dedicate for testing purposes? This would allow us to keep a bit of an overview between development and testing changes. It basically makes it easier on us. Because I want to ‘connect’ the two branches (meaning: I want my new branch to keep track of the developments in dev) I’m going to tell it to set up tracking:

peter@unicron:/home/peter/git $ git branch --track stage dev
Branch 'stage' set up to track local branch 'dev'.
peter@unicron:/home/peter/git $ git checkout stage
Switched to branch 'stage'
Your branch is up to date with 'dev'.

So basically we more or less created the same situation as before: a new branch which is separated from the rest. With the main difference that it will keep track of the changes in dev. Let’s try this… First I’m selecting the dev branch and then:

peter@unicron:/home/peter/git $ echo "new feature" >> c.txt
peter@unicron:/home/peter/git $ git commit -am "test feature"
[dev 31d5771] test feature
1 file changed, 1 insertion(+)
peter@unicron:/home/peter/git $ git checkout stage
Switched to branch 'stage'
Your branch is behind 'dev' by 1 commit, and can be fast-forwarded.
(use "git pull" to update your local branch)
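The “behind by 1 commit” information can also be asked for directly. Below is a sketch in a scratch repository which reuses the branch names dev and stage; the identity settings are placeholders. Note that I’m catching up with ‘git merge --ff-only’ here instead of ‘git pull’, simply to keep the example free of any pull configuration.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"

echo "this is c" > c.txt
git add c.txt
git commit -q -m "First commit"

git branch dev
git branch --track stage dev      # stage follows the local branch dev

# Add one commit to dev only.
git checkout -q dev
echo "new feature" >> c.txt
git commit -q -am "test feature"

# How many commits does dev have which stage lacks?
git rev-list --count stage..dev   # prints 1
git branch -vv                    # stage is listed with [dev: behind 1]

# Catch up; a fast-forward is enough because stage has no commits of its own.
git checkout -q stage
git merge -q --ff-only dev
git rev-list --count stage..dev   # prints 0
```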

See what happened here? I made a small change in dev and then switched back to stage, after which Git immediately detected the change and warned me about it. Let’s apply the update to stage and then verify what actually got changed:

peter@unicron:/home/peter/git $ git pull
From .
* branch            dev        -> FETCH_HEAD
Updating ed74ede..31d5771
Fast-forward
c.txt | 1 +
1 file changed, 1 insertion(+)
peter@unicron:/home/peter/git $ git diff HEAD^
diff --git a/c.txt b/c.txt
index 5a05078..f2bcfd9 100644
--- a/c.txt
+++ b/c.txt
@@ -1,3 +1,4 @@
 this is c
 
 We added documentation
+new feature

Pretty cool, right? I added the text line “new feature” to c.txt and as you can see it spotted that perfectly. In case you’re wondering: I basically asked Git to show me the differences between the current situation and that of the previous commit. HEAD^ basically means “the parent of HEAD”, so one commit before it, and as you might remember HEAD is a name which normally points to the latest commit on the branch you have checked out.
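The same notation reaches further back as well: HEAD~2 means “two commits before HEAD”. A small sketch in a scratch repository with three made-up commits (the identity settings are placeholders):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"

# Three commits which each add one line to c.txt.
for n in 1 2 3; do
    echo "line $n" >> c.txt
    git add c.txt
    git commit -q -m "Commit $n"
done

git log --oneline               # three commits, newest first
git diff HEAD^ HEAD             # what the last commit changed: +line 3
git diff HEAD~2 HEAD            # what the last two commits changed: +line 2 and +line 3
git log -1 --format=%s HEAD^    # prints: Commit 2
```

With a clean working directory ‘git diff HEAD^’ (as used above) and ‘git diff HEAD^ HEAD’ show the same thing.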

Also noteworthy is that the output you see above isn’t just some pretty text. This can actually be immediately used with a utility called “patch”. See also this link.
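To give an idea of how that works, this sketch saves a diff to a file and applies it again later. I’m using ‘git apply’ here, which reads the same format; with the patch utility installed, ‘patch -p1’ fed with the same file should do the same job. The scratch repository and the identity settings are assumptions for the example.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"

echo "this is c" > c.txt
git add c.txt
git commit -q -m "First commit"
echo "new feature" >> c.txt
git commit -q -am "test feature"

# Save the changes of the last commit as a patch file outside the repository.
patchfile=$(mktemp)
git diff HEAD^ HEAD > "$patchfile"

# Go back to the older version of the file, then re-apply the change from the patch.
git checkout -q HEAD^ -- c.txt
git apply "$patchfile"
cat c.txt        # both lines are there again
```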

Merging branches

I added a few small changes to stage to serve as an example for my bug fixes and now I’ll switch to master and merge stage into the master branch:

peter@unicron:/home/peter/git $ git status
On branch stage
Your branch is ahead of 'dev' by 1 commit.
(use "git push" to publish your local commits)

nothing to commit, working tree clean
peter@unicron:/home/peter/git $ git checkout master
Switched to branch 'master'
peter@unicron:/home/peter/git $ git merge stage
Updating db85864..b84523a
Fast-forward
a.txt           | 2 ++
b.txt           | 1 -
ba.txt          | 1 +
bb.txt          | 2 ++
c.txt           | 1 +
dc.txt          | 4 ++++
d.txt => df.txt | 2 ++
7 files changed, 12 insertions(+), 1 deletion(-)
delete mode 100644 b.txt
create mode 100644 ba.txt
create mode 100644 bb.txt
create mode 100644 dc.txt
rename d.txt => df.txt (54%)

And done. Now the master and stage branches are fully equal. I’ll try to visualize what happened here:

            s------o---BF
           /      /      \
      d---F-----TF        |
     /                    |
M---D---------------------MR

A horrible diagram, I know ;) First we created M which is the main (master) branch. I added some documentation (‘D’) and then created the d (‘dev’) branch in which I added a new feature (‘F’) and then created the s (‘stage’) branch to test this. In the meantime I also created a test feature (‘TF’) which I then pulled into the stage branch as well (‘o’). Then I created the bug fix (‘BF’). And finally, directly seen above, I merged the stage branch into the master branch (‘MR’).
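Git can also draw this kind of picture itself with ‘git log --graph’. The sketch below uses a scratch repository which only loosely mimics the branches above (same branch names, fewer commits, placeholder identity settings):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"

echo "this is a" > a.txt
git add a.txt
git commit -q -m "First commit"

git checkout -q -b dev
echo "feature" > ba.txt
git add ba.txt
git commit -q -m "Feature beta"

git checkout -q -b stage
echo "fix" >> ba.txt
git commit -q -am "Bug fixes"

git checkout -q master
git merge -q stage     # a fast-forward: master had no commits of its own

# One line per commit, with branch names and an ASCII graph of the history.
git log --oneline --graph --decorate --all
```

Because every merge here is a fast-forward the graph comes out as one straight line; it only fans out once branches really diverge.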

So now the master branch contains a full history of all the applied changes:

commit b84523a6907a4280a2391dd5c0e0d934f9ed5771 (HEAD -> master, stage)
Author: ShelLuser <pl@intranet.lan>
Date:   Mon Apr 30 20:40:09 2018 +0200

Bug fixes

commit 31d57712c244c314fd0b63fddc333a35a6a5a632 (dev)
Author: ShelLuser <pl@intranet.lan>
Date:   Mon Apr 30 19:41:12 2018 +0200

test feature

commit ed74ede49ae1a2f84e5ecbdc12c1058ff1419ad6 (tag: beta)
Author: ShelLuser <pl@intranet.lan>
Date:   Mon Apr 30 19:17:54 2018 +0200

Feature beta

commit db8586419f3fa02e2834fc4e3919d56a4d5e3b22
Author: ShelLuser <pl@intranet.lan>
Date:   Mon Apr 30 19:03:30 2018 +0200

Added documentation

As you can see master and stage now share the same latest commit, because stage has just been merged into master, and HEAD points to master, the branch we have checked out. The dev branch is one commit behind because it never received the bug fixes. Before that you see the feature beta entry which we also tagged and then we’re back to the start situation of the whole thing which was adding documentation.

One project, multiple development cycles, and you can revert to a specific situation at any time. You’re in full control.

A grab at history

Let’s do something silly. Remember b.txt? We basically replaced that with ba and bb. But what if we want to take a closer look at b.txt again? Well, we know that ‘added documentation’ was the point before we changed b.txt and somewhat split it up. So let’s take a closer look at that situation… For this to work we’ll need to use the commit ID:

peter@unicron:/home/peter/git $ git checkout db8586419f3fa02e2834fc4e3919d56a4d5e3b22
Note: checking out 'db8586419f3fa02e2834fc4e3919d56a4d5e3b22'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b <new-branch-name>

HEAD is now at db85864 Added documentation
peter@unicron:/home/peter/git $ ls
a.txt   b.txt   c.txt   d.txt

And see? We got b.txt back. So what if we would like to copy b.txt back into our main setup? As you can read from the description we can make experimental changes and optionally commit them. But the easiest way is to switch back to the master branch (‘git checkout master’) and then simply check out one single file (b.txt in this example) from this specific commit. So:

peter@unicron:/home/peter/git $ git checkout db8586419f3fa02e2834fc4e3919d56a4d5e3b22 -- b.txt
peter@unicron:/home/peter/git $ ls
a.txt   b.txt   ba.txt  bb.txt  c.txt   dc.txt  df.txt
peter@unicron:/home/peter/git $ git status -s
A  b.txt

As you can see b.txt is back and Git has noticed that this is a new file which got added to the work directory. If we only wanted to check its contents, edit it or perhaps copy some parts then now would be a good time to do so. In my example I want to reset the situation as it was before. So:

peter@unicron:/home/peter/git $ git reset -q --hard HEAD
peter@unicron:/home/peter/git $ git status
On branch master
nothing to commit, working tree clean

And we’re back at the end result.
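If all you want is a look at the old contents then there’s no need to check anything out at all: ‘git show’ can print a file as it was in a given commit. A sketch in a scratch repository, where the variable name old and the identity settings are just assumptions for the example:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.invalid"

echo "this is b" > b.txt
git add b.txt
git commit -q -m "Added documentation"
old=$(git rev-parse HEAD)     # remember the commit which still has b.txt

git rm -q b.txt
git commit -q -m "Feature beta"

# Print the old contents without touching the working directory or the index.
git show "${old}:b.txt"       # prints: this is b
git status -s                 # prints nothing: the working tree is still clean
```

Redirecting that output into a file is another way to get a copy back without staging anything.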

The amount of stuff you can do with Git is honestly mind-boggling.

Next post…

In my next post I’m going to show you how Git can also be useful for (Unix) system administrators such as myself. I’m going to focus on Unix because that is a commandline OS which I happen to use but the same could apply to Windows.

Have you ever had a situation where you made many changes to a system or service config file and would really like to see what it looked like 2 years ago? Most companies and individuals don’t keep backups for that long, and even if you did it would probably be quite a hassle to get 1 single file back from such an archive.

This is where Git can also shine.

Just dump all your configuration files into a structured Git repository and you’ll have access to older revisions long after the fact. Because with Git you’d basically be maintaining a backup in itself. No need to use the system backup for that anymore.

April 30, 2018 | Editorial, Tips and tricks