About Johannes Brodwall

Johannes is the chief scientist of the offshore software company Exilesoft. He has close to 15 years of experience programming in Java and C#, and, a long time ago, in other languages as well. He believes that programming is about more than just writing code, but also that too many people lose touch with the code itself. He has organized software development activities in Oslo for many years and often speaks at conferences all over Europe.

Having fun with Git

I recently read The Git Book. As I went through the Git Internals parts, it struck me how simple and elegant the structure of Git really is. I decided that I just had to create my own little library to work with Git repositories (as you do). I call the result Silly Jgit. In this article, I will be walking through the code.

This article is for you if you want to understand Git a bit more deeply, or perhaps even want to work directly with a Git repository in your favorite programming language. I will be walking through four topics: 1) reading a raw commit from a repository, 2) reading the tree hash of the root of a commit, 3) parsing the file list of a directory tree, and 4) reading the contents of a file in the commit’s tree.

Reading the head commit from a repository

The first thing we need to do in order to read the head commit is to find out which commit is the head of the repository. The .git/HEAD file is a plain text file that names a ref, such as “ref: refs/heads/master”, which is a file in the .git/refs/heads directory. If you’ve checked out master, this will be .git/refs/heads/master. That file is also plain text and contains a hash: a 40-digit hexadecimal number. The hash can be converted to the filename of a Git object under .git/objects: the first two digits name a subdirectory, and the remaining 38 digits name the file inside it. This file is zlib-compressed and contains the commit information. Here’s the code to read it:

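import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.zip.InflaterInputStream;

// Util is a small helper class (not shown): asString reads a file or stream into a String.
File repository = new File(".git");
// .git/HEAD contains e.g. "ref: refs/heads/master"; keep the part after the space.
// (A detached HEAD contains a hash instead and is not handled here.)
File headFile = new File(repository,
         Util.asString(new File(repository, "HEAD")).split(" ")[1].trim());

String commitHash =  Util.asString(headFile).trim();
// Loose objects live at objects/<first 2 hex digits>/<remaining 38 hex digits>
File commitFile = new File(repository,
         "objects/" + commitHash.substring(0,2) + "/" + commitHash.substring(2));
// Loose objects are zlib-compressed, which is what InflaterInputStream decodes
try(final InputStream inputStream = new InflaterInputStream(new FileInputStream(commitFile))) {
    System.out.println(Util.asString(inputStream));
}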

Running this code produces the following output (notice that some of the spaces in the output are actually null bytes in the file):

commit 237 tree c03265971361724e18e31cc83e5c60cd0e0f5754
parent 141f5d5a2cc0c268e7b05be17a49c1c0dc61efad
author Johannes Brodwall  1379445359 +0200
committer Johannes Brodwall  1379445359 +0200

This is the commit comment
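The code above relies on the Util helper and assumes that HEAD names a branch. As a dependency-free sketch of the same two steps, the hypothetical HeadResolver class below also accepts a detached HEAD (where .git/HEAD holds a hash directly) and checks that the hash has the expected 40 hex digits before building the objects/xx/yyyy path. It assumes Java 11 or later for Files.readString, and it does not look in .git/packed-refs, where Git may keep refs that have no file under .git/refs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HeadResolver {

    // Reads .git/HEAD and returns the commit hash it points to.
    // "ref: refs/heads/master" is followed to the named ref file; anything else
    // is treated as a detached HEAD that already contains a hash.
    // Refs stored only in .git/packed-refs are not handled in this sketch.
    static String resolveHead(Path gitDir) throws IOException {
        String head = Files.readString(gitDir.resolve("HEAD"), StandardCharsets.US_ASCII).trim();
        if (head.startsWith("ref: ")) {
            Path refFile = gitDir.resolve(head.substring("ref: ".length()));
            return Files.readString(refFile, StandardCharsets.US_ASCII).trim();
        }
        return head;
    }

    // Maps a 40-digit hash to its loose object file:
    // objects/<first 2 hex digits>/<remaining 38 hex digits>.
    static Path objectPath(Path gitDir, String hash) {
        if (!hash.matches("[0-9a-f]{40}")) {
            throw new IllegalArgumentException("Not a SHA-1 hash: " + hash);
        }
        return gitDir.resolve("objects").resolve(hash.substring(0, 2)).resolve(hash.substring(2));
    }

    public static void main(String[] args) throws IOException {
        Path gitDir = Paths.get(".git");
        if (Files.exists(gitDir.resolve("HEAD"))) {
            String commitHash = resolveHead(gitDir);
            System.out.println(commitHash + " -> " + objectPath(gitDir, commitHash));
        } else {
            System.out.println("No .git/HEAD in the current directory");
        }
    }
}
```

Run from the root of a repository, this prints the head commit hash and the file that holds it, which is the same file the Util-based snippet inflates.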

Finding the directory tree of a commit

When we have the commit information, we can parse it to find the tree hash. The tree hash references another file under .git/objects, which contains the listing of the root directory of the files in the commit. In the example above, the tree hash is “c03265971361724e18e31cc83e5c60cd0e0f5754”. But before we read the tree hash, we have to read past the object type (in this case “commit”) and the size (in this case 237), which is terminated by a null byte.

String treeHash;
try(final InputStream inputStream = new InflaterInputStream(new FileInputStream(commitFile))) {
    // Object header: "<type> <size>" followed by a null byte
    String type = Util.stringUntil(inputStream, ' ');
    long length = Long.valueOf(Util.stringUntil(inputStream, (char)0));
    // The first line of a commit is "tree <hash>": skip the word "tree"...
    Util.stringUntil(inputStream, ' ');
    // ...and read the hash up to the end of the line
    treeHash = Util.stringUntil(inputStream, '\n');
    System.out.println("Tree hash: " + treeHash);
}
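The block above reads just enough of the commit to get the tree hash. If you want the other fields shown in the output earlier, a sketch along the following lines could work on the already-decompressed bytes of a commit object. CommitParser and its Commit record are hypothetical names, the sample author line is made up, and Java 17 is assumed for records and switch rules. It checks the size in the header against the actual byte count and treats the first blank line as the end of the header fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CommitParser {

    record Commit(String tree, List<String> parents, String author, String message) {}

    // Parses the decompressed bytes of a commit object: "commit <size>", a null byte,
    // header lines ("tree ...", "parent ...", ...), a blank line, then the message.
    static Commit parse(byte[] raw) {
        int nul = indexOf(raw, (byte) 0);
        if (nul < 0) throw new IllegalArgumentException("Missing object header");
        String[] header = new String(raw, 0, nul, StandardCharsets.US_ASCII).split(" ");
        if (!header[0].equals("commit")) throw new IllegalArgumentException("Not a commit: " + header[0]);
        int size = Integer.parseInt(header[1]);
        if (size != raw.length - nul - 1) throw new IllegalArgumentException("Size mismatch");

        // The size counts bytes, so decode exactly that many bytes after the null byte.
        String body = new String(raw, nul + 1, size, StandardCharsets.UTF_8);
        int blank = body.indexOf("\n\n");
        if (blank < 0) throw new IllegalArgumentException("No blank line before the message");

        String tree = null;
        String author = null;
        List<String> parents = new ArrayList<>();
        for (String line : body.substring(0, blank).split("\n")) {
            int space = line.indexOf(' ');
            if (space <= 0) continue; // continuation lines (e.g. of a signature) start with a space
            String key = line.substring(0, space);
            String value = line.substring(space + 1);
            switch (key) {
                case "tree" -> tree = value;
                case "parent" -> parents.add(value); // merge commits have several parents
                case "author" -> author = value;
                default -> { } // committer and other fields are ignored in this sketch
            }
        }
        return new Commit(tree, parents, author, body.substring(blank + 2));
    }

    static int indexOf(byte[] data, byte b) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == b) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String body = "tree c03265971361724e18e31cc83e5c60cd0e0f5754\n"
                + "parent 141f5d5a2cc0c268e7b05be17a49c1c0dc61efad\n"
                + "author A U Thor <author@example.com> 1379445359 +0200\n"
                + "committer A U Thor <author@example.com> 1379445359 +0200\n"
                + "\n"
                + "This is the commit comment\n";
        byte[] raw = ("commit " + body.length() + "\0" + body).getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(raw));
    }
}
```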

File rootTreeFile = new File(repository,
       "objects/" + treeHash.substring(0,2) + "/" + treeHash.substring(2));
try(final InputStream inputStream = new InflaterInputStream(new FileInputStream(rootTreeFile))) {
    System.out.println(Util.asString(inputStream));
}

Looking at the tree object file is not as straightforward, however:

tree 130 100644 FOO æ?â?²ÑÖCK?)®wZØÂä?S?
100644 FOO.txt ýc?Õô¹ìmìªGAk?X?ï'&
100644 README Wýs?ºyâx+@îR°X040000 lib ?ñG»Ñ?¼>&8´. ?úË¢i[o

The next part of this article will show how to deal with this.

Parsing a directory tree

The tree file looks like a lot of garbage. But don’t panic. Just like the commit object, the tree object starts with the type (“tree”) and the size (130), followed by a null byte. After this, it lists each file or directory. Each tree entry consists of the permissions as an octal number (which also tell us whether this is a file or a directory), a space, the file name, a null byte, and the hash of the entry, but this time as 20 raw bytes instead of 40 hex digits. We can read through the entries and collect them into a map from file name to hash:

import java.util.HashMap;
import java.util.Map;

File rootTreeFile = new File(repository,
        "objects/" + treeHash.substring(0,2) + "/" + treeHash.substring(2));
Map<String, String> entries = new HashMap<>();
try(final InputStream inputStream = new InflaterInputStream(new FileInputStream(rootTreeFile))) {
    // Object header: "tree <size>" followed by a null byte
    String type = Util.stringUntil(inputStream, ' ');
    long length = Long.valueOf(Util.stringUntil(inputStream, (char)0));

    while (true) {
        // Each entry: "<octal mode> <name>", a null byte, then 20 raw hash bytes.
        // stringUntil is assumed to return null at end of stream, so check before padding.
        String mode = Util.stringUntil(inputStream, ' ');
        if (mode == null) break;
        // Git writes directory modes as "40000"; pad to six digits ("040000")
        String octalMode = Util.leftPad(mode, 6, '0');

        String path = Util.stringUntil(inputStream, (char)0);
        StringBuilder hash = new StringBuilder();
        for (int i=0; i<20; i++) {
            // Each raw byte becomes two hex digits
            hash.append(Util.leftPad(Integer.toHexString(inputStream.read()), 2, '0'));
        }
        entries.put(path, hash.toString());
    }
}

System.out.println(entries);

Here’s an example of a parsed directory listing. I have not shown the octalMode for each file, but it is very useful for telling directories (whose octalMode starts with 0, as in 040000) apart from files (such as 100644):

{FOO.txt=fd6385d5f4b9ec6decaa47416b7f96588aef2726,
lib=8ff147bbd18fbc3e2638b42ea09cfacba2695b6f,
README=57fd19a7738eba1e79e2782b161a40ee52b05801,
FOO=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391}
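Since the mode is what separates directories from files, it can be worth keeping it next to the name and hash. The hypothetical TreeParser below is a minimal sketch of the entry layout described above (mode, space, name, null byte, 20 raw hash bytes) that keeps all three. It assumes Java 17 for HexFormat, and instead of comparing strings it masks the file-type bits of the octal mode, so both Git’s unpadded “40000” and a left-padded “040000” count as directories.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

public class TreeParser {

    record Entry(String mode, String name, String hash) {
        // The file-type bits of the mode (mask 0170000) are 0040000 for a directory,
        // so "40000" and "040000" are both recognised; "100644" is a regular file.
        boolean isDirectory() {
            return (Integer.parseInt(mode, 8) & 0170000) == 0040000;
        }
    }

    // Parses a decompressed tree object: "tree <size>", a null byte, then entries of the
    // form "<octal mode> <name>", a null byte and 20 raw hash bytes.
    static List<Entry> parse(byte[] raw) {
        List<Entry> entries = new ArrayList<>();
        int pos = indexOf(raw, (byte) 0, 0) + 1; // skip the object header
        while (pos < raw.length) {
            int space = indexOf(raw, (byte) ' ', pos);
            int nul = indexOf(raw, (byte) 0, space);
            String mode = new String(raw, pos, space - pos, StandardCharsets.US_ASCII);
            String name = new String(raw, space + 1, nul - space - 1, StandardCharsets.UTF_8);
            // The 20 raw bytes become 40 lowercase hex digits.
            String hash = HexFormat.of().formatHex(raw, nul + 1, nul + 21);
            entries.add(new Entry(mode, name, hash));
            pos = nul + 21;
        }
        return entries;
    }

    static int indexOf(byte[] data, byte b, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == b) return i;
        }
        throw new IllegalArgumentException("Truncated tree object");
    }

    public static void main(String[] args) {
        // Two entries from the listing above, encoded the way Git stores them.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        body.writeBytes("100644 FOO\0".getBytes(StandardCharsets.UTF_8));
        body.writeBytes(HexFormat.of().parseHex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"));
        body.writeBytes("40000 lib\0".getBytes(StandardCharsets.UTF_8));
        body.writeBytes(HexFormat.of().parseHex("8ff147bbd18fbc3e2638b42ea09cfacba2695b6f"));
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        raw.writeBytes(("tree " + body.size() + "\0").getBytes(StandardCharsets.UTF_8));
        raw.writeBytes(body.toByteArray());

        for (Entry e : parse(raw.toByteArray())) {
            System.out.println((e.isDirectory() ? "dir  " : "file ") + e.name() + " " + e.hash());
        }
    }
}
```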

Reading a file

This leads us to the end of our journey: how to read the contents of a file. Once we have the entries of a tree, it’s a simple matter of looking up the hash for a filename and parsing that object. As before, the object starts with the type (“blob”, short for “binary large object”) and the size:

String blobHash = entries.get("README");
File blobFile = new File(repository, "objects/" + blobHash.substring(0,2) + "/" + blobHash.substring(2));
try(final InputStream inputStream = new InflaterInputStream(new FileInputStream(blobFile))) {
    String type = Util.stringUntil(inputStream, ' ');
    long length = Long.valueOf(Util.stringUntil(inputStream, (char)0));

    System.out.println(Util.asString(inputStream));
}

This prints the contents of our file. Obviously, if you want to find a file in a subdirectory, you’ll have to do a bit more work: parse another tree object, look up an entry in that object, and so on.
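The extra work for subdirectories can be sketched as a loop over the path components, loading one tree object per component. In the sketch below, ObjectLoader is a hypothetical interface standing in for “give me the decompressed bytes of this object”, which keeps the lookup testable without a repository; looseObjects implements it for .git/objects but, like the rest of this article, ignores pack files. Following an entry that is not a tree fails with an IOException rather than misreading a blob as a tree. Java 17 is assumed for HexFormat.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HexFormat;
import java.util.zip.InflaterInputStream;

public class PathLookup {

    // Hypothetical abstraction: returns the decompressed bytes of an object.
    interface ObjectLoader {
        byte[] load(String hash) throws IOException;
    }

    // Reads loose objects from .git/objects; packed objects (.git/objects/pack) are not supported.
    static ObjectLoader looseObjects(Path gitDir) {
        return hash -> {
            Path file = gitDir.resolve("objects").resolve(hash.substring(0, 2)).resolve(hash.substring(2));
            try (InputStream in = new InflaterInputStream(Files.newInputStream(file))) {
                return in.readAllBytes();
            }
        };
    }

    // Walks a slash-separated path such as "lib/Util.java" from a root tree,
    // loading one tree object per path component, and returns the final entry's hash.
    static String resolve(ObjectLoader loader, String rootTree, String path) throws IOException {
        String current = rootTree;
        for (String component : path.split("/")) {
            current = findEntry(loader.load(current), component);
            if (current == null) {
                throw new IOException("No such path in tree: " + path);
            }
        }
        return current;
    }

    // Scans one object for an entry called name; returns its hex hash, or null
    // if the object is not a tree or has no such entry.
    static String findEntry(byte[] raw, String name) {
        int headerEnd = indexOf(raw, (byte) 0, 0);
        if (headerEnd < 0 || !new String(raw, 0, headerEnd, StandardCharsets.US_ASCII).startsWith("tree ")) {
            return null;
        }
        int pos = headerEnd + 1;
        while (pos < raw.length) {
            int space = indexOf(raw, (byte) ' ', pos);
            if (space < 0) return null; // truncated object
            int nul = indexOf(raw, (byte) 0, space);
            if (nul < 0) return null;
            String entryName = new String(raw, space + 1, nul - space - 1, StandardCharsets.UTF_8);
            if (entryName.equals(name)) {
                return HexFormat.of().formatHex(raw, nul + 1, nul + 21);
            }
            pos = nul + 21; // skip the 20-byte binary hash
        }
        return null;
    }

    static int indexOf(byte[] data, byte b, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == b) return i;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PathLookup <root tree hash> <path>, run from the repository root.
        if (args.length != 2) {
            System.out.println("usage: PathLookup <root tree hash> <path>");
            return;
        }
        System.out.println(resolve(looseObjects(Paths.get(".git")), args[0], args[1]));
    }
}
```

With the tree hash from the commit above, resolve(looseObjects(Paths.get(".git")), treeHash, "lib/...") would return the hash of a file inside lib, which can then be read exactly like README.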

Conclusions

This blog post shows how, in fewer than 50 lines of code and with no dependencies (apart from a small utility helper class), we can find the head commit of a Git repository, parse the file listing of the root of the file tree for that commit, and print out the contents of a file. The most difficult part was discovering that Git objects are zlib-compressed, so InflaterInputStream, and not Zip or Gzip, was needed to unpack them.
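To make the zlib point concrete, here is a small round trip using only java.util.zip. It is a sketch, not part of silly-jgit: DeflaterOutputStream with its default Deflater writes zlib-wrapped data (RFC 1950), which InflaterInputStream reads back, while GZIPInputStream rejects the same bytes because it expects a gzip header (RFC 1952). The class name ZlibDemo is made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

public class ZlibDemo {

    // The default Deflater (nowrap = false) produces the zlib format.
    static byte[] deflate(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(data);
        }
        return out.toByteArray();
    }

    static byte[] inflate(byte[] compressed) throws IOException {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    // GZIPInputStream checks for the gzip magic number and throws ZipException on zlib data.
    static boolean gzipAccepts(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            in.readAllBytes();
            return true;
        } catch (ZipException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] object = "blob 0\0".getBytes(StandardCharsets.US_ASCII);
        byte[] compressed = deflate(object);
        System.out.printf("first byte: 0x%02x%n", compressed[0]);
        System.out.println("round trip ok: " + java.util.Arrays.equals(inflate(compressed), object));
        System.out.println("gzip accepts: " + gzipAccepts(compressed));
    }
}
```

With the default 32K window, zlib data starts with the byte 0x78, which is a quick way to recognise a loose Git object in a hex dump.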

My silly-jgit project supports reading and writing commits, trees and hashes in .git/objects. This is just the core subset of the Git plumbing commands. Furthermore, while writing this article, I noticed that Git often packs objects into .git/objects/pack. This adds a totally new dimension that I haven’t dealt with yet.
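Writing objects needs one more piece: the object’s name. Git names an object by the SHA-1 of its uncompressed form, that is, the same “type size” header and null byte we skipped when reading, followed by the content. The hypothetical BlobHash class below is a sketch of that rule for blobs, plus a minimal loose-object writer, assuming Java 17 for HexFormat. It skips the precautions Git itself takes when writing (such as writing to a temporary file first), so treat it as an illustration rather than a safe way to modify a real repository.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.zip.DeflaterOutputStream;

public class BlobHash {

    // Builds the uncompressed header "<type> <size>" plus a null byte; size is in bytes.
    static byte[] header(String type, byte[] content) {
        return (type + " " + content.length + "\0").getBytes(StandardCharsets.US_ASCII);
    }

    // The object id is the SHA-1 of header + content, as 40 lowercase hex digits.
    static String objectId(String type, byte[] content) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        sha1.update(header(type, content));
        sha1.update(content);
        return HexFormat.of().formatHex(sha1.digest());
    }

    // Stores a blob as a loose object: zlib-compressed header + content at objects/xx/yyyy.
    static String writeBlob(Path gitDir, byte[] content) throws IOException, NoSuchAlgorithmException {
        String id = objectId("blob", content);
        Path file = gitDir.resolve("objects").resolve(id.substring(0, 2)).resolve(id.substring(2));
        if (Files.exists(file)) {
            return id; // objects are named by their content, so this one is already stored
        }
        Files.createDirectories(file.getParent());
        try (OutputStream out = new DeflaterOutputStream(Files.newOutputStream(file))) {
            out.write(header("blob", content));
            out.write(content);
        }
        return id;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Prints e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the id of an empty blob
        System.out.println(objectId("blob", new byte[0]));
    }
}
```

The empty file FOO in the directory listing earlier has exactly the id printed by main, which makes a handy sanity check.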

I hope that nobody is crazy enough to actually use my silly Git library for Java. But I do hope that this article gave you some feeling of Git mastery.
 

Reference: Having fun with Git from our JCG partner Johannes Brodwall at the Thinking Inside a Bigger Box blog.


2 Responses to "Having fun with Git"

  1. Grethel says:

    This is cool article too
    http://newartisans.com/2008/04/git-from-the-bottom-up/

    but i don’t remember if i read about it here or not :)

  2. Thanks, Grethel – this looks like an interesting read. Although to be fair, my article is even more from the bottom. I even make my own plumbing. :-)

Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy | Contact
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.