shubham

Peeking inside .git

To users, Git is a version control system (VCS), but to its designers, it is a content-addressable filesystem. The core of Git is essentially a hash table (a key-value store). Any kind of data can be inserted into Git, and it returns a unique key, derived from the content, which is then used to fetch that object.
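
To make the "hash table" idea concrete, here is a minimal sketch of a content-addressable store in Python. The class name ContentStore and its methods are made up for illustration. It hashes the raw bytes directly, whereas Git adds a small header before hashing, so the keys below will not match real Git object IDs.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key is derived from the data itself."""

    def __init__(self):
        self._objects = {}  # hex digest -> raw bytes

    def put(self, data: bytes) -> str:
        # The key is the SHA-1 of the content (a simplification of Git's scheme).
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ContentStore()
key = store.put(b"Hello World\n")
print(store.get(key))
# Identical content always maps to the same key, so it is stored only once.
print(store.put(b"Hello World\n") == key)
```

Because the key depends only on the content, storing the same data twice costs nothing extra, which is the property the rest of this post relies on.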

Git is designed like a bathroom. It has porcelain commands and plumbing commands. You interact with the porcelain, but underneath there's plumbing. Plumbing is for computers and porcelain is for humans. Porcelain is built upon plumbing. There's a section towards the end called "bottoms up git", where we will learn how to create a commit using only plumbing commands.

Git commands are a leaky abstraction over the data storage. You tell Git that you want to save a snapshot of your project and it basically records a manifest of what all of the files in your project look like at that point. Git is more like a mini filesystem with some incredibly powerful tools built on top of it, rather than simply a VCS.

We will explore 5 famous git commands.

  1. git init
  2. git add
  3. git commit
  4. git branch
  5. git merge

Apart from that, we will also explore the HEAD pointers.

git init

This is what an empty Git repository looks like after running git init. We will only focus on HEAD, objects/, refs/heads/ and refs/tags/.

$ git init
$ tree .git
.git
├── HEAD
├── config
├── description
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── fsmonitor-watchman.sample
│   ├── post-update.sample
│   ├── pre-applypatch.sample
│   ├── pre-commit.sample
│   ├── pre-merge-commit.sample
│   ├── pre-push.sample
│   ├── pre-rebase.sample
│   ├── pre-receive.sample
│   ├── prepare-commit-msg.sample
│   ├── push-to-checkout.sample
│   └── update.sample
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

add & commit

Create a new file and put some content in it.

$ echo 'console.log("Hello World");' > main.js && cat main.js
console.log("Hello World");

git add

Staging the file creates a new object, ac/cefceba62b4874a613a2336de33ee716e99931, under .git/objects.

$ git add main.js
$ tree .git
.git
├── HEAD
├── objects
│   ├── ac
│   │   └── cefceba62b4874a613a2336de33ee716e99931
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

The object name is a SHA-1 hash, and we can refer to it by its first 4 characters, acce, as long as no other object starts with the same prefix. However, the directory structure is slightly odd. Why is there a subdirectory? A repository can grow to 10k+ objects, and filesystems don't cope well with a very large number of files in one directory. So to keep things manageable, Git uses the first two characters of the hash as a directory name and the remaining 38 as the file name.
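
The two ideas above, the two-character directory and the abbreviated hash, can be sketched in a few lines of Python. The function names object_path and resolve_prefix are hypothetical helpers for illustration, not part of Git.

```python
from pathlib import PurePosixPath


def object_path(object_id: str, git_dir: str = ".git") -> PurePosixPath:
    """First two hex characters name the directory, the remaining 38 the file."""
    return PurePosixPath(git_dir) / "objects" / object_id[:2] / object_id[2:]


def resolve_prefix(prefix: str, known_ids: list[str]) -> str:
    """Return the single ID starting with prefix; raise if none or several match."""
    matches = [oid for oid in known_ids if oid.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"{prefix!r} matches {len(matches)} objects")
    return matches[0]


ids = ["accefceba62b4874a613a2336de33ee716e99931"]
full_id = resolve_prefix("acce", ids)
print(object_path(full_id))
```

A prefix only works while it is unambiguous, which is why a short prefix that is fine in a small repository may stop resolving in a large one.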

blob object

Git ships with a really convenient plumbing command, cat-file, to print the contents of an object. We can look at the content of the object acce. The command accepts an abbreviated hash: any prefix of at least 4 characters works, as long as it identifies exactly one object in the repository. The object contains the content of main.js. Its type is blob, one of the Git object types. Blob means "binary large object". When we git add a file, Git creates a blob object for it; blob is the Git object type for storing file contents.

$ git cat-file -p acce
console.log("Hello World");

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

$ git cat-file -t acce
blob
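
On disk, a loose object is the zlib-compressed concatenation of a small header ("<type> <size>" followed by a NUL byte) and the content. The sketch below round-trips that format in memory; the function names are illustrative, and it builds its own object rather than reading one from .git/objects.

```python
import zlib


def encode_loose_object(obj_type: str, body: bytes) -> bytes:
    # Header is "<type> <size in bytes>" followed by a NUL byte.
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return zlib.compress(header + body)


def decode_loose_object(raw: bytes) -> tuple[str, bytes]:
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    assert int(size) == len(body), "size in header must match body length"
    return obj_type, body


raw = encode_loose_object("blob", b'console.log("Hello World");\n')
obj_type, body = decode_loose_object(raw)
print(obj_type)       # what cat-file -t reports
print(body.decode())  # what cat-file -p prints for a blob
```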

commit

# commit the changes after adding.
$ git commit -m "first commit"

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

# look at the contents of the git directory.
$ tree .git
.git
├── COMMIT_EDITMSG
├── HEAD
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           └── master
├── objects
│   ├── 26
│   │   └── c7fccd29746f6775d8f291c6e0bbdfba6a4aac
│   ├── 8e
│   │   └── 62e9859f9e0283f159a0a94a6ea7a7372e9b56
│   ├── ac
│   │   └── cefceba62b4874a613a2336de33ee716e99931
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   └── master
    └── tags

14 directories, 25 files

After the commit, we have two new objects, 26c7 and 8e62. One is a tree object and the other is a commit object. The tree is created first, then the commit that points to it. While a tree represents the state of a directory, a commit places that state in time and records how we got there. A commit object contains the hash of the top-level tree, the parent commit hashes (none for the first commit), the author, the committer, the dates and the message. We'll come back to the tree later. Let's explore the commit object 26c7.

The commit object references a tree by its hash. That tree, 8e62, is the other object that was just created. We can check the type of any Git object using git cat-file -t; for 26c7 it returns commit.

$ git cat-file -p 26c7
tree 8e62e9859f9e0283f159a0a94a6ea7a7372e9b56
author Shubham Srivastava <shbm@Shubhams-MacBook-Air.local> 1656713957 +0200
committer Shubham Srivastava <shbm@Shubhams-MacBook-Air.local> 1656713957 +0200

first commit

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

$ git cat-file -t 26c7
commit
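
The commit text is simple enough to parse by hand: header lines, a blank line, then the message. The sketch below parses the output shown above and decodes the author timestamp. parse_commit and parse_when are hypothetical helpers, and the parser ignores multi-line headers such as signatures.

```python
from datetime import datetime, timedelta, timezone

COMMIT_TEXT = """\
tree 8e62e9859f9e0283f159a0a94a6ea7a7372e9b56
author Shubham Srivastava <shbm@Shubhams-MacBook-Air.local> 1656713957 +0200
committer Shubham Srivastava <shbm@Shubhams-MacBook-Air.local> 1656713957 +0200

first commit
"""


def parse_commit(text: str) -> dict:
    # Headers and message are separated by the first blank line.
    head, _, message = text.partition("\n\n")
    commit = {"parents": [], "message": message}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)
        else:
            commit[key] = value
    return commit


def parse_when(identity: str) -> datetime:
    """Decode the trailing '<unix seconds> <+hhmm>' of an author/committer line."""
    seconds, offset = identity.rsplit(" ", 2)[1:]
    sign = -1 if offset[0] == "-" else 1
    tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5])))
    return datetime.fromtimestamp(int(seconds), tz)


commit = parse_commit(COMMIT_TEXT)
print(commit["tree"])
print(parse_when(commit["author"]).isoformat())  # 2022-07-02T00:19:17+02:00
```

The decoded time is the same moment that git log later displays as the commit date.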

tree

The commit object references the tree 8e62e9859f9e0283f159a0a94a6ea7a7372e9b56. A tree maps file names to blobs and to other trees, which is how Git stores file names and groups files into directories. Git stores content in a manner similar to a UNIX filesystem, but a bit simplified. All the content is stored as tree and blob objects, with trees corresponding to UNIX directory entries and blobs corresponding more or less to inodes or file contents. A single tree object contains one or more entries, each of which is the SHA-1 hash of a blob or subtree with its associated mode, type, and filename.

This is what a tree object looks like.

$ git cat-file -p 8e62
100644 blob accefceba62b4874a613a2336de33ee716e99931 main.js

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

$ git cat-file -t 8e62
tree

The tree object contains one line per file or subdirectory, with each line giving the file mode (100644), object type (blob), object hash (acce) and filename (main.js). The object type is either "blob" for a file or "tree" for a subdirectory.
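
The line-per-entry view is what cat-file -p prints. The stored form is binary: each entry is "<mode> <name>", a NUL byte, then the 20 raw bytes of the hash, and the type is inferred from the mode rather than stored. A minimal sketch of that encoding, with hypothetical function names and a plain sort by name (Git's real ordering treats directory names slightly differently):

```python
def encode_tree(entries: list[tuple[str, str, str]]) -> bytes:
    """entries are (mode, name, hex object id); returns the raw tree body."""
    body = b""
    for mode, name, object_id in sorted(entries, key=lambda e: e[1]):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(object_id)
    return body


def decode_tree(body: bytes) -> list[tuple[str, str, str]]:
    entries = []
    pos = 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)
        mode, name = body[pos:nul].decode().split(" ", 1)
        # The 20 bytes after the NUL are the raw SHA-1 of the blob or subtree.
        object_id = body[nul + 1 : nul + 21].hex()
        entries.append((mode, name, object_id))
        pos = nul + 21
    return entries


entries = [("100644", "main.js", "accefceba62b4874a613a2336de33ee716e99931")]
raw_tree = encode_tree(entries)
print(len(raw_tree))
print(decode_tree(raw_tree))
```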

master branch pointer

The file refs/heads/master is the pointer for the master branch: it contains the hash of the latest commit on that branch. Every branch you create gets its own file under refs/heads/.

# content of master pointer
$ cat .git/refs/heads/master
26c7fccd29746f6775d8f291c6e0bbdfba6a4aac

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

# HEAD and master
$ git log
commit 26c7fccd29746f6775d8f291c6e0bbdfba6a4aac (HEAD -> master)
Author: Shubham Srivastava <shbm@Shubhams-MacBook-Air.local>
Date:   Sat Jul 2 00:19:17 2022 +0200

    first commit
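
The "HEAD -> master" in the log output is a two-step lookup: HEAD names a ref, and the ref file holds a commit hash. Here is a rough sketch of that lookup against a throwaway directory. resolve_head is a hypothetical helper and it ignores packed refs, which real Git also consults.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_head(git_dir: Path) -> Optional[str]:
    """Follow HEAD to a commit hash; None if the branch has no commits yet."""
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds a hash directly
    ref_file = git_dir / head[len("ref: "):]
    return ref_file.read_text().strip() if ref_file.exists() else None


# Build a temporary directory that mimics the layout of .git.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    (git_dir / "refs" / "heads" / "master").write_text(
        "26c7fccd29746f6775d8f291c6e0bbdfba6a4aac\n"
    )
    print(resolve_head(git_dir))
```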

branch & merge

git checkout -b feature creates a new branch and switches to it. It creates a new pointer, refs/heads/feature. At the point of branching, feature points to the same commit as master. We can verify this by looking at .git/logs/refs/heads/feature.

$ git checkout -b feature
$ tree .git
.git
├── COMMIT_EDITMSG
├── HEAD
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           ├── feature
│           └── master
├── objects
│   ├── 26
│   │   └── c7fccd29746f6775d8f291c6e0bbdfba6a4aac
│   ├── 8e
│   │   └── 62e9859f9e0283f159a0a94a6ea7a7372e9b56
│   ├── ac
│   │   └── cefceba62b4874a613a2336de33ee716e99931
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   ├── feature
    │   └── master
    └── tags

14 directories, 25 files

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

# Which commit does the feature branch point to? The same one as master.
$ cat .git/logs/refs/heads/feature
0000000000000000000000000000000000000000 26c7fccd29746f6775d8f291c6e0bbdfba6a4aac Shubham Srivastava <shbm@Shubhams-MacBook-Air.local> 1656722666 +0200 branch: Created from HEAD
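
Each line in that log file records one move of the branch pointer: the old hash, the new hash, who moved it, when, and why. An all-zero old hash means the ref did not exist before. A small sketch that splits such a line into fields, assuming the usual layout where a tab precedes the message (the regular expression is mine, not something Git ships):

```python
import re

REFLOG_LINE = re.compile(
    r"^(?P<old>[0-9a-f]{40}) (?P<new>[0-9a-f]{40}) "
    r"(?P<name>.*) <(?P<email>[^>]*)> "
    r"(?P<time>\d+) (?P<tz>[+-]\d{4})\s+(?P<message>.*)$"
)

NULL_ID = "0" * 40  # placeholder meaning "no previous value"
line = (
    NULL_ID
    + " 26c7fccd29746f6775d8f291c6e0bbdfba6a4aac"
    + " Shubham Srivastava <shbm@Shubhams-MacBook-Air.local>"
    + " 1656722666 +0200\tbranch: Created from HEAD"
)

entry = REFLOG_LINE.match(line).groupdict()
print(entry["old"] == NULL_ID)  # True: the branch was just created
print(entry["new"])
print(entry["message"])
```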

Modify main.js on the feature branch. The file now contains:

$ cat main.js
console.log("Hello World");
console.log("Feature");

Staging the modified file creates a new object. As before, it is a blob.

$ git add main.js
$ tree .git
.git
├── COMMIT_EDITMSG
├── HEAD
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           ├── feature
│           └── master
├── objects
│   ├── 07
│   │   └── 99851535ee3b53930befa9a383691eaa29ed9d
│   ├── 26
│   │   └── c7fccd29746f6775d8f291c6e0bbdfba6a4aac
│   ├── 8e
│   │   └── 62e9859f9e0283f159a0a94a6ea7a7372e9b56
│   ├── ac
│   │   └── cefceba62b4874a613a2336de33ee716e99931
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   ├── feature
    │   └── master
    └── tags

15 directories, 28 files

The new blob object contains the latest version of main.js.

# What does the newly created object contain?
$ git cat-file -p 0799
console.log("Hello World");
console.log("Feature");

As seen previously, committing creates two objects: a tree object and a commit object. a150 is the tree and d23a is the commit. The feature branch pointer has also moved and now contains the latest commit on the feature branch. And since we are on the feature branch, HEAD contains a ref to refs/heads/feature.

$ git commit -m "feature commit"
$ tree .git
.git
├── COMMIT_EDITMSG
├── HEAD
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           ├── feature
│           └── master
├── objects
│   ├── 07
│   │   └── 99851535ee3b53930befa9a383691eaa29ed9d
│   ├── 26
│   │   └── c7fccd29746f6775d8f291c6e0bbdfba6a4aac
│   ├── 8e
│   │   └── 62e9859f9e0283f159a0a94a6ea7a7372e9b56
│   ├── a1
│   │   └── 50a1687ff7dd85b374a223d99259836fa8a0cd
│   ├── ac
│   │   └── cefceba62b4874a613a2336de33ee716e99931
│   ├── d2
│   │   └── 3a3ba983a7d4ab08cc47e9a5b8189139e6712a
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   ├── feature
    │   └── master
    └── tags

17 directories, 30 files

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

# a150 is the tree object. It references 0799, the blob for main.js with the new edits.
$ git cat-file -p a150
100644 blob 0799851535ee3b53930befa9a383691eaa29ed9d main.js

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

# d23a is the commit object on the feature branch. It references the tree a150.
$ git cat-file -p d23a
tree a150a1687ff7dd85b374a223d99259836fa8a0cd
parent 26c7fccd29746f6775d8f291c6e0bbdfba6a4aac
author Shubham Srivastava <shbm@Shubhams-MacBook-Air.local> 1656714340 +0200
committer Shubham Srivastava <shbm@Shubhams-MacBook-Air.local> 1656714340 +0200

feature commit

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

# the feature branch pointer
$ cat .git/refs/heads/feature
d23a3ba983a7d4ab08cc47e9a5b8189139e6712a

# the current HEAD
$ cat .git/HEAD
ref: refs/heads/feature

Checking out master changes HEAD: it now points back to refs/heads/master.

# Let's change the branch and print HEAD again
$ git checkout master
Switched to branch 'master'

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

# The HEAD file now references master.
$ cat .git/HEAD
ref: refs/heads/master

We've now seen the main Git object types (blob, tree and commit) and have a basic idea of what happens when we run some of the Git commands. Now we will create a new commit using only plumbing commands.

bottoms up git

Let's start from a new, empty repository. Its .git directory looks like this:

$ tree .git
.git
├── HEAD
├── config
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

hash-object

hash-object computes the object ID for an object of a given type from the contents of a named file (which can be outside of the work tree) or, with --stdin, from standard input. -w additionally writes the resulting object into the object database. When <type> is not specified, it defaults to "blob". Here we create a new blob with the content "Hello World" in the database.

$ echo "Hello World" | git hash-object --stdin -w
557db03de997c86a4a028e1ebd3a1ceb225be238

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

$ tree .git
.git
├── HEAD
├── config
├── info
│   └── exclude
├── objects
│   ├── 55
│   │   └── 7db03de997c86a4a028e1ebd3a1ceb225be238
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

We can verify the contents using cat-file.

$ git cat-file -p 557d
Hello World

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

$ git cat-file -t 557d
blob
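
The ID printed by hash-object can be reproduced without Git. For a blob in a SHA-1 repository, it is the SHA-1 of the header "blob <size>" plus a NUL byte, followed by the content. A minimal sketch (blob_id is a name chosen for this example):

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Git hashes "blob <size>\0" + content, not the bare content.
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# echo appends a newline, so the hashed content is 12 bytes long.
print(blob_id(b"Hello World\n"))  # 557db03de997c86a4a028e1ebd3a1ceb225be238
```

This is why the same content always lands at the same path under .git/objects, on any machine.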

update-index

update-index modifies the index (the staging area). Each file mentioned in the command is updated in the index. With --add --cacheinfo we register the blob in the index under the path hello, without needing a file on disk. If we now look at the status, it looks strange: hello is shown as a new file and also as a deleted file. It is staged in the index, but no file called hello exists in the working directory.

$ git update-index --add --cacheinfo 100644 557db03de997c86a4a028e1ebd3a1ceb225be238 hello

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   hello

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        deleted:    hello
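
The odd-looking status makes sense once you see it as two comparisons: the last commit against the index, and the index against the working directory. The toy model below reproduces the output above. It is a simplified sketch, not how Git implements status, and it ignores untracked files.

```python
def status(head: dict, index: dict, workdir: dict) -> dict:
    """Each argument maps path -> object id. Returns staged and unstaged changes."""
    staged, unstaged = [], []
    # Staged changes: last commit (HEAD) compared with the index.
    for path in sorted(set(head) | set(index)):
        if path not in head:
            staged.append(("new file", path))
        elif path not in index:
            staged.append(("deleted", path))
        elif head[path] != index[path]:
            staged.append(("modified", path))
    # Unstaged changes: index compared with the working directory.
    for path in sorted(index):
        if path not in workdir:
            unstaged.append(("deleted", path))
        elif workdir[path] != index[path]:
            unstaged.append(("modified", path))
    return {"staged": staged, "unstaged": unstaged}


hello = "557db03de997c86a4a028e1ebd3a1ceb225be238"
# After update-index: the index has hello, but HEAD and the working directory do not.
print(status(head={}, index={"hello": hello}, workdir={}))
```

Once a commit contains hello, the "new file" entry disappears, and once the file exists on disk, the "deleted" entry disappears too.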

write-tree

We can create a new tree object using write-tree. It creates a tree object from the current index and prints the name of the new tree object to standard output. Conceptually, git write-tree syncs the current index contents into a set of tree objects. We can see the object in the .git/objects directory and verify its contents using cat-file.

# creates a new tree object
$ git write-tree
117c62a8c5e01758bd284126a6af69deab9dbbe2

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

$ tree .git
.git
├── HEAD
├── config
├── index
├── info
│   └── exclude
├── objects
│   ├── 11
│   │   └── 7c62a8c5e01758bd284126a6af69deab9dbbe2
│   ├── 55
│   │   └── 7db03de997c86a4a028e1ebd3a1ceb225be238
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

$ git cat-file -p 117c
100644 blob 557db03de997c86a4a028e1ebd3a1ceb225be238 hello

commit-tree

However, the status does not change, because no commit references that tree yet. To create a commit, Git uses commit-tree, which takes the hash of a tree object, creates a new commit object based on it and emits the new commit object ID on stdout.

$ git commit-tree 117c -m "First Commit"
63dc01736bdd6b7e5d15e3b871590573550704fd

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

$ tree .git
.git
├── HEAD
├── config
├── index
├── info
│   └── exclude
├── objects
│   ├── 11
│   │   └── 7c62a8c5e01758bd284126a6af69deab9dbbe2
│   ├── 55
│   │   └── 7db03de997c86a4a028e1ebd3a1ceb225be238
│   ├── 63
│   │   └── dc01736bdd6b7e5d15e3b871590573550704fd
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

10 directories, 7 files

$ git cat-file -p 63dc01736bdd6b7e5d15e3b871590573550704fd
tree 117c62a8c5e01758bd284126a6af69deab9dbbe2
author Shubham Srivastava <shbm@Shubhams-MacBook-Air.local> 1656808916 +0200
committer Shubham Srivastava <shbm@Shubhams-MacBook-Air.local> 1656808916 +0200

First Commit
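
A commit ID is computed the same way as a blob ID, only with a "commit" header and the text shown above as the body. The sketch below assembles that body from its parts; commit_id is an illustrative helper, and it leaves out optional headers such as encoding or signatures.

```python
import hashlib


def commit_id(tree: str, author: str, committer: str, message: str,
              parents: tuple[str, ...] = ()) -> str:
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    # A blank line separates the headers from the message.
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = "\n".join(lines).encode()
    header = f"commit {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


who = "Shubham Srivastava <shbm@Shubhams-MacBook-Air.local>"
tree = "117c62a8c5e01758bd284126a6af69deab9dbbe2"
first = commit_id(tree, f"{who} 1656808916 +0200", f"{who} 1656808916 +0200", "First Commit\n")
later = commit_id(tree, f"{who} 1656808917 +0200", f"{who} 1656808917 +0200", "First Commit\n")
print(first != later)  # True: one second later gives a different commit ID
```

Because the author, committer and timestamps are part of the hashed body, repeating these steps yourself will give the same blob and tree hashes as in this post but a different commit hash.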

However, the status is still not happy: it still reports "No commits yet", because no branch points to the new commit.

$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   hello

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        deleted:    hello

changing HEAD

There is a file called HEAD which references refs/heads/master, but no file exists at that location yet. We need to create it: echo 63dc01736bdd6b7e5d15e3b871590573550704fd > .git/refs/heads/master. (The plumbing command git update-ref can write the same ref with more safety checks.) HEAD then resolves, through the master ref, to the latest commit. Normally a commit would identify a new "HEAD" state, and while Git doesn't care where you save the note about that state, in practice we tend to just write the result to the file that is pointed at by .git/HEAD, so that we can always see what the last committed state was.

$ cat .git/HEAD
ref: refs/heads/master

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

# no file refs/heads/master
$ tree .git
.git
├── HEAD
├── config
├── index
├── info
│   └── exclude
├── objects
│   ├── 11
│   │   └── 7c62a8c5e01758bd284126a6af69deab9dbbe2
│   ├── 55
│   │   └── 7db03de997c86a4a028e1ebd3a1ceb225be238
│   ├── 63
│   │   └── dc01736bdd6b7e5d15e3b871590573550704fd
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

10 directories, 7 files

# Git follows HEAD to find the branch pointer.
# The branch pointer should contain the latest commit.
$ echo 63dc01736bdd6b7e5d15e3b871590573550704fd > .git/refs/heads/master

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

# the log contains the hash now
$ git log
commit 63dc01736bdd6b7e5d15e3b871590573550704fd (HEAD -> master)
Author: Shubham Srivastava <shbm@Shubhams-MacBook-Air.local>
Date:   Sun Jul 3 02:41:56 2022 +0200

    First Commit

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=#

# Now if we look at the status we get something different.
# Still not 100% happy.
$ git status
On branch master
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        deleted:    hello

no changes added to commit (use "git add" and/or "git commit -a")

git checkout

Why is the status still not completely happy? The log contains the latest commit, but a final piece is missing. The commit and the index both contain hello, but the file was never written to the working directory. We need to use checkout to restore it from HEAD.

TIP: -- separates revisions and options from file paths, so Git treats everything after it as a path.

# -- marks what follows as a path. hello was added to the index when update-index was executed
$ git checkout HEAD -- hello

Finally, Git is happy.

# The status is now clean
$ git status
On branch master
nothing to commit, working tree clean