Understanding Git: Unveiling the Secrets of the .git Folder

Before looking at how Git works internally, let's briefly cover what Git is.

Git is a version control system (VCS). It is used for tracking changes, collaborating with other developers, and managing different versions of a project.

Understanding the `.git` folder

When you initialize a Git repository with:

git init

Initialized empty Git repository in /folder/path/.git/

Git creates a hidden folder called .git. This folder is the brain of Git, and it stores everything Git needs to track your project.

Components inside the .git folder

Objects
- Git stores your project data in objects: blobs, trees, and commits.
  - Blob: Stores the content of a file.
  - Tree: Represents a directory; points to blobs and other trees.
  - Commit: Represents a snapshot of your project at a point in time, pointing to a tree.
Refs
- These are pointers to commits.
  - refs/heads/main → points to the latest commit on the main branch.
  - refs/tags/ → points to specific commits tagged by the user.
HEAD
- A special file that points to the current branch or commit you’re working on.
Index / Staging Area
- Git has a staging area (or "index") which is like a to-do list of changes that will go into your next commit.
Configuration & Logs
- .git/config → stores repository-specific settings
- .git/logs → keeps a log of all changes to references
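
To make the relationship between HEAD and refs concrete, here is a minimal Python sketch that follows HEAD to the commit it currently resolves to. The function name resolve_head and the throwaway directory that mimics a .git layout are illustrative assumptions, not part of Git. The sketch only reads loose ref files and ignores refs stored in packed-refs.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir):
    """Return (ref_name, commit_id) for the HEAD of the repository at git_dir.

    ref_name is None when HEAD is detached (the file holds a commit ID).
    commit_id is None when the branch has no commits yet.
    """
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return None, head  # detached HEAD
    ref_name = head[len("ref: "):]
    ref_file = git_dir / ref_name
    if not ref_file.exists():
        return ref_name, None  # e.g. right after `git init`
    return ref_name, ref_file.read_text().strip()


# Demo on a throwaway directory that mimics a tiny .git layout.
with tempfile.TemporaryDirectory() as tmp:
    fake_git = Path(tmp)
    (fake_git / "HEAD").write_text("ref: refs/heads/main\n")
    print(resolve_head(fake_git))  # ('refs/heads/main', None): no commit yet
```

Pointing the function at a real repository's .git folder should return the current branch ref and the ID of its latest commit, as long as that ref is stored as a loose file.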

How Git Stores Changes Internally

Git works differently from many other version control systems. Instead of recording each change as a set of differences (diffs) between file versions, Git records every commit as a snapshot of the whole project.

Git Workflow Example

Suppose we have a project with one file, file.txt, whose content is Hello World.

Here’s what happens internally:

git init
- .git folder created.
git add file.txt
- Git creates a blob object containing "Hello World".
- Updates the index to record this file as staged.
git commit -m "First commit"
- Git creates a tree object representing the project folder (pointing to the blob).
- Creates a commit object pointing to the tree.
- Updates refs/heads/main to point to this commit. HEAD refers to that branch, so it now resolves to the new commit as well.

Now, if you change file.txt:

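Hello World → Hello Git!

Once you stage and commit the change, Git has created a new blob for the new content, a new tree that points to that blob, and a new commit that points to the new tree and records the first commit as its parent.

Git doesn’t duplicate blobs unnecessarily. A blob is identified by its content, so if a file didn’t change between commits, the new tree simply points to the existing blob.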
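
The reuse of unchanged blobs follows from content addressing, which the short Python sketch below illustrates. ToyObjectStore is a hypothetical in-memory stand-in for .git/objects, not Git's real storage code. The blob_id helper hashes the content the way Git does for blobs: a "blob <size>" header, a NUL byte, then the file bytes.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 over the header 'blob <size>', a NUL byte, and the content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory stand-in for .git/objects, keyed by object ID."""

    def __init__(self):
        self.objects = {}

    def add(self, content: bytes) -> str:
        oid = blob_id(content)
        # Identical content maps to the same key, so nothing new is stored.
        self.objects.setdefault(oid, content)
        return oid


store = ToyObjectStore()
first = store.add(b"Hello World\n")
again = store.add(b"Hello World\n")   # unchanged file in a later commit
changed = store.add(b"Hello Git!\n")  # edited file

print(first == again)      # True: same content, same blob
print(first == changed)    # False: new content, new blob
print(len(store.objects))  # 2: three adds, two stored blobs
```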

What Happens During `git add`

Command:

git add file.txt

Step by Step Internally:

Git reads the contents of file.txt.
It creates a blob object (binary large object) in .git/objects containing the compressed file content.
Git updates the index (staging area) with a reference to this blob.
Now Git knows: “This file is ready to be included in the next commit.”
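
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not Git's implementation: write_blob and stage are hypothetical helper names, and a plain dict stands in for the index, which in a real repository is a binary file at .git/index. The object encoding (header, NUL byte, content, zlib compression, two-character subdirectory) follows Git's loose object layout.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_blob(objects_dir, content: bytes) -> str:
    """Store content as a loose blob under objects_dir and return its ID."""
    raw = b"blob " + str(len(content)).encode() + b"\0" + content
    oid = hashlib.sha1(raw).hexdigest()
    path = Path(objects_dir) / oid[:2] / oid[2:]
    if not path.exists():  # identical content is already stored
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(raw))
    return oid


def stage(index: dict, objects_dir, name: str, content: bytes) -> None:
    """Toy staging step: record the file mode and blob ID for a path."""
    index[name] = ("100644", write_blob(objects_dir, content))


# Demo in a throwaway directory standing in for .git/objects.
with tempfile.TemporaryDirectory() as tmp:
    index = {}
    stage(index, tmp, "file.txt", b"Hello Git!\n")
    print(index)
```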

Example:

File file.txt:

Hello Git!

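Git creates a blob object named after the SHA-1 hash of its content. It is stored under .git/objects, in a subdirectory named after the first two characters of the hash, in a file named after the remaining 38:

.git/objects/<first 2 hash characters>/<remaining 38 hash characters>

The index now has an entry that records the file's mode, the blob hash, and the path. As listed by git ls-files --stage, it has this form:

100644 <blob hash> 0    file.txt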

What Happens During `git commit`

Command:

git commit -m "Add file.txt"

Step by Step Internally:

Git looks at the staging area (index) for files you staged.
It creates a tree object representing the directory structure, pointing to all the blob objects in the index.
Git creates a commit object that points to this tree, stores the commit message, author info, and timestamp.
Git updates the branch reference (e.g., refs/heads/main) to point to this new commit.
HEAD still points to the branch, so it now resolves to the latest commit.

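Example:

Tree object (in the form printed by git cat-file -p <tree hash>):

100644 blob <blob hash>    file.txt

Commit object (in the form printed by git cat-file -p <commit hash>):

tree <tree hash>
author <name and email> <timestamp> <timezone>
committer <name and email> <timestamp> <timezone>

Add file.txt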
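
As a rough sketch of the commit steps above, the Python below builds the raw bytes of a tree and a commit object and computes their IDs. The helper names are hypothetical, the author identity and timestamp are made-up placeholders, and the tree encoder sorts entries by plain name, which is only sufficient for a flat directory of files.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """ID of a Git object: SHA-1 over '<kind> <size>', a NUL byte, the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Encode (mode, name, object_id_hex) entries as a tree object stores them."""
    body = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1]):
        # Each entry: mode, space, name, NUL, then the 20 raw hash bytes.
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return body


def commit_body(tree, parents, author, message) -> bytes:
    """Encode tree, parent lines, author/committer, a blank line, the message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {author}", "", message]
    return ("\n".join(lines) + "\n").encode()


blob = object_id("blob", b"Hello Git!\n")
tree = object_id("tree", tree_body([("100644", "file.txt", blob)]))
# Placeholder identity and timestamp, invented for this example.
author = "A U Thor <author@example.com> 1700000000 +0000"
commit = object_id("commit", commit_body(tree, [], author, "Add file.txt"))

print("blob  ", blob)
print("tree  ", tree)
print("commit", commit)
```

Because the commit body includes the author line and timestamp, the commit ID printed here will differ from the one a real git commit produces for the same file.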

How Git Uses Hashes to Ensure Integrity

1. Git and SHA-1 Hashes

By default, Git uses a cryptographic hash function called SHA-1 (Secure Hash Algorithm 1) to identify every piece of data it stores. This includes:

Blobs – the contents of files
Trees – directories and their contents
Commits – snapshots of the repository with metadata

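A SHA-1 hash is written as a 40-character hexadecimal string, e.g.,

e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

(this is the hash Git assigns to an empty file).

Git calculates this hash from the object's type, size, and content. For a blob, that means only the file's bytes matter, not its filename or timestamp.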

2. Why Hashes Ensure Integrity

Hashes have three important properties that Git relies on:

Uniqueness: In practice, different content produces different hashes; an accidental collision is extremely unlikely.
Deterministic: The same content always produces the same hash.
Tamper-evident: Even a single character change results in a completely different hash.

Example:

Git calculates a SHA-1 hash for the blob (file content).
It then includes that blob in a tree object for the directory.
The commit object references the tree and includes metadata.
Each commit gets its own SHA-1 hash.

If the content of file.txt changes:

The new blob hash will be completely different, even though the filename is the same.

3. How Git Detects Corruption

Because every object in Git is identified by its SHA-1 hash:

If an object in the .git directory is corrupted, its content no longer matches the hash it is stored under.
Git can detect this when you run commands like:

git fsck

fsck (file system check) verifies that:

All objects exist.
All SHA-1 hashes are correct.
There are no broken links between commits, trees, and blobs.

This ensures end-to-end integrity of your repository.
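
The core of the hash check can be sketched in Python as follows. This is a simplified illustration of one thing git fsck verifies, not its actual implementation: verify_loose_object is a hypothetical helper that only handles loose (unpacked) objects and does not check links between objects.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def verify_loose_object(objects_dir, oid: str) -> bool:
    """Return True if the loose object stored under oid still hashes to oid."""
    path = Path(objects_dir) / oid[:2] / oid[2:]
    try:
        raw = zlib.decompress(path.read_bytes())
    except (OSError, zlib.error):
        return False  # missing file or damaged compressed data
    return hashlib.sha1(raw).hexdigest() == oid


# Demo in a throwaway directory standing in for .git/objects.
with tempfile.TemporaryDirectory() as tmp:
    content = b"Hello Git!\n"
    raw = b"blob " + str(len(content)).encode() + b"\0" + content
    oid = hashlib.sha1(raw).hexdigest()
    path = Path(tmp) / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True)
    path.write_bytes(zlib.compress(raw))
    print(verify_loose_object(tmp, oid))  # True: content matches its name

    # Simulate corruption: same path, different content.
    path.write_bytes(zlib.compress(raw.replace(b"Git", b"Got")))
    print(verify_loose_object(tmp, oid))  # False: hash no longer matches
```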

4. Hashes Link Everything Together

Hashes do more than identify content—they also link objects in a chain:

Each commit references its parent commit’s hash (the first commit has no parent, and a merge commit has more than one).
Each commit references a tree hash.
Trees reference blobs and subtrees by their hashes.

This creates a cryptographic chain of trust:

commit_hash → tree_hash → blob_hash

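If someone modifies a file in a commit, the blob hash changes → the tree hash changes → the commit hash changes.
Because every later commit includes its parent’s hash, the hashes of all descendant commits change as well, so the change is visible to anyone who knows the original commit hash.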
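
The propagation described above can be checked with a small Python sketch. It assumes a one-file project and uses a fixed, made-up author identity and timestamp so that only the file content varies; snapshot_ids is a hypothetical helper name.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>', a NUL byte, and the object body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def snapshot_ids(content: bytes):
    """Return (blob, tree, commit) IDs for a one-file project holding content."""
    blob = object_id("blob", content)
    tree = object_id("tree", b"100644 file.txt\0" + bytes.fromhex(blob))
    # Placeholder identity and timestamp, kept fixed across both snapshots.
    who = "A U Thor <author@example.com> 1700000000 +0000"
    text = f"tree {tree}\nauthor {who}\ncommitter {who}\n\nAdd file.txt\n"
    commit = object_id("commit", text.encode())
    return blob, tree, commit


original = snapshot_ids(b"Hello Git!\n")
tampered = snapshot_ids(b"Hello Git?\n")  # one character altered

for kind, before, after in zip(("blob", "tree", "commit"), original, tampered):
    print(kind, "changed:", before != after)
```

All three lines should report True: the altered byte changes the blob ID, which changes the tree's bytes and ID, which in turn changes the commit's bytes and ID.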
