Inside Git: How It Works and the Role of the .git Folder

We use Git every day to save changes and collaborate, but have you ever wondered what's happening under the hood? Understanding Git's internal mechanisms, especially the mysterious .git folder, is key to truly mastering version control.

How Git Works Internally: The Snapshot Model

Unlike older version control systems that store each file as a series of differences, Git thinks in terms of snapshots. Every time you commit, Git records the state of your entire project at that moment. If a file hasn't changed, Git doesn't store a new copy of it; the new snapshot simply references the identical content Git has already stored. This keeps commits cheap, even in large projects.

This "snapshot" approach is crucial for Git's speed and integrity. It allows Git to quickly recreate any version of your project at any point in its history.
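The snapshot idea can be sketched in a few lines of Python. This is a toy model, not Git's actual storage format: the `store` dictionary and the `put`, `snapshot`, and `checkout` helpers are hypothetical names chosen for illustration, and the hash covers only the raw content.

```python
import hashlib

store = {}  # object id -> file content; stands in for Git's object database


def put(content: bytes) -> str:
    """Store content under its hash; identical content is stored only once."""
    oid = hashlib.sha1(content).hexdigest()  # simplified: real Git also hashes a header
    store.setdefault(oid, content)
    return oid


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Record a full snapshot as a mapping of path -> object id."""
    return {path: put(data) for path, data in files.items()}


def checkout(snap: dict[str, str]) -> dict[str, bytes]:
    """Rebuild every file of a snapshot from the store."""
    return {path: store[oid] for path, oid in snap.items()}


v1 = snapshot({"README.md": b"hello\n", "app.py": b"print(1)\n"})
v2 = snapshot({"README.md": b"hello\n", "app.py": b"print(2)\n"})

print(v1["README.md"] == v2["README.md"])  # True: the unchanged file reuses one object
print(len(store))                          # 3: four file entries, three stored contents
```

Both snapshots describe the whole project, yet the unchanged README is stored once, and either version can be rebuilt with `checkout`.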

Understanding the `.git` Folder

The .git folder is the heart of your local Git repository. When you run git init in a directory, Git creates this hidden folder. It's not just for configuration; it's where Git stores all the information about your project's history, including:

Objects: Your actual data (files, directories, commits).
References (Refs): Pointers to your commits, like branches and tags.
Logs: Records of where HEAD and your branches have pointed over time (the reflog).
Configuration: Settings for your repository.

Never manually delete or modify files inside the .git folder unless you know exactly what you are doing. It's Git's private database.

Here's a simplified view of its key components:

.git/
├── HEAD               # Points to the current branch/commit
├── config             # Repository specific settings
├── description        # Used by GitWeb (older, less common)
├── hooks/             # Custom scripts for Git events
├── info/
│   └── exclude        # Local ignore patterns (not committed or shared)
├── objects/           # The core database of Git objects (blobs, trees, commits)
│   ├── info/
│   └── pack/
└── refs/              # Pointers to commits (branches, tags)
    ├── heads/         # Local branches
    └── tags/          # Tags
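Because HEAD and the files under refs/ are plain text, you can follow them by hand. The sketch below resolves HEAD to a commit hash; `current_commit` is a hypothetical helper, and it only handles loose ref files (refs that Git has moved into the packed-refs file are not looked up).

```python
from pathlib import Path
from typing import Optional


def current_commit(git_dir: Path) -> Optional[str]:
    """Resolve HEAD to a commit hash by reading loose files under .git."""
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds a commit hash directly
    ref_file = git_dir / head[len("ref: "):]  # e.g. refs/heads/main
    if ref_file.exists():
        return ref_file.read_text().strip()
    return None  # branch with no commits yet, or a ref stored in packed-refs
```

Calling `current_commit(Path(".git"))` from the root of a repository should give the same hash as `git rev-parse HEAD` in the common case.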

Git Objects: Blob, Tree, Commit

At its core, Git stores everything as "objects." There are three primary types of objects you need to understand (a fourth type, the annotated tag object, is not covered here):

Blob (Binary Large Object): This is simply the content of a file. Git stores the exact content of a file as a blob. It doesn't care about the filename at this stage, just the data. If two files have identical content, Git only stores one blob and links to it twice.
Tree: A tree object represents a directory. It contains a list of entries, each with a file mode, a name, and a pointer to another tree object (a subdirectory) or a blob object (a file). A tree effectively recreates your project's directory structure.
Commit: A commit object is the snapshot of your project at a specific point in time. It contains:
- A pointer to the top-level tree object for that commit (representing the project's state).
- Pointers to its parent commit(s) (linking it to history).
- Author and committer information.
- The commit message.

graph TD
    subgraph Git Object Relationships
        C(Commit Object) --has parent--> P(Parent Commit)
        C --points to--> T(Tree Object - Root Directory)
        T --contains--> T1(Tree Object - Subdirectory A)
        T --contains--> B1(Blob Object - file1.txt content)
        T1 --contains--> B2(Blob Object - file2.txt content)
    end

This diagram illustrates how a commit points to a tree, which then organizes other trees and blobs to represent your project's file structure and content.
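To make the relationships concrete, here is a minimal Python sketch that computes the IDs of a blob, a tree, and a commit using the serialization Git applies before hashing (a `<type> <size>` header, a NUL byte, then the body). The helper names, the file name hello.txt, and the author details are made up for illustration.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git does: SHA-1 over '<type> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, object id) entries; ids are stored as 20 raw bytes."""
    # Entries are sorted by name, with directory names compared as if they ended in "/".
    ordered = sorted(entries, key=lambda e: (e[1] + "/" if e[0] == "40000" else e[1]).encode())
    out = b""
    for mode, name, oid in ordered:
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
    return out


blob = object_id("blob", b"hello world\n")
tree = object_id("tree", tree_body([("100644", "hello.txt", blob)]))
commit_body = (
    f"tree {tree}\n"
    # a "parent <id>" line would go here for any commit after the first
    "author Ada <ada@example.com> 0 +0000\n"
    "committer Ada <ada@example.com> 0 +0000\n"
    "\n"
    "Initial commit\n"
).encode()
commit = object_id("commit", commit_body)

print(blob)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The blob ID matches what `git hash-object` reports for a file containing "hello world" and a newline. Note that the file name appears only in the tree, never in the blob.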

How Git Tracks Changes (The `git add` and `git commit` Flow)

Let's trace what happens when you use the two most fundamental Git commands:

1. `git add <file>` (Moving to the Staging Area/Index)

When you run git add <file>, Git does not yet record the file in your project's history. Instead, it:

Calculates a hash and stores a blob: Git reads the content of the file, calculates a SHA-1 hash over a short header plus that content, and stores the compressed content as a blob object in the .git/objects directory, unless an identical blob is already there.
Updates the Index: It then updates a special binary file, .git/index, called the "index" (also known as the staging area or cache). The index is like a blueprint for your next commit. It records the path of your file and the hash of the blob object that represents its content.

The git add command essentially prepares your changes for the next snapshot.
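The two steps can be approximated in Python. This sketch writes a loose object the way Git lays it out on disk (zlib-compressed, in a subdirectory named after the first two hex digits of the hash), but it uses a plain dictionary in place of the real binary index file. `write_blob` and `stage` are hypothetical names, and the objects directory is a scratch folder rather than a real repository.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_blob(objects_dir: Path, content: bytes) -> str:
    """Store content as a loose blob object and return its id."""
    data = f"blob {len(content)}".encode() + b"\x00" + content
    oid = hashlib.sha1(data).hexdigest()
    path = objects_dir / oid[:2] / oid[2:]  # first two hex digits name the subdirectory
    if not path.exists():  # content that is already stored is not written again
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(data))
    return oid


def stage(index: dict[str, str], objects_dir: Path, file_path: str, content: bytes) -> None:
    """Rough equivalent of `git add`: store the blob, then record path -> id."""
    index[file_path] = write_blob(objects_dir, content)


objects = Path(tempfile.mkdtemp()) / "objects"  # scratch directory, not a real repository
index = {}  # stand-in for .git/index
stage(index, objects, "hello.txt", b"hello world\n")
print(index)  # {'hello.txt': '3b18e512dba79e4c8300dd08aeb37f8e728b8dad'}
```

Staging the same content again under a different path would add a second index entry but no second object.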

2. `git commit -m "Your message"` (Creating a Snapshot)

When you run git commit, Git performs several actions to finalize the snapshot:

Creates Tree Objects: Git takes the current state of your index (the blueprint) and uses it to construct a hierarchy of tree objects. It builds trees representing all your directories and subdirectories, with each tree pointing to the appropriate blob objects (for files) and other tree objects (for subdirectories).
Creates a Commit Object: Git then creates a new commit object. This commit object contains:
- A pointer to the top-level tree object created in the previous step.
- A pointer to its parent commit (usually the commit your current branch pointed to before the commit; the first commit has no parent, and a merge commit has more than one).
- Your commit message, author and committer information, and timestamps.
Updates Branch Pointer: Finally, Git moves the pointer of your current branch (e.g., main or master) to point to this newly created commit object. HEAD itself normally does not change: it keeps referring to the branch, so it now resolves to the new commit.

graph TD
    subgraph "Internal Flow: git add and git commit"
        WD("Working Directory <br> (Edited files)") -- "1. git add filename" --> SA("Staging Area (Index) <br> (Snapshot blueprint)")
        SA -- "2. git commit" --> LR("Local Repository <br> (.git/objects)")

        subgraph "Inside the .git/objects folder during commit"
            LR --> B("Blob Objects <br> (File Content)")
            LR --> T("Tree Objects <br> (Directory Structure)")
            LR --> C("Commit Objects <br> (Snapshots)")
        end

        SA -- "creates tree objects from index" --> T
        T -- "points to blobs for files" --> B
        C -- "points to top-level tree" --> T
        C -- "points to parent commit" --> OldC(Previous Commit)
        HB("HEAD and Branch") -- "now point here" --> C
    end

This diagram shows how git add moves content into the staging area, and git commit then uses that staged content to create linked tree, blob, and commit objects within the .git/objects folder.
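The commit flow can also be modeled in memory. The following Python sketch is deliberately a toy: objects are dictionaries hashed through a JSON dump rather than Git's binary formats, the blob IDs are placeholders, and `store`, `write_tree`, and `commit` are hypothetical names. What it preserves is the order of operations: trees are built from the index, a commit points at the root tree and its parent, and the branch ref moves.

```python
import hashlib
import json

objects = {}              # object id -> object, a toy object database
refs = {}                 # ref name -> commit id
HEAD = "refs/heads/main"  # HEAD names a branch; it is not rewritten on commit


def store(obj: dict) -> str:
    """Toy content addressing: hash a JSON dump (real Git uses its own formats)."""
    oid = hashlib.sha1(json.dumps(obj, sort_keys=True).encode()).hexdigest()
    objects[oid] = obj
    return oid


def write_tree(index: dict[str, str], prefix: str = "") -> str:
    """Build tree objects bottom-up from flat index paths such as 'src/app.py'."""
    entries = {}
    subdirs = set()
    for path, blob_id in index.items():
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):]
        if "/" in rest:
            subdirs.add(rest.split("/", 1)[0])  # handled recursively below
        else:
            entries[rest] = {"type": "blob", "id": blob_id}
    for name in subdirs:
        entries[name] = {"type": "tree", "id": write_tree(index, f"{prefix}{name}/")}
    return store({"type": "tree", "entries": entries})


def commit(index: dict[str, str], message: str) -> str:
    """Create a commit from the index and move the current branch to it."""
    parent = refs.get(HEAD)  # current branch tip; None for the first commit
    commit_id = store({
        "type": "commit",
        "tree": write_tree(index),
        "parents": [parent] if parent else [],
        "message": message,
    })
    refs[HEAD] = commit_id  # move the branch pointer to the new commit
    return commit_id


index = {"README.md": "b1", "src/app.py": "b2"}  # placeholder blob ids
first = commit(index, "Initial commit")
index["src/app.py"] = "b3"                       # as if an edited file had been staged
second = commit(index, "Update app")

print(objects[second]["parents"] == [first])  # True
print(refs[HEAD] == second)                   # True
```

Only the src tree and the root tree change between the two commits; the README entry in both root trees refers to the same blob ID.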

How Git Uses Hashes to Ensure Integrity

Every object Git stores (blob, tree, commit) is identified by a hash of its content: by default a SHA-1 hash, written as a 40-character hexadecimal string. This hashing mechanism is fundamental to Git's power and integrity:

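Uniqueness: Identical content always gets the same ID, and different content gets, for all practical purposes, a different one.
Immutability: Once an object is created and stored, its hash identifies its content. If even a single bit of that content changes, its hash would be entirely different. Because each commit's hash also covers its tree and its parent hashes, history cannot be altered without changing the ID of every later commit, which makes tampering evident.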
Efficiency: Git can quickly check if it already has an object by comparing hashes.

When you see commit hashes in git log, you are seeing these unique identifiers that link together your entire project's history.
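The chaining effect is easy to demonstrate. The Python sketch below hashes a simplified commit body (tree, optional parent, and message only; real commits also carry author and committer lines) and shows that editing the oldest commit changes every later ID. `commit_id` and `history_ids` are hypothetical helpers.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of Git's empty tree


def commit_id(tree: str, parent: str, message: str) -> str:
    """Toy commit hash covering the tree id, the parent id (if any), and the message."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    body = ("\n".join(lines) + f"\n\n{message}\n").encode()
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


def history_ids(messages: list[str]) -> list[str]:
    """Hash a linear history; each id depends on the id before it."""
    ids, parent = [], ""
    for msg in messages:
        parent = commit_id(EMPTY_TREE, parent, msg)
        ids.append(parent)
    return ids


original = history_ids(["first", "second", "third"])
altered = history_ids(["first!", "second", "third"])  # edit only the oldest commit

print([a == b for a, b in zip(original, altered)])  # [False, False, False]
```

The second and third commits are untouched in content, yet their IDs change because each one includes its parent's ID in the data being hashed.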

Understanding the .git folder and how Git objects (blobs, trees, commits) interact provides a robust mental model for how version control actually works. It demystifies commands like git add and git commit, helping you use Git more effectively and troubleshoot problems with confidence. It transforms Git from a set of commands to memorize into a powerful, logical system for managing your code.
