
A Brief Introduction to Git Data Structures

TLDR

  • All Git data and version control history are stored in the .git directory; deleting this directory removes the repository's entire local history.
  • The objects folder stores objects named by their SHA-1 hashes. The three main types are Blobs (file content), Trees (directory structure), and Commits (commit information); annotated tags are stored as a fourth type, tag objects.
  • The refs folder stores pointers for branches and tags, which are simply files containing a specific commit hash.
  • The logs folder records the history of changes to HEAD and branches, which can be queried via git reflog and used to restore accidentally deleted commits.
  • The index file is a binary file that records the snapshot of the staging area after git add.
  • The HEAD file records which branch or commit is currently checked out.
  • Branches and tags are merely pointers to commits; Git's history is formed by the chain of parent pointers stored within the commit objects, each pointing to the commit (or commits, for a merge) that came before it.

Analysis of the .git Directory Structure

All data stored by Git resides in the .git folder. Below are the purposes and operating mechanisms of the core components within this directory.

Directory Structure

  • hooks: Stores various custom scripts that execute automatically at specific moments during Git operations (such as commit, push, merge), suitable for automated testing or checking code style.
  • info: Stores auxiliary information. The exclude file within defines local exclusion rules. It serves the same purpose as .gitignore but applies only to this local repository and is not shared with other developers.
  • logs: Records the update history of references (such as branches and HEAD).
    • When this matters: If git reset --hard or git rebase -i causes commits to disappear from a branch, you can look up the earlier positions recorded here via git reflog and restore them.
  • objects: Stores all Git data objects (Blob, Tree, Commit).
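    • Structure: Each loose object is stored under a directory named after the first two characters of its 40-character SHA-1 hash, with the remaining 38 characters as the filename. Git may later move objects into packfiles under objects/pack.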
    • Object generation mechanism: Making a commit involves three types of objects. Blobs are written when files are staged with git add; Trees and the Commit object are written by git commit:
      • Blob object: Stores the actual content of the file.
      • Tree object: Stores the directory structure: file names, modes, and the hashes of the corresponding Blob objects and sub-Trees.
      • Commit object: Stores commit metadata (including the Tree hash, parent commit hash, author, and message).
  • refs: Stores pointers for branches and tags.
    • heads: Stores local branches.
    • remotes: Stores remote-tracking branches, which are local records of where branches on the remote pointed at the last fetch.
    • tags: Stores tags.
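
The naming scheme of the objects folder can be sketched in a few lines of Python. This is a minimal illustration, assuming a repository that uses SHA-1 and loose (unpacked) objects. The function names git_object_id, loose_object_path, and encode_loose_object are made up for this example and are not part of Git.

```python
import hashlib
import zlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 name Git gives an object of this type and content."""
    # Git hashes a small header ("<type> <size>\0") followed by the raw content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Split a 40-character hash into a 2-character directory and 38-character filename."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def encode_loose_object(obj_type: str, content: bytes) -> bytes:
    """Build the zlib-compressed bytes that a loose object file contains."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return zlib.compress(header + content)


blob_id = git_object_id("blob", b"hello world\n")
print(blob_id)
print(loose_object_path(blob_id))
```

The printed hash should match what git hash-object reports for a file containing the same bytes, which is a quick way to check the sketch against a real repository.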

Key File Descriptions

  • COMMIT_EDITMSG: Records the message of the last commit. This file is opened for editing when executing git commit or git commit --amend.
  • config: Stores Git settings specific to the repository.
  • index: A binary file that records the staging area: the file entries carried over from the latest commit, updated with whatever has been added via git add.
  • HEAD: Records the currently checked-out branch or commit. If it points to a branch, the content is ref: refs/heads/branch-name; if in a detached HEAD state, it stores the Commit HASH directly.
  • ORIG_HEAD: Stores the position of HEAD from just before an operation that moves it drastically, such as git reset, git merge, or git rebase, so that the previous state can be restored.
  • FETCH_HEAD: Records the results of the most recent git fetch, one line per fetched ref, formatted as follows:
    {Commit SHA-1} [not-for-merge] branch '{branch name}' of {remote repository URL}
    • When this matters: Refs that were fetched but are not meant to be merged by a following git pull are marked as not-for-merge.
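
The two HEAD formats and the FETCH_HEAD line format described above can be read with a small parser. The sketch below is an illustration only: parse_head and parse_fetch_head_line are hypothetical helper names, the sample strings are invented, and the FETCH_HEAD parser assumes the three fields are separated by tab characters, with the middle field either empty or not-for-merge.

```python
import re


def parse_head(text: str) -> tuple[str, str]:
    """Classify the content of .git/HEAD as a branch reference or a detached commit."""
    text = text.strip()
    if text.startswith("ref: "):
        # Normal case: HEAD names a ref such as refs/heads/main.
        return ("branch", text[len("ref: "):])
    if re.fullmatch(r"[0-9a-f]{40}", text):
        # Detached HEAD: the file holds a commit hash directly.
        return ("detached", text)
    raise ValueError(f"unrecognized HEAD content: {text!r}")


def parse_fetch_head_line(line: str) -> dict:
    """Split one FETCH_HEAD line into its commit hash, merge flag, and description."""
    # Assumption: fields are tab-separated (hash, optional flag, description).
    commit, flag, description = line.rstrip("\n").split("\t", 2)
    return {
        "commit": commit,
        "for_merge": flag != "not-for-merge",
        "description": description,
    }


print(parse_head("ref: refs/heads/main\n"))
print(parse_fetch_head_line(
    "3b18e512dba79e4c8300dd08aeb37f8e728b8dad\tnot-for-merge\t"
    "branch 'dev' of example.com:user/repo\n"
))
```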

The Essence of Branches and Version Control

As evidenced by Git's data structure, branches and tags are essentially just pointers to specific commit objects.

  • Branch: A pointer that moves forward to the new commit each time you commit while that branch is checked out.
  • Tag: A pointer that points to a fixed commit object.

Git's history is linked together by the parent commit hash stored inside each commit object. When executing git reset or git rebase, the branch pointer moves, but the old commit objects remain in the objects folder until Git eventually garbage-collects them, which is why history can be recovered via git reflog.
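
The relationship between branch pointers and commit objects can be imitated with two dictionaries. This is a toy model rather than Git's real storage format: objects stands in for the objects folder, refs for refs/heads, and the commit and history functions are invented for the illustration.

```python
import hashlib

objects = {}  # stand-in for .git/objects: commit hash -> commit record
refs = {}     # stand-in for .git/refs/heads: branch name -> commit hash


def commit(branch: str, message: str) -> str:
    """Create a commit whose parent is the branch's current target, then advance the branch."""
    parent = refs.get(branch)  # None for the very first commit
    record = f"parent {parent}\n\n{message}"
    commit_id = hashlib.sha1(record.encode("utf-8")).hexdigest()
    objects[commit_id] = {"parent": parent, "message": message}
    refs[branch] = commit_id  # the branch pointer moves to the new commit
    return commit_id


def history(commit_id):
    """Follow parent pointers back to the root and collect the messages."""
    messages = []
    while commit_id is not None:
        record = objects[commit_id]
        messages.append(record["message"])
        commit_id = record["parent"]
    return messages


first = commit("main", "first")
second = commit("main", "second")
third = commit("main", "third")

# Roughly what a hard reset to the second commit does: only the pointer moves.
refs["main"] = second

print(history(refs["main"]))  # history as seen from the branch
print(third in objects)       # the "lost" commit is still stored
print(history(third))         # and it is still reachable by its hash
```

After the pointer is moved back, the third commit no longer appears in the branch's history, yet it can still be walked from its hash. That is the situation git reflog helps with, since it remembers the hash the branch used to point to.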


Change Log

  • 2024-07-31 Initial document creation.
  • 2024-09-20 Removed the description of a .gitconfig file in the repository root: such a file does not take effect, so it cannot be used to version-control Git settings.