You’re looking at your git repository – maybe because you want to back it up, or to move it to another location – and you freak out. 14,552 files? 153 megabytes? This can’t be right!
Or can it?
Type: File folder
Location: C:\Users\Krishty\source\repos
Size: 153.4 MB (160,861,559 bytes)
Size on disk: 181.2 MB (189,956,096 bytes)
Contains: 14,552 Files, 882 Folders
Here’s the radical diet for your repository …
This deletes your stashes! Also, make sure nobody else is writing to the repo!
rm objects/info/commit-graph
git reflog expire --expire-unreachable=now --all
git gc --aggressive --prune=now
git repack -ad -F --depth=4095 --window=999
rm hooks/*.sample
rm description
rm gitk.cache
It sounds like a platitude, but it’s the foundation of any optimization: delete branches and commits when you’re sure you don’t need them any more.
This may not immediately make your repo any smaller, though. We’ll shortly see why.
Git uses garbage collection – like some programming languages (Java, C#, …). This means that it does not waste your time tidying up the repo during your normal day-to-day use. Rather, git waits until a lot of garbage has accumulated before collecting it. This is also why repos sometimes don’t shrink right after large branches have been deleted.
The primary source of garbage is inaccessible commits. Commits can become inaccessible when they are undone (git reset). Amending commits (git commit --amend) orphans the original commit as well. After all, commits are identified by their checksum, and the checksum changes when you change a commit. You can probably imagine that rebasing leaves lots of garbage, as it potentially rewrites many commits.
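To make this concrete, here is a minimal sketch that amends a commit in a throwaway repository and then asks git which objects are no longer reachable. The directory, file name and commit messages are made up for the demo; --no-reflogs is passed so that git fsck ignores the reflog, which would otherwise still count the original commit as reachable.

```shell
#!/bin/sh
# Sketch: amending a commit leaves the original commit behind as garbage.
# Everything happens in a throwaway repository under a temp directory.
set -eu

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
# Identity is set per repository so the demo does not depend on global config.
git config user.name "Demo"
git config user.email "demo@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
original=$(git rev-parse HEAD)

echo "second draft" > notes.txt
git commit -q -a --amend -m "Add notes (amended)"
amended=$(git rev-parse HEAD)

# The IDs differ: the amended commit is a new object, the old one stays behind.
echo "original: $original"
echo "amended:  $amended"

# List objects that no branch or tag can reach (reflog entries are ignored).
unreachable=$(git fsck --unreachable --no-reflogs 2>/dev/null)
echo "$unreachable"
```

The list should contain the original commit along with the tree and blob that only it referenced.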
Unreachable commits can easily be removed from the repository by running
git gc --prune=now
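Make sure nobody else is writing to the repository during this time! With --prune=now, git may delete objects that a concurrent operation has just written but not yet referenced, which corrupts the repository.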
This should already improve your repo size drastically.
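If you would rather measure than guess, git count-objects reports how many loose objects and packs a repository holds. The sketch below wraps it in a hypothetical helper called repo_stats and runs it before and after garbage collection on a throwaway repository; on a real repo you would pass your own path instead.

```shell
#!/bin/sh
# Sketch: compare object statistics before and after garbage collection.
set -eu

# repo_stats is a hypothetical helper, not a git command.
# -C runs git in the given directory; -v prints all counters,
# -H prints sizes in human-readable units.
repo_stats() {
    git -C "$1" count-objects -vH
}

demo_dir=$(mktemp -d)
git init -q "$demo_dir"
git -C "$demo_dir" config user.name "Demo"
git -C "$demo_dir" config user.email "demo@example.com"

echo "hello" > "$demo_dir/notes.txt"
git -C "$demo_dir" add notes.txt
git -C "$demo_dir" commit -q -m "Add notes"

stats_before=$(repo_stats "$demo_dir")
git -C "$demo_dir" gc --quiet --prune=now
stats_after=$(repo_stats "$demo_dir")

echo "--- before ---"
echo "$stats_before"
echo "--- after ---"
echo "$stats_after"
```

The count line is the number of loose objects, packs is the number of pack files, and size-pack is the space they take up.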
Git repositories are compressed – but rarely in the optimal way.
One reason is that optimal compression is slow. The other is that compression is hindered by the garbage in the repository.
Now that all garbage has been removed from the repo, re-compress it entirely by running:
git gc --aggressive
This may take a long time – possibly minutes for medium-sized repos, and hours for gigabyte-sized repositories. Make sure that nobody else is writing to the repository during this time.
This can be combined with the garbage collection step from earlier into
git gc --aggressive --prune=now
The repo should now no longer consist of thousands of files. The compression should have re-packed it into a few dozen at most.
Under the hood, git gc delegates the re-compression to the git repack command, which had to be run by hand in early git versions.
If you really want to go to 11 and squeeze out the last bytes, you can run git repack to re-compress the repository with higher settings than git gc does:
git repack -ad -F --depth=4095 --window=999
In my experiments, this compressed at most 2 % better than git gc and took very long to complete.
You may notice that the command above still doesn’t remove all garbage from the repository. Even more confusing: If you retry the command a few weeks later, it may suddenly compress better!
Git maintains a reflog. This is a log of where HEAD and the branch tips have pointed recently. By default, entries are kept for 90 days, or for 30 days if the commit is no longer reachable from the current tip. This permits awesome commands like “show me what I worked on ten days ago” (git show HEAD@{10.days.ago}). But it also prevents garbage collection from removing anything that is still referenced by the reflog.
Clear the reflog via:
git reflog expire --expire=all --expire-unreachable=now --all
This deletes your stashed changes!
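Because stashes are at stake, it can be worth checking for them first. The sketch below assumes a hypothetical helper called has_stashes, which simply tests whether refs/stash exists; the throwaway repository and its file are made up for the demo.

```shell
#!/bin/sh
# Sketch: detect stashes before clearing the reflog.
set -eu

# has_stashes is a hypothetical helper: it succeeds if the repository at $1
# has at least one stash, i.e. if refs/stash exists.
has_stashes() {
    git -C "$1" rev-parse --verify --quiet refs/stash >/dev/null
}

demo_dir=$(mktemp -d)
git init -q "$demo_dir"
cd "$demo_dir"
git config user.name "Demo"
git config user.email "demo@example.com"

echo "committed" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Create a stash so that the check has something to find.
echo "work in progress" > notes.txt
git stash -q

if has_stashes "$demo_dir"; then
    stash_state="stashes present"
    git stash list
else
    stash_state="no stashes"
fi
echo "$stash_state"
```

In a clean-up script, the "stashes present" branch would be the place to stop and ask before running git reflog expire.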
Do this before garbage collection and re-compression.
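The effect of the reflog can be reproduced in a throwaway repository. This sketch orphans a commit by amending it, runs garbage collection once with the reflog intact and once after clearing it, and checks each time whether the original commit still exists. The directory, file name and variable names are made up for the demo.

```shell
#!/bin/sh
# Sketch: the reflog keeps an orphaned commit alive until it is expired.
set -eu

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
original=$(git rev-parse HEAD)

echo "second draft" > notes.txt
git commit -q -a --amend -m "Add notes (amended)"

# First attempt: the reflog still mentions the original commit.
git gc --quiet --prune=now
if git cat-file -e "$original" 2>/dev/null; then
    after_gc="kept"
else
    after_gc="removed"
fi

# Second attempt: clear the reflog first, then collect garbage again.
git reflog expire --expire=all --expire-unreachable=now --all
git gc --quiet --prune=now
if git cat-file -e "$original" 2>/dev/null; then
    after_expire="kept"
else
    after_expire="removed"
fi

echo "after gc alone:           $after_gc"
echo "after reflog expire + gc: $after_expire"
```

Here git cat-file -e only tests whether an object exists, so it makes a convenient probe for whether the commit survived.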
Git supports hooks – scripts that are called on specific events, e.g. when pushing to a branch.
Git, by default, nicely places sample scripts in the hooks directory as a guideline for writing your own.
These are not used and serve no other purpose, so it doesn’t make sense to keep them in backups and deployments. You can delete all *.sample files from the hooks directory.
If you have ever used gitk, it probably created a cache file in your repository. This makes it run faster on subsequent starts, but it is not normally something you’d like to back up or publish.
To remove it, delete the gitk.cache file.
If you ever used the repository with Visual Studio, then you’ll probably find another cache: the commit graph.
It’s a fairly new feature that has been added specifically to speed up displaying the commit graph. Again, it’s quite useful for day-to-day work, but not at all for backups or deployments.
To remove it, delete the objects/info/commit-graph file.
Git places a description file in any new repo by default. This feature is rarely used.
If you don’t use the description, feel free to delete the description file.
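Before deleting anything, a dry run can show which of these optional files actually exist. The sketch below assumes a hypothetical helper called list_removable that only prints candidates and never deletes them; the gitk.cache file is created by hand here to simulate a cache left behind by gitk.

```shell
#!/bin/sh
# Sketch: list the optional files in a git directory without deleting them.
set -eu

# list_removable is a hypothetical helper; $1 is the git directory
# (the .git folder, or the repository itself for bare repos).
list_removable() {
    removable_root=$1
    for candidate in \
        "$removable_root"/hooks/*.sample \
        "$removable_root/description" \
        "$removable_root/gitk.cache" \
        "$removable_root/objects/info/commit-graph"
    do
        # An unmatched glob stays literal, so test for existence first.
        if [ -f "$candidate" ]; then
            echo "$candidate"
        fi
    done
}

demo_dir=$(mktemp -d)
git init -q "$demo_dir"
git_dir=$(git -C "$demo_dir" rev-parse --absolute-git-dir)
: > "$git_dir/gitk.cache"   # simulate a cache left behind by gitk

removable=$(list_removable "$git_dir")
echo "$removable"
```

Which sample hooks and whether a description file show up depends on the templates your git installation ships with.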
Before optimizing a repo, make sure nobody will be writing to it during the optimization.
Skip the reflog step if you want to keep your stashed changes.
Garbage collection is relatively fast and reduces the repo size drastically, albeit only after clearing the reflog.
Re-compression is very slow, but reduces the number of files from several thousand to fewer than a dozen (for bare repos; a few more for other repos).
Some programs place their caches in the repo, and you can usually just delete them.
rm objects/info/commit-graph
git reflog expire --expire-unreachable=now --all
git gc --aggressive --prune=now
rm hooks/*.sample
rm description
rm gitk.cache
Type: File folder
Location: C:\Users\Krishty\source\repos
Size: 122.0 MB (127,926,272 bytes)
Size on disk: 122.1 MB (127,959,040 bytes)
Contains: 9 Files, 8 Folders