- Published on
How to Use the "git-filter-repo" tool with a Downloaded Script to Clean up a Big Git Repository
- Authors

- Name
- nikUnique

The Problem
Sometimes a repository becomes large because we committed something that should never have been committed at all, such as big media files. I have definitely been in this situation, and the solution I describe here uses git-filter-repo, which is the recommended tool for this problem.
The Solution
First, we need to download a script. There are other ways to use the git-filter-repo tool, but I find this one a nice option if you do not want to install it on your system. You can find the script in the git-filter-repo repo; look at the "Simple installation" part of its installation instructions. Save it as git-filter-repo.py.
How would you use it? For simplicity, place it in the root directory of the project you want to clean. That way, you can reference it simply as ./git-filter-repo.py. The tool has a couple of use cases, and I found a resource that shows how to use it. So far, I have only been interested in the cleaning part.
This leads us to the next question: what do we want to clean up? You likely already know which files you want to remove, but there is one more nice resource with a command that shows the 10 biggest files: a blog article by Junyong Lee, which I find really helpful. That post uses git filter-branch, which we are not interested in, but the other commands around it are still useful to us. I recommend running the command from that post that finds the 10 largest files in the repo. Here it goes:
git rev-list --objects --all | grep -f <(git verify-pack -v .git/objects/pack/*.idx| sort -k 3 -n | cut -f 1 -d " " | tail -10)
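The one-liner above prints the ten largest objects, but in history order and without their sizes. As a possible alternative, here is a minimal sketch that prints the size in bytes next to each path, smallest first. The function name largest_blobs is my own placeholder, not something from Git or from the blog post, and it assumes you run it inside the repo.

```shell
# largest_blobs is a hypothetical helper name, not part of Git.
# Prints the N largest blobs in the whole history as "<size-in-bytes> <path>",
# sorted from smallest to largest. N defaults to 10.
largest_blobs() {
  count="${1:-10}"
  # List every object reachable from any ref, together with its path.
  git rev-list --objects --all |
    # Ask Git for the type and size of each object; %(rest) keeps the path.
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    # Keep only blobs (file contents) and drop the type column.
    awk '$1 == "blob" { sub(/^blob /, ""); print }' |
    # Sort numerically by size and keep the last N lines.
    sort -n -k1,1 |
    tail -n "$count"
}
```

Calling largest_blobs with no argument shows ten entries, and largest_blobs 20 shows twenty. The same path can appear more than once if several large versions of the file were committed.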
Now that we know the biggest files in the repo, it is time to remove them. This is how we can do it, using secrets.txt as the example file name:
python3 ./git-filter-repo.py --path secrets.txt --invert-paths
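If you want to delete multiple files at once, repeat the option: --path file1 --path file2. You can also delete an entire directory by passing its path. Note that git-filter-repo expects to run in a fresh clone and refuses to rewrite history otherwise unless you add --force, so it is safest to work on a fresh clone or a backup copy.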
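When the list of paths gets long, it can help to assemble the command and read it before running anything. The sketch below only prints the command. The helper name filter_repo_command and the example paths are placeholders of mine, and it assumes the paths contain no spaces.

```shell
# filter_repo_command is a hypothetical helper: it does NOT rewrite anything.
# It prints the git-filter-repo command that would remove every given path
# (file or directory) from history, so the command can be reviewed first.
filter_repo_command() {
  cmd="python3 ./git-filter-repo.py"
  for path in "$@"; do
    cmd="$cmd --path $path"
  done
  printf '%s --invert-paths\n' "$cmd"
}

# Example with placeholder names: two files and one directory.
filter_repo_command videos/intro.mp4 dump.sql media/
```

If the printed command looks right, copy it and run it from the root of the repo.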
The story doesn't end here. The repo's size is still the same. To make it smaller, we need to do some cleanup:
rm -Rf .git/refs/original
rm -Rf .git/logs/
git gc --aggressive --prune=now
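With this, we remove the reflogs and the objects of the old commits that are no longer referenced by the rewritten commits. The refs/original directory is where git filter-branch keeps backups of the old refs; if it does not exist in your repo, the first command simply does nothing. At this point, the repo's size should have decreased.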
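To check that the cleanup really shrank the repo, you can compare the size of the packed objects before and after. Here is a small sketch built on git count-objects. The helper name pack_size_kib is my own, and it only counts packed objects, so loose objects are not included in the number.

```shell
# pack_size_kib is a hypothetical helper name, not part of Git.
# Prints the disk space used by the repo's pack files, in KiB,
# as reported by the "size-pack" line of `git count-objects -v`.
pack_size_kib() {
  git count-objects -v | awk '/^size-pack:/ { print $2 }'
}

# Typical use (run inside the repo):
#   before=$(pack_size_kib)
#   ...rewrite history and run the cleanup commands...
#   after=$(pack_size_kib)
#   echo "pack size: ${before} KiB -> ${after} KiB"
```

For a quick human-readable summary, git count-objects -vH prints the same information with units.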
I have personally used both git-filter-repo and git filter-branch, and the second one may consume far too much of your computer's memory. This happened to me when I used git filter-branch to clean up a repo that was about 15 GB in size.
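The final part is to push the rewritten history to the remote repo with the --force flag. git-filter-repo removes the origin remote after rewriting, so you may first need to add it again with git remote add origin followed by your repo's URL.
git push origin master --force # I assume that your branch is called master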
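As far as I know, git-filter-repo removes the origin remote once it finishes, so the push can fail until the remote is added back. Here is a minimal sketch that handles both cases. The helper name push_rewritten is hypothetical, and the URL you pass in is whatever your real remote is.

```shell
# push_rewritten is a hypothetical helper, not part of Git or git-filter-repo.
#   $1: URL of the remote repo (placeholder, use your own)
#   $2: branch to push (defaults to master)
push_rewritten() {
  url="$1"
  branch="${2:-master}"
  # Add origin again only if it is missing.
  if ! git remote get-url origin >/dev/null 2>&1; then
    git remote add origin "$url"
  fi
  # Overwrite the remote branch with the rewritten history.
  git push --force origin "$branch"
}
```

You would call it from inside the cleaned repo, for example push_rewritten followed by your remote URL and branch name. Everyone else who works on the repo will need a fresh clone afterwards, because the old and new histories no longer match.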
There you have it! This is how you can use the downloaded git-filter-repo script to reduce the size of your repo.
Got questions? Send me an email to commitnobug@outlook.com.