Complete Guide to Permanently Removing Files from Git History
This guide explains how to permanently remove files from Git history using two popular tools: BFG Repo-Cleaner and git filter-repo. Both rewrite Git history to remove sensitive or unnecessary files, and the guide also covers the safety steps to take before and after the rewrite.
Prerequisites
- Back Up Your Repository: Rewriting Git history is destructive and cannot be undone without a backup. Always create one first:
git clone --mirror <repository-url> backup-repo
- Understand the Impact: History rewriting changes commit hashes, which affects every collaborator. You'll need to force-push the rewritten history, and others will need to re-clone.
- Install Required Tools:
  - BFG Repo-Cleaner: A high-level tool for cleaning Git repositories; it runs on Java.
  - git filter-repo: A more flexible and advanced tool for rewriting Git history; it requires Python 3.
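Before starting, a quick preflight can confirm that the tools are on your PATH and that the mirror backup really is a bare repository containing your refs. This is a minimal Bash sketch: the helper names check_tools and make_backup are hypothetical, and it assumes BFG is run through java and that git filter-repo is installed as the git-filter-repo executable (which is how git finds the filter-repo subcommand).

```shell
#!/usr/bin/env bash
# Preflight sketch. check_tools and make_backup are hypothetical helper names.

# Print which tools are on PATH; return non-zero if any of them is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Mirror-clone <repository-url> into <backup-dir>, confirm it is a bare
# repository, and print how many refs (branches, tags, etc.) were copied.
make_backup() {
  local url="$1" dest="$2"
  git clone --quiet --mirror "$url" "$dest" || return 1
  [ "$(git -C "$dest" rev-parse --is-bare-repository)" = "true" ] || return 1
  git -C "$dest" for-each-ref --format='%(refname)' | wc -l | tr -d ' '
}
```

For example, run check_tools git java git-filter-repo, then make_backup <repository-url> backup-repo. A non-zero ref count and a zero exit status mean the backup was created as a bare mirror.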
Method 1: Using BFG Repo-Cleaner
Step 1: Install BFG
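Download the BFG jar file from the BFG Repo-Cleaner project page. The downloaded file name includes a version number; the commands below call it bfg.jar, so rename the file or adjust the commands.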
Step 2: Clone the Repository
Make a fresh mirror clone of your repository (a bare clone that includes all branches and tags):
git clone --mirror <repository-url>
cd <repository-name>.git
Step 3: Run BFG
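Run BFG with the appropriate flags to remove unwanted files. By default, BFG does not change the contents of your latest commit, so delete the file there in a normal commit (and push it) before running BFG, or pass --no-blob-protection. Examples: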
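- Remove every file with a given name (BFG matches on the file name, not the path, so matching files in any directory are removed):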
java -jar bfg.jar --delete-files <file-name>
- Remove files larger than a certain size:
java -jar bfg.jar --strip-blobs-bigger-than 100M
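Because BFG leaves the latest commit untouched by default, a file that still exists at the tip of a branch survives --delete-files. One way to handle this, sketched below under the assumption that you want to keep a local copy of the file, is to stop tracking it in a normal clone, push that commit, and only then run BFG on a fresh mirror. untrack_file is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: stop tracking a file at the tip of the current branch so BFG's
# default protection of the latest commit does not keep it alive.
# untrack_file is a hypothetical helper; run it from the root of a normal (non-bare) clone.
untrack_file() {
  local path="$1"
  # Remove the file from the index only; the copy on disk is kept.
  git rm --quiet --cached -- "$path" || return 1
  # Ignore it (anchored to the repository root) so it is not re-added by accident.
  printf '/%s\n' "$path" >> .gitignore
  git add .gitignore && git commit --quiet -m "Stop tracking $path"
}

# Typical order (angle-bracket values are placeholders):
#   untrack_file <file-name> && git push
#   git clone --mirror <repository-url> && cd <repository-name>.git
#   java -jar bfg.jar --delete-files <file-name>
```

If you would rather delete the file outright, plain git rm followed by a commit works the same way; the --cached variant only avoids losing the working copy.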
Step 4: Clean and Push
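After running BFG, expire the reflog and garbage-collect so the removed data is physically deleted from the repository, then push the rewritten history (in a mirror clone, git push updates every branch and tag):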
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push --force
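To confirm the cleanup worked, you can check that no commit on any branch or tag still touches the file. This is a small sketch with a hypothetical helper name; since --delete-files matches by name, it searches for the name at any directory depth, and it works the same way after a git filter-repo rewrite.

```shell
#!/usr/bin/env bash
# Sketch: count commits, on any branch or tag, that still touch a file with the
# given name at any depth. count_commits_touching is a hypothetical helper;
# run it inside the cleaned repository (a bare mirror works too).
count_commits_touching() {
  local name="$1"
  # ':(glob)**/<name>' matches <name> in the root directory and in every subdirectory.
  git log --all --format=%H -- ":(glob)**/$name" | wc -l | tr -d ' '
}

# After a successful cleanup this should print 0:
#   count_commits_touching <file-name>
```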
Method 2: Using git filter-repo
Step 1: Install git filter-repo
Install git filter-repo using your package manager:
- For macOS:
brew install git-filter-repo
- For Linux or any platform with Python 3 (with pip):
pip install git-filter-repo
Step 2: Clone the Repository
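Make a fresh, regular (non-bare) clone of the repository. git filter-repo expects a fresh clone and refuses to rewrite other repositories unless you pass --force: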
git clone <repository-url>
cd <repository-name>
Step 3: Run git filter-repo
Run one of the following commands based on your needs:
- Remove a specific file (give its path relative to the repository root):
git filter-repo --path <path/to/file> --invert-paths
- Remove files matching a pattern:
git filter-repo --path-glob '*.log' --invert-paths
- Remove blobs larger than a certain size:
git filter-repo --strip-blobs-bigger-than 100M
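When several files or patterns need to go, you can list them in one file and remove them in a single rewrite with --paths-from-file, rather than running git filter-repo once per path (each run rewrites the whole history). This is a sketch around a hypothetical helper, remove_paths, with a dry-run switch so you can double-check the list before anything is rewritten.

```shell
#!/usr/bin/env bash
# Sketch: remove several paths in a single git filter-repo run via --paths-from-file.
# remove_paths is a hypothetical helper. Each argument becomes one line of the
# list file; per the git filter-repo manual, lines may also start with 'glob:'
# or 'regex:'. Set DRY_RUN=1 to print the command and list instead of rewriting.
remove_paths() {
  [ "$#" -gt 0 ] || { echo "usage: remove_paths PATH..." >&2; return 2; }
  local list_file status=0
  list_file="$(mktemp)" || return 1
  printf '%s\n' "$@" > "$list_file"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "git filter-repo --invert-paths --paths-from-file $list_file"
    cat "$list_file"
  else
    git filter-repo --invert-paths --paths-from-file "$list_file" || status=$?
  fi
  rm -f "$list_file"
  return "$status"
}
```

For example, DRY_RUN=1 remove_paths config/secrets.yml 'glob:*.log' prints what would be removed; rerun it without DRY_RUN to perform the rewrite.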
Step 4: Clean and Push
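git filter-repo already expires the reflog and repacks the repository after rewriting, so no separate cleanup step is needed. When run on a fresh clone it also removes the origin remote as a safeguard, so add the remote back before force-pushing all branches and tags:
git remote add origin <repository-url>
git push --force --all origin
git push --force --tags origin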
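After force-pushing (with either method), you can confirm that the remote's branches and tags now point at exactly the rewritten commits. This sketch assumes the remote is named origin and uses a hypothetical helper name; it ignores extra refs a hosting service may keep, such as pull-request refs.

```shell
#!/usr/bin/env bash
# Sketch: check that the remote's branches and tags point at exactly the same
# objects as the local rewritten repository. refs_match_remote is a hypothetical helper.
refs_match_remote() {
  local remote="${1:-origin}" local_refs remote_refs
  local_refs="$(git for-each-ref --format='%(objectname)%09%(refname)' refs/heads refs/tags | sort)"
  # ls-remote also lists HEAD, other hosting refs, and peeled tags ending in ^{};
  # keep only branch and tag refs so both sides are comparable.
  remote_refs="$(git ls-remote "$remote" | grep -E 'refs/(heads|tags)/' | grep -v '\^{}$' | sort)"
  if [ "$local_refs" = "$remote_refs" ]; then
    echo "remote matches local branches and tags"
  else
    echo "remote differs from local:" >&2
    diff <(printf '%s\n' "$local_refs") <(printf '%s\n' "$remote_refs") >&2
    return 1
  fi
}
```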
Safety Considerations
- Communicate with Collaborators: Inform collaborators about the history rewrite. They should re-clone rather than pull, because merging from an old clone can reintroduce the removed files.
- Double-Check Files to Remove: Verify exactly which files will be removed to avoid accidentally deleting essential data.
- Test the Modified Repository: Clone the rewritten repository into a separate directory and verify its integrity:
git clone <modified-repo-url> test-repo
- Rotate Exposed Credentials: If sensitive data (e.g., passwords, API keys) was exposed, rotate the credentials immediately. Rewriting history does not affect copies that were already cloned, forked, or cached.
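To double-check what is actually large before choosing a --strip-blobs-bigger-than threshold or a list of paths, you can list the biggest blobs anywhere in history. This is a sketch built from standard git plumbing (rev-list and cat-file); largest_blobs is a hypothetical helper name, and the path shown is one path at which each blob appears, since identical content can exist under several names.

```shell
#!/usr/bin/env bash
# Sketch: list the N largest blobs reachable from any ref, largest first, as
# "<size-in-bytes> blob <object-id> <path>". largest_blobs is a hypothetical helper;
# it works in a normal clone or a bare mirror.
largest_blobs() {
  local n="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objectsize) %(objecttype) %(objectname) %(rest)' |
    awk '$2 == "blob"' |
    sort -k1,1nr |
    head -n "$n"
}
```

For example, largest_blobs 20 shows the twenty biggest files ever committed, including ones already deleted from the current branch.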
Key Differences Between BFG and git filter-repo
| Feature | BFG Repo-Cleaner | git filter-repo |
|---|---|---|
| Ease of Use | High (simpler commands) | Medium (more customizable) |
| Performance | Faster for simple operations | Optimized for complex use cases |
| Customization | Limited | Extensive |
| Latest commit | Left untouched by default | Rewritten like any other commit |
| Installation | Requires Java | Requires Python 3 (single-file script) |
Troubleshooting
- Error: Repository too large: If the repository is too large, consider using --strip-blobs-bigger-than (available in both BFG and git filter-repo) to remove oversized files.
- Force-Pushing Issues: Ensure you have permission to force-push to the remote repository. Protected branches on hosting services usually reject force pushes until the protection is relaxed.
- Collaborator Issues: Share this guide with collaborators to help them re-clone and reset their local repositories.
Conclusion
Using BFG Repo-Cleaner or git filter-repo allows you to efficiently and permanently remove files from Git history. Always prioritize safety by backing up your repository and communicating with your team before making irreversible changes.