Reducing Large Repositories
Learn how to reduce the size of large Drupal or WordPress site repositories for optimized performance and reliability on Pantheon.
Contributors: Alex Fornuto.
Caution
The content in this guide is advanced and may not work in every case. If you have trouble cloning a large repository, you can clone only the latest commit by using the --depth flag:
git clone --depth 1 ssh://codeserver.dev.xxx@codeserver.dev.xxx.drush.in:2222/~/repository.git my-site
Repositories that exceed 2GB may experience failures or degraded performance when interacting with code via Git on Pantheon. We recommend reducing the repository size by removing objects that are no longer referenced using git prune, in addition to optimizing via git gc. You may also want to review the repository for large files, then exclude them as needed.
Note
Due to the use of Perl and the Bash shell, the following process is supported on Linux and Mac machines only. Windows users should work within a virtual machine.
If your default shell is something other than Bash (Zsh, for example), switch to a Bash environment before you continue.
Determine Repository File Size
You can output the size of your repository by running git count-objects -vH or du -sh .git/ from within the root directory of your site's codebase.
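As a rough illustration, the size check can be scripted so that it reports whether a clone is over the 2GB threshold mentioned earlier. This is a minimal sketch, not part of Pantheon's tooling: the function names git_dir_size_kb and repo_exceeds_limit and the LIMIT_KB variable are hypothetical, and it measures the .git directory on disk with du rather than what the remote stores.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report whether a repository's .git directory
# is larger than a size limit (2GB by default).

LIMIT_KB=$((2 * 1024 * 1024))   # 2GB expressed in kilobytes

# Print the size of <repo>/.git in kilobytes.
git_dir_size_kb() {
  du -sk "$1/.git" | cut -f1
}

# Succeed (exit 0) when .git is larger than the limit.
# $1 = repository root, $2 = optional limit in kilobytes.
repo_exceeds_limit() {
  [ "$(git_dir_size_kb "$1")" -gt "${2:-$LIMIT_KB}" ]
}

# Usage, from the root directory of the site's codebase:
#   repo_exceeds_limit . && echo "Repository is over 2GB; consider pruning."
```

Measuring in kilobytes keeps the comparison in integer arithmetic, which the shell handles natively.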
Prune and Optimize Large Repositories
Clone the site's codebase, if you haven't already.
Set the connection mode for each environment (excluding Test and Live) to Git. You can do this with Terminus:
for i in $(terminus env:list $SITENAME --format=list | grep -vE 'test|live'); do terminus connection:set $SITENAME.$i git; done
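The environment filter in the loop above can be tried on its own before any connection modes are changed. The sketch below assumes a hypothetical helper named filter_writable_envs. It anchors the match so that only environments named exactly test or live are dropped; keeping a Multidev whose name merely contains one of those words is an assumption you may want to change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read environment names on stdin (one per line)
# and print every name except the exact names "test" and "live".
filter_writable_envs() {
  grep -vE '^(test|live)$'
}

# Example with made-up environment names:
printf 'dev\ntest\nlive\ntest-fix\n' | filter_writable_envs
# prints:
#   dev
#   test-fix
```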
Navigate to the root directory of your site's codebase (e.g. cd site-name).
Create local copies of all remote branches:
for BRANCH in `git branch -r | grep -vE "HEAD|master"`; do git branch --track ${BRANCH#origin/} $BRANCH; done
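The loop above relies on the ${BRANCH#origin/} parameter expansion to turn a remote-tracking name into a local branch name. A small sketch of that expansion in isolation follows; the function name local_branch_name and the branch names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the leading "origin/" from a
# remote-tracking branch name to get the local branch name.
local_branch_name() {
  printf '%s\n' "${1#origin/}"
}

local_branch_name origin/feature-x   # prints: feature-x
local_branch_name origin/fix/menu    # prints: fix/menu
```

Only the shortest leading match of origin/ is removed, so branch names that themselves contain slashes are left intact.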
Write all local branch names to the $BRANCHES variable, to be used in later steps:
BRANCHES=$(for BRANCH in $(git branch --list | grep -v master); do echo "${BRANCH}"; done; echo master)
Generate a list of large files existing on any branch, then write output to ../large_files.txt:
git rev-list $BRANCHES -- | while read rev; do git ls-tree -lr $rev | cut -c54- | grep -v '^ '; done | sort -u | perl -e ' while (<>) { chomp; @stuff=split("\t"); $sums{$stuff[1]} += $stuff[0]; } print "$sums{$_} $_\n" for (keys %sums); ' | sort -rn > ../large_files.txt
This may take several minutes to complete.
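Once the list exists, it can help to look at only the largest entries with sizes in a readable unit. The following is a sketch under the assumption that each line of large_files.txt has the form "<bytes> <path>", as written by the command above; the helper name top_large_files is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first N lines (default 10) of a
# "<bytes> <path>" list, with sizes converted to megabytes.
# $1 = list file, $2 = optional number of lines.
top_large_files() {
  head -n "${2:-10}" "$1" | LC_ALL=C awk '{
    size = $1
    path = substr($0, length($1) + 2)   # everything after the first space
    printf "%.1f MB\t%s\n", size / 1048576, path
  }'
}

# Usage, from the root directory of the site's codebase:
#   top_large_files ../large_files.txt 20
```

Because the list is already sorted in descending order, taking the first lines is enough to see the biggest contributors.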
Review patterns that occur within large_files.txt and determine what should be excluded. Patterns may be a path to a single file, the path of a directory by name, or an expandable path.
Example Patterns:
- Single file name: myfile.txt
- Directory (this also matches all files under that directory): my_directory
- Expandable path pattern that matches all SQL files within my_directory: my_directory\/*.sql
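Before rewriting history, it may be useful to preview which listed files a set of patterns would catch. The sketch below is only an approximation: it uses the shell's own glob matching, which can differ from how Git resolves paths, and the helper names path_matches and preview_matches are hypothetical. Patterns are written here without the escaping backslash.

```shell
#!/usr/bin/env bash
# Approximation only: Git's own path matching has more rules than this.

# Succeed when path ($1) equals the pattern ($2), matches it as a glob,
# or lies underneath a directory named by the pattern.
path_matches() {
  case "$1" in
    $2|$2/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print every "<bytes> <path>" line of a list file whose path matches
# at least one pattern. $1 = list file, remaining arguments = patterns.
preview_matches() {
  list=$1
  shift
  while read -r size path; do
    for pattern in "$@"; do
      if path_matches "$path" "$pattern"; then
        printf '%s %s\n' "$size" "$path"
        break
      fi
    done
  done < "$list"
}

# Usage:
#   preview_matches ../large_files.txt 'my_directory/*.sql' myfile.txt
```

Quoting the patterns on the command line keeps the shell from expanding them against the current directory before the helper sees them.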
Filter out files and directories that match the patterns you identified. In the example below, replace my_directory\/*.sql myfile.txt with the patterns you want to filter for:
git filter-branch --force --index-filter 'git rm -rf --cached --ignore-unmatch my_directory\/*.sql myfile.txt' --prune-empty --tag-name-filter cat -- --all
This may take hours to complete.
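Because the rewrite is slow and changes history, one option is to print the full command for review before running it. This is a minimal sketch with a hypothetical helper, print_filter_command, that only assembles and prints the command shown above; it assumes the patterns contain no spaces or single quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the filter-branch command
# for the given patterns so it can be checked first.
# Assumes patterns contain no spaces or single quotes.
print_filter_command() {
  printf "git filter-branch --force --index-filter 'git rm -rf --cached --ignore-unmatch %s' --prune-empty --tag-name-filter cat -- --all\n" "$*"
}

# Usage:
#   print_filter_command 'my_directory/*.sql' myfile.txt
```

The printed line can then be copied into the terminal once the pattern list looks right.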
Push your local changes to Pantheon:
git push origin --force --all
git push origin --force --tags
In some scenarios, git push origin --force --tags may throw an error. Note that the following type of message is not an error:
remote: PANTHEON NOTICE:
remote:
remote: The creation of tag "pantheon_test_9" has triggered a deployment of code on test.
remote:
The current workaround is to delete the tags remotely using git push origin :refs/tags/[tag]
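When several tags need to be removed, the deletion refspecs can be generated from a list of names. The sketch below assumes a hypothetical helper, tag_delete_refspecs, and made-up tag names; it only prints the refspecs and does not push anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read tag names on stdin (one per line) and print
# one deletion refspec per tag, in the form used with `git push origin`.
tag_delete_refspecs() {
  while read -r tag; do
    if [ -n "$tag" ]; then
      printf ':refs/tags/%s\n' "$tag"
    fi
  done
}

# Example with made-up tag names:
printf 'old_tag_1\nold_tag_2\n' | tag_delete_refspecs
# prints:
#   :refs/tags/old_tag_1
#   :refs/tags/old_tag_2
```

Each printed refspec can be passed to git push origin, one at a time or together, after confirming that the tags are ones you intend to delete.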
Recover local disk space and optimize your local repository with the following:
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now
Note
For sites using a custom upstream, check the custom upstream to see whether it contains large files that can be pruned.