Source control with Git Large File Storage

This is the first of hopefully many technical posts, we'll be talking about Unity, game dev and hopefully some retro game dev very soon.  So, to kick things off - let's start at the beginning with some source control...

Recently we have started a few projects which were complete PC games with a large footprint.  The first thing we do is put the project under source control and we've always used GitHub and GitLFS which have worked well.  However with large projects it is can be fairly common to see the dreaded:

"fatal: The remote end hung up unexpectedly"

I've seen various posts on the net about using SSH over HTTPS and setting the HTTP post buffer parameter to a large value:

git config http.postBuffer 524288000

However for me none of these work reliably in all cases.  As you know GitHub has a file size limit of 100MB, but you often encounter problems well before you get to files of that size.

So here are some basic things we do when first pushing a project to GitHub.

Start with a small initial commit

Just setup the README, .gitignore, and your lfs tracking (see below).The reason for this is simple, if you end up needing to rebase in future and everything is in the initial commit, you will soon discover you are stuck.  To rebase you need a parent and so you can't rebase the original commit.

At this point, don't forget to install GitLFS.

git lfs install

Split up large commits and push one at a time

Even if it does work a single giant commit is fairly unwieldy. There is nothing more frustrating that firing off a 'git push' only to have it fail 3% from the end after 20 minutes.  From my experience it's fairly intolerant of transferring large files, so if you missed off a 'git lfs track' or the connection hangs it's much easier if you don't have to start again from scratch and when there is a problem this helps you narrow it down.

So add files to commits in groups, either those that occur naturally or try alphabetically.  You don't need to go too mad on the granularity but a few reasonable sized chunks is all that is needed.

Find all the large files in your project and lfs track them first!

It's quite important to track all the large files in your project to begin with.  If you don't and you commit any large files by accident they'll get added to your git object database and bloat it.  Secondly if you miss large files your pushes will fail either when they get rejected or often when the transfer times out.

While You can rebase and track the files at a later date, that is painful and you end up having to clean out the blobs manually usually anyway.  Much better to get things under control from the start.

Using find and ls in bash, you can list all assets above a certain size: 

find .-type f -size +50M -exec ls {} \;

I'm sure with some sed and grep wizardry you can just list the extensions to track, but this is a good enough start.

You'll often notice examples or demos from Asset Store items in here if using Unity.  Might be time to prune the project unless you really need them eating up your storage space.

GitHub allows files up to 100MB but gives a warning over 50MB.  I usually look at anything over 10 or 20MB.  As well as any obvious binary only assets.

If you do make mistakes and attempt to clean up afterwards I have heard good things about the BFG Repo Cleaner. But have never used it myself.

Until next time...

Hopefully this will help you out when commiting an existing project to any Git based system with file size limits requiring you to use LFS.  As mentioned there will be more technical posts coming up soon including a post-mortem of sorts for The Adventure Pals, and some retro IBM EGA appreciation... so stay tuned.

- MT