AFP548 – Introduction to Git-Fat for Munki

Do you work on munki with a team of folks (meaning more than one)? You probably wish you could use MunkiAdmin for everything, but mounting the munki repo over the network can be a poor experience as things grow very large, and as the server world is mostly *nix (if you’re lucky), running MunkiAdmin on the server itself is out of the picture. It’s sheer luck that Hannes Juutilainen wants to help with all of our workflow needs, so he composed the original page on git usage for the official munki wiki. Git was made for a distributed, (latent) network-aware world, but as the wiki page takes into account (by instructing you to create a .gitignore file, which most GUI clients can easily generate for you), it wasn’t made to shuttle around large, compressed containers like DMGs and pkgs. If you’re pushing 1GB+ disk images with git as the transport mechanism, with every version copied to its internal storage…

Let’s take a step back. In an idealized view of things, you could:

Check out the current version of the ‘stable’ or ‘production’ repository (if you have multiple ‘environments’, meaning git branches, that munki changes pass through before they hit production), even from scratch, in a timely fashion,
Use any text editor or a client like MunkiAdmin to add to or modify the munki repo locally in a topic branch, and
Push things back up to an internal git ‘remote’, alerting the responsible admin to review and merge the changes before ‘pulling’ them into your production vhost.
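
As a rough sketch of those three steps from a shell, assuming a plain git remote and nothing munki-specific: the helper names, the remote URL, and the branch names are all hypothetical placeholders. Editing with MunkiAdmin or a text editor would happen between the two calls.

```shell
#!/bin/sh
# Hypothetical helpers; remote URL, directories, and branch names are placeholders.

start_topic_branch() {
    # $1 = remote URL, $2 = local directory, $3 = base branch, $4 = topic branch
    git clone --branch "$3" "$1" "$2" || return 1
    # Create the topic branch from the freshly checked-out base branch.
    git -C "$2" checkout -b "$4"
}

publish_topic_branch() {
    # $1 = local directory, $2 = topic branch, $3 = commit message
    git -C "$1" add -A || return 1
    git -C "$1" commit -m "$3" || return 1
    # Push only the topic branch; merging into production is left to the reviewer.
    git -C "$1" push -u origin "$2"
}
```

For example, `start_topic_branch git@githost:munki.git ~/munki production add-firefox`, then make your edits, then `publish_topic_branch ~/munki add-firefox "Add Firefox"`.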

If you run makecatalogs on a repo without the pkgs, however, you’re greeted with a bunch of warnings, and syncing all of them down would be time-consuming, if not impractical. And that’s not the only workflow issue I wanted to address…

Before I go on, it’s important to reiterate that solutions do exist which are definitely worth evaluating, like Mandrill by Joe Wollard. It presents an entirely web-based, access-controlled, one-stop shop that allows multiple folks to collaborate with versioning ‘built-in’. MunkiServer has been around for a bit, and similarly allows multi-group collaboration. Sal (specifically the ‘+’ version), by pebble.it and actively developed by Graham Gilbert, has added repo modifications to the hosted/paid/supported version. MunkiWebAdmin has been in use alongside other solutions for quite some time, and being a Django-based web app, many folks have forked and extended it for their own needs. Simian, by Google, is also cloud-based, and is likewise self-contained and meant to be collaborative.

But if you’re into putting together the pieces yourself, a solution to the problem of ‘how do I move these packages around before/once I’ve made a change?’ could be git hooks. Upon checkout, or after committing a change to any particular branch and pushing, you can tell git to run rsync. Emailing the team upon a change is one of the tips Hannes included in the munki wiki page, but just as you’d need to figure out that implementation, you’d also have to work out how to tune rsync to behave the way you’d like. That alone wasn’t an attractive option for me, however, as I was sure solutions for this issue of large binary blobs in git had to have been attempted already.
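
To make the hook idea concrete, here is a minimal sketch of what a post-checkout hook calling rsync might look like. It is not something I ran in production: MUNKI_PKG_SOURCE and the sync_pkgs function are made-up names, and the rsync flags are only a starting point you would still need to tune.

```shell
#!/bin/sh
# Hypothetical .git/hooks/post-checkout: copy large files into the working tree.
# MUNKI_PKG_SOURCE is an assumed variable, e.g. "repohost:/srv/munki/pkgs".

sync_pkgs() {
    # $1 = rsync source, $2 = local destination directory
    mkdir -p "$2" || return 1
    # -a preserves permissions and timestamps; trailing slashes copy directory contents.
    rsync -a "$1/" "$2/"
}

# git passes three arguments: previous HEAD, new HEAD, and a flag that is
# 1 for a branch checkout and 0 for a file checkout.
if [ "${3:-}" = "1" ] && [ -n "${MUNKI_PKG_SOURCE:-}" ]; then
    sync_pkgs "$MUNKI_PKG_SOURCE" "pkgs"
fi
```

Because the sync only fires when MUNKI_PKG_SOURCE is set, admins who don’t want the pkgs locally can simply leave it unset.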

One of those that I experimented with is git-annex, which is… crazy. Crazy POWERFUL, but still, crazy. My reservation about it was that it keeps the one copy of everything inside the internal .git directory, with symlinks where you’d be serving the actual files. I didn’t want to have to think about tuning the overall performance of symlinked files in the munki repo, so this wasn’t attractive for me.

Sam Keeley clued me in (as he is the source of many of my better practices) to a project called git-fat, another way to tackle this problem directly in a git workflow. There are barely any dependencies, and its implementation checks off the first two of the collaboration steps I mentioned above. An admin can check out the git repo in seconds (even though we’re well over 10GB in pkgs, with hundreds of high-quality png icons/client resources), and there are placeholder files that trick makecatalogs into running without warnings. In the words of Pee-wee’s Big Adventure, though,

https://www.flickr.com/photos/tmray02/2522629742/

‘Everyone I know has a big “But…?”’ And git-fat’s is that it duplicates the files it manages on your behalf into its internal .git/fat/objects directory. This would make things ungainly and wasteful on the web host storage we serve munki from. I scratched my head and went to battle with git’s filter-branch options, thinking I could clean up the objects post-pull, but git-fat wasn’t meant to allow purging the old versions. I looked at building the feature I needed into git-fat’s source, but it sank my scrabbleship (the code, to my still novice Python eyes, made no sense).

Git hooks came to the rescue again, however, as you can specify a command to run post-merge (appropriate because git pull actually runs a merge internally as its last step, once it has fetched): in this case, on the vhost, I just wanted to truncate the internal objects git-fat uses for accounting to zero bytes, but leave them in place as placeholders, before copying in the new changes via ‘git fat pull’.

(Hint: for i in .git/fat/objects/*; do chmod 700 "$i"; cp /dev/null "$i"; done )
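
Spelled out as a hook script, the hint might look roughly like the sketch below. Treat it as one reading of the approach rather than the exact script on our vhost: the truncate_fat_objects function and the FAT_OBJECTS_DIR override are names I’ve made up for illustration, while .git/fat/objects and ‘git fat pull’ are as described above.

```shell
#!/bin/sh
# Hypothetical .git/hooks/post-merge for the web host.
# FAT_OBJECTS_DIR is an assumed override; the default is the path named above.

truncate_fat_objects() {
    # $1 = directory holding git-fat's internal copies
    for f in "$1"/*; do
        [ -f "$f" ] || continue   # skips the unexpanded pattern when the directory is empty
        chmod 700 "$f"            # make sure the owner can write to the object, as in the hint
        : > "$f"                  # truncate to zero bytes but keep the file as a placeholder
    done
}

objects_dir="${FAT_OBJECTS_DIR:-.git/fat/objects}"
if [ -d "$objects_dir" ]; then
    truncate_fat_objects "$objects_dir"
    # Copy in the new changes once the old objects have been emptied.
    git fat pull
fi
```

Remember that git only runs a hook if the file is executable, so a `chmod +x .git/hooks/post-merge` is needed after saving it.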

In an upcoming article, I’ll walk through the steps I used to convert an existing repo, with the icons, pkgs, and client_resources directories .gitignore’d, to now have git-fat manage everything for us.