Hubert Feyrer posted, Musing about git’s object store efficiency yesterday. In it he compared the apparent efficiency of git’s object store to CVS’s stacked patches. His methodology was to checkout all 963 versions of the NetBSD i386 GENERIC kernel configuration file and then sum up the space used. He comes to the following conclusion:
the git model requires about 37 times the space that CVS does
that’s not counting the overhead of 962 inodes and the related directory bookkeeping
He finishes off with an acknowledgement that git has data packing features:
I know that git offers some more efficient storage methods via “pack” files, but investigating those is left as an exercise to the reader.
I generally enjoy Hubert’s posts but as a daily user of git this one didn’t sit right with me. I thought I’d take up the aforementioned exercise.
I retrieved the GENERIC,v rcs file1 and created a git repository2.
I then ran a script3, which committed each revision of the file along with a single line commit message extracted from the rcs log.
The repository then weighed in at 22,352kb4 with 3,174 files and directories5. This is where
git-gc comes in. From the man page, “git-gc - Cleanup unnecessary files and optimize the local repository”. After running
git gc6 the size of the repository was down to 1,068kb, 1.24 times the rcs file. The file and directory count also vastly smaller at 64.
So all in all git fares pretty well. Sure the repository is bigger than CVS and there’s a few more files but its not in the order Hubert suggests and its a small price to pay for all the benefits git provides.
1. From ftp.netbsd.org.
mkdir git cd git git init Initialized empty Git repository in /Users/wmoore/Source/NetBSD i386 GENERIC/git/.git/
4. Repository sizes detemined via:
du -sk .
5. File and directory counts determined via:
find . | wc -l
git gc Counting objects: 2871, done. Delta compression using up to 2 threads. Compressing objects: 100% (1914/1914), done. Writing objects: 100% (2871/2871), done. Total 2871 (delta 951), reused 0 (delta 0) Removing duplicate objects: 100% (256/256), done.
Testing performed on Mac OS X 10.6.2 with git 220.127.116.11