gzip vs dedup: I shrink, therefore I am
[reposted from rosensharma.wordpress.com]
I stole “I shrink, therefore I am” from my wife’s good friend Arun Verma, who is incredibly creative, and makes some of the best lamps ever. He also does websites and ads if you are interested.
I have a MacBook and use VMware Fusion to run a Windows XP VM. I keep all my data in a folder hosted on the Mac's operating system, so the VM is basically programs and user settings. In addition I have several images which I work with: Red Hat Enterprise Linux, Ubuntu, Win 2K3, etc. Not atypical of someone who either develops or tinkers with technology.
My problem is that out of a 120GB hard disk, I am up to 100GB, and a whopping 60GB of that is virtual images; I have about 8 of them. So I wanted to see if I could compress the virtual images in some fashion, and decided to run a small test of how much dedup would buy me over gzip.
w2k3.vhd: Original size: 1.6GB
w2k3.vhd.gz: 712 MB
Further analysis of the image showed that there were:
14K zero-filled blocks, and
about 40K blocks that occurred more than once
gzip wxp.vhd -> 921 MB
43K additional blocks repeated between this image and the previous one
Dedup optimization: 66K * 4K ~ 250 MB

Clearly gzip would win over a simple dedup. Even with two images, XP and W2K3, I guess there are just not enough duplicate blocks to make dedup shine: less than 10% of the blocks are being found. In a small set of images like those on a desktop, the images aren't clones of one another, so the large matches that dedup depends on just don't show up.
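For anyone curious how you'd run this kind of test yourself, here is a minimal sketch of the block-level analysis described above: scan an image in fixed-size 4K blocks, count the zero-filled ones, and count how many blocks are redundant copies of a block already seen. The block size and the use of SHA-256 as the block fingerprint are my assumptions, not details from the original test.

```python
import hashlib
from collections import Counter

BLOCK_SIZE = 4096  # assumed 4K block size, matching the 66K * 4K estimate


def analyze_image(path):
    """Scan a disk image in fixed-size blocks, counting zero-filled
    blocks and redundant copies of repeated blocks."""
    zero_block = b"\x00" * BLOCK_SIZE
    counts = Counter()
    zero_blocks = 0
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            total += 1
            if block == zero_block:
                zero_blocks += 1
            # hash each block so we don't keep every 4K buffer in memory
            counts[hashlib.sha256(block).digest()] += 1
    # every copy beyond the first of a repeated block is redundant,
    # i.e. dedup could replace it with a reference to the first copy
    redundant = sum(n - 1 for n in counts.values() if n > 1)
    return total, zero_blocks, redundant
```

The potential dedup savings is then roughly `redundant * BLOCK_SIZE` bytes, which is how the 66K * 4K ~ 250 MB figure above falls out.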