Posts Tagged ‘file transfer’
DropBox dedup only in the cloud
I had observed in my earlier article that DropBox performs de-duplication in the cloud. This would mean that de-duplication is not performed at the client. In order to test my hypothesis, I performed the following experiment:
I first looked at the size of the DropBox folder on Windows and found it to be 1,723,871,232 bytes.
Next, in the DropBox client, I opened the DropBox folder and simply duplicated the contents of the Public folder by copying the 1.68 GB file and pasting it as its copy. I looked at the size of the folder once again and it had doubled to 3,446,513,664 bytes.
If DropBox had been performing dedup at the client, it should have detected the duplicate blocks between the original and its copy at the source, and the folder should not have grown in size at all. My conclusion is that DropBox dedups only in the cloud, not at the client.
Wait, there’s more:
I repeated the same experiment on the Mac after deleting the duplicate file. Here’s what I started out with:
Last login: Thu Sep 17 15:30:58 on ttys000
mace1s:~ paule1s$ du -k DropBox
1152 DropBox/Photos/Sample Album
1516 DropBox/Photos
1682636 DropBox/Public
368 DropBox/sharevm
1684880 DropBox
Notice that the total size of the folder (the last line of the listing above) is 1.68GB.
Next, in the DropBox client, I opened the DropBox folder and simply duplicated the contents of the Public folder by copying the 1.68 GB file and pasting it as its copy. I looked at the size of the folder once again and saw:
mace1s:~ paule1s$ du -k DropBox
1152 DropBox/Photos/Sample Album
1516 DropBox/Photos
2600140 DropBox/Public
368 DropBox/sharevm
2602384 DropBox
This is very interesting. I had expected the storage requirements to double to 3,369,760 KB; however, they grew by only 917,504 KB (roughly 0.9 GB). What happened to the remaining 767,376 KB or so? Did the DropBox client truncate the file? If so, why?
Readers, can you shed some light?
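To make the arithmetic behind that question explicit, here is a minimal sketch that works only from the totals printed by the two `du -k` listings above. The variable names are mine, and the units are whatever `du -k` reports (1 KB blocks).

```python
# Totals in KB, copied from the last line of each `du -k DropBox` listing.
before_kb = 1_684_880   # before duplicating the file
after_kb = 2_602_384    # after duplicating the file

expected_kb = 2 * before_kb            # a full doubling of the folder
growth_kb = after_kb - before_kb       # what was actually added
shortfall_kb = expected_kb - after_kb  # what is "missing"

print(f"expected total: {expected_kb:,} KB")
print(f"actual growth : {growth_kb:,} KB")
print(f"shortfall     : {shortfall_kb:,} KB")
```

The growth of 917,504 KB is exactly 896 x 1024 KB. One possible reading, which is only a guess and not something the listings confirm, is that the copy was still being written when `du` ran.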
Compressed VM file transfer using DropBox
I am using DropBox to transfer compressed files, including VM’s, between my environment at home (a Mac running Windows XP SP3 in VMware Fusion 2.0.5) and the test machine, a Windows XP SP3 system located in the office lab. Each machine has a DropBox folder linked to the same account.
Neat product!
I love the simplicity and ease of use. A lot of thought has gone into making the product easy to install; the integration with the host OS (Windows and Mac) is seamless and sets a benchmark for how UI’s for downloadable products should be designed.
Usage model
I compress each file using the Mac’s native file compression and drop it into my DropBox folder. DropBox seems to follow a two-step file transfer process:
- It first uploads the file completely from the source DropBox folder to the DropBox folder in the cloud
- After the upload is complete, the file is then downloaded from the DropBox folder in the cloud to the destination DropBox folders.
Setup
Speed ratings are from here. I have been able to correlate these speeds with the end-to-end transfer times.
| Transfer Type | Speed Rating for my ISP | Observed DropBox Transfer Rate |
|---|---|---|
| Upload | 120 KB/sec | 70 KB/sec |
| Download | 360 KB/sec | 210 KB/sec |
Near real-time transfer for uncompressed files
DropBox transfers uncompressed files almost instantaneously between the two machines. The files are transferred sequentially and seem to arrive in order. For example, I transferred a 1.72 GB folder containing 400 photographs and the photos started appearing sequentially 10 – 15 seconds apart.
Compressed files
Compressed files are transferred as a unit, although dedup applies to the blocks contained within them. The transfer times are as recorded below:
| Original Size | Compressed Size | Upload Time | Download Time | Total Time |
|---|---|---|---|---|
| 4.30 GB | 1.6800 GB | 6h 40m | 2h 12m | 8h 52m |
| 2.15 GB | 0.6714 GB | 2h 27m | 0h 48m | 3h 15m |
| 1.10 GB | 0.2371 GB | 0h 56m | 0h 18m | 1h 14m |
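The recorded times can be turned back into effective transfer rates to check them against the observed 70 KB/sec upload and 210 KB/sec download figures. This is a rough sketch that assumes decimal units (1 GB = 1,000,000 KB); the function names are illustrative only.

```python
def to_seconds(hours, minutes):
    """Convert an 'Xh Ym' duration into seconds."""
    return hours * 3600 + minutes * 60

def rate_kb_per_sec(size_gb, seconds):
    """Effective rate in KB/sec, assuming 1 GB = 1,000,000 KB."""
    return size_gb * 1_000_000 / seconds

transfers = [
    # (compressed size in GB, upload (h, m), download (h, m))
    (1.6800, (6, 40), (2, 12)),
    (0.6714, (2, 27), (0, 48)),
    (0.2371, (0, 56), (0, 18)),
]

for size_gb, up, down in transfers:
    up_rate = rate_kb_per_sec(size_gb, to_seconds(*up))
    down_rate = rate_kb_per_sec(size_gb, to_seconds(*down))
    print(f"{size_gb:.4f} GB: upload {up_rate:.0f} KB/sec, "
          f"download {down_rate:.0f} KB/sec")
```

Under that unit assumption the first row works out to about 70 KB/sec up and about 212 KB/sec down, which lines up with the observed rates in the Setup table.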
Dedup works well with compressed files
DropBox examines the file to be transferred and builds an index of the blocks it contains. Its de-duplication technology is smart enough to skip blocks that are duplicates, i.e., blocks that have already been transferred. For example, when I tried to transfer two clones, the first one took a long time to transfer (a few hours), but the second transfer was very rapid (under five minutes).
Since I am using the free account, I deleted a 2GB VM from my DropBox folder in order to begin my next transfer. I was pleasantly surprised to see that the next VM transfer was very rapid. I suspect this was because the VM that was transferred earlier was still residing in DropBox’s cache even though I had deleted it, so that DropBox discovered common/duplicate blocks and did not upload them from my Mac.
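The behaviour described above can be sketched as a hash index over fixed-size blocks. This is only an illustration of the general technique, not DropBox’s actual implementation: the block size, the use of SHA-256, and the `server_index` set are all assumptions of mine.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this demo, not DropBox's real value

def block_hashes(data, block_size=BLOCK_SIZE):
    """Return the SHA-256 digest of each fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]

def plan_upload(data, known_hashes, block_size=BLOCK_SIZE):
    """Return indexes of blocks not yet in `known_hashes`, recording them as seen."""
    to_send = []
    for index, digest in enumerate(block_hashes(data, block_size)):
        if digest not in known_hashes:
            to_send.append(index)
            known_hashes.add(digest)
    return to_send

# Hypothetical server-side index of block hashes it already stores.
server_index = set()

# A tiny stand-in for a VM image: four distinct 4 KB blocks.
vm = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
clone = bytes(vm)  # an identical clone of the image

first = plan_upload(vm, server_index)
second = plan_upload(clone, server_index)
print(f"first transfer sends {len(first)} blocks, second sends {len(second)}")
```

If the index outlives a deleted file, as suspected above, a later upload that shares blocks with the deleted VM would also skip them.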
Summary
Nifty tool. Love it. Will use it a lot.
A few feature requests
- Subfolders: I would like to organize the files by date and category.
- Timers: I would like to time the uploads and downloads easily.
- Profile my usage and suggest how long an end-to-end transfer will take
- Speed up compressed file transfers: improve my effective transfer rate from ~60% to ~80%. I would like to saturate the available bandwidth for uploads and downloads.
Thanks
gzip vs dedup: I shrink, therefore I am
[reposted from rosensharma.wordpress.com]
I stole “I shrink, therefore I am” from my wife’s good friend Arun Verma, who is incredibly creative, and makes some of the best lamps ever. He also does websites and ads if you are interested.
I have a MacBook and use VMware Fusion to run a Windows XP VM. I keep all my data on a hosted folder on the Mac’s operating system. So the VM is basically programs and user settings. In addition I have several images which I work with: Red Hat Enterprise, Ubuntu, Win 2K3 etc. Not atypical of someone who either develops or tinkers with technology.
My problem is that out of a 120GB hard disk, I am up to 100GB, and a whopping 60GB of that is virtual images. I have about 8. So I wanted to see if I could compress the virtual images in some fashion. I decided to run a small test of how much dedup would buy me over gzip.
w2k3.vhd: Original size: 1.6GB
w2k3.vhd.gz: 712 MB
Further Analysis of the image showed that there were
14K Zero Filled Blocks, and
About 40K blocks occurred more than once
gzip wxp.vhd –> 921 MB
43K Additional Blocks Repeated between this and previous image
Dedup Optimization: 66K*4K ~ 250MB
Clearly gzip would win over a simple dedup. Even with two images, XP and W2K3, I guess there are just not enough blocks to make dedup shine. Less than 10% of the blocks are being found. Cloning in some sense avoids large matches in a small set of images like on the desktop.
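For readers who want to repeat this kind of comparison, here is a minimal sketch of how zero-filled and repeated 4K blocks might be counted and set against gzip. It is not the tool used for the numbers above; the function name and the tiny in-memory sample are stand-ins, and a real multi-GB image would need to be read in chunks rather than loaded whole.

```python
import gzip
import hashlib
from collections import Counter

def block_stats(data, block_size=4096):
    """Count zero-filled blocks and redundant copies of non-zero blocks."""
    zero_block = bytes(block_size)
    seen = Counter()
    zero_blocks = 0
    total = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        total += 1
        if block == zero_block:
            zero_blocks += 1
        else:
            seen[hashlib.sha256(block).digest()] += 1
    # Every copy beyond the first of a block is redundant.
    redundant = sum(count - 1 for count in seen.values())
    return {
        "total": total,
        "zero": zero_blocks,
        "redundant": redundant,
        "saved_bytes": (zero_blocks + redundant) * block_size,
    }

# Tiny stand-in for a disk image: 3 zero blocks, block A twice, block B once.
sample = bytes(4096) * 3 + b"A" * 4096 * 2 + b"B" * 4096

stats = block_stats(sample)
print(stats)
print("gzip size:", len(gzip.compress(sample)), "bytes")
```

Note that "redundant" here counts extra copies only, which may not be the same definition as "blocks occurred more than once" in the figures above.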
Virtual disk (VM) transfers in the cloud

There are two sets of use cases:
- Within a development team
- Within IT
Development teams:
Developers carry between one and three VM’s on their laptops. They often transfer them to other developers/QA Engineers in their own team, or other teams for integration testing.
IT (regular file transfer, no streaming):
IT receives a VM that is packaged and ready for deployment, either developed by an in-house or contracting application development team or bought from an external vendor.
The VM is transferred to a staging (pre-production) fileshare from which it can be loaded on to one or more test servers.
When the app within the VM passes acceptance tests, it is transferred to a production fileshare, from which it can be loaded on to one or more production servers.
The VM can also be transferred to archival storage.
DropBox: Cloud service for storing, syncing, sharing files
I found Dropbox, a nifty service for storing files online, keeping their copies on several of your own computers in sync, or sharing some of them with your friends.
- You download the Dropbox client (supported on Windows XP and Vista (32 and 64-bit), Mac OS X Tiger and Leopard, as well as Ubuntu 7.10+ and Fedora Core 9+)
- 2GB of free storage provided with it
- You can then drag and drop files that you want to store online or share into the Dropbox.
- Dropbox maintains a snapshot of files
- If any of the files get updated, it sends only blocks that have changed
- It also offers the ability to undelete and restore files from the copies that are stored online.
- You can create Public folders for sharing; files in Public folders have URL’s that you can share with your friends.
While the company seems to be consumer-focused, the service is usable for dull and boring corporate stuff, like instantaneous automatic backups of files that change, and it also enables disaster recovery.
Someone has used Dropbox for syncing and sharing VM’s. This is an interesting use case; however, readers should pay heed to the transfer times as image sizes grow.
Top 12 referrers for Q1 2009
Here are the Top 12 referrers to our blog over the past 3 months; the number of referrals is in parentheses.
- http://pro-linux.de/berichte/ext4/ext4.html (765)
- http://networksecuritytoolkit.org/nst/index.html (566)
- http://dabcc.com/article.aspx?id=9653 (149)
- http://polishlinux.org/apps/cli/ext4-defragmentation-with-e4defrag/ (111)
- http://kakku.wordpress.com/2008/06/23/virtualbox-shrink-your-vdi-images-space-occupied-disk-size/ (101)
- http://stumbleupon.com/refer.php?url=http://sharevm.wordpress.com/2009/01/19/most-popular-vmware-virtual-appliances-for-it-administrators/ (84)
- http://techblog.41concepts.com/2008/03/31/shrink-your-windows-disk-image-on-wmware-fusion-mac/ (67)
- http://thedarkmaster.wordpress.com/2007/03/12/vmware-virtual-machine-to-virtual-box-conversion-how-to/ (66)
- http://blogs.msdn.com/heaths/archive/2005/07/30/445621.aspx (66)
- http://prefetch.net/blog/index.php/2007/01/21/determining-file-fragmentation-on-ext3-file-systems/ (61)
- http://blog.rightscale.com/2009/01/09/amazon-launches-ec2-console/ (53)
- http://virtualgeek.typepad.com/virtual_geek/2009/01/updated-homebrew-esx-hardware-list.html (52)
Thank you for the referrals. Hope the content is meaningful for our readers.
Top 12 referrers over the past 3 months
Here are the Top 12 referrers to our blog over the past 3 months; the number of referrals is in parentheses.
- http://pro-linux.de/berichte/ext4/ext4.html (546)
- http://dabcc.com/article.aspx?id=9653 (342)
- http://networksecuritytoolkit.org/nst/index.html (110)
- http://polishlinux.org/apps/cli/ext4-defragmentation-with-e4defrag/ (59)
- http://communities.vmware.com/thread/189804?tstart=0 (49)
- http://techblog.41concepts.com/2008/03/31/shrink-your-windows-disk-image-on-wmware-fusion-mac/ (42)
- http://blog.rightscale.com/2009/01/09/amazon-launches-ec2-console/ (37)
- http://wordpress.com/tag/vhd/ (33)
- http://wordpress.com/tag/vmdk/ (32)
- http://virtualgeek.typepad.com/virtual_geek/2009/01/updated-homebrew-esx-hardware-list.html (32)
- http://blogs.msdn.com/heaths/archive/2005/07/30/445621.aspx (32)
- http://kakku.wordpress.com/2008/06/23/virtualbox-shrink-your-vdi-images-space-occupied-disk-size/ (31)
Thank you for the referrals. Hope the content is meaningful for our readers.
How long does it take to copy a VM over the Internet?
Most of us who have copied large files over the Internet (or even within the company’s network for that matter) have been surprised by the amount of time it takes. We don’t have a mapping in our mind that relates the network bandwidth in Mbps to the size of the file to be transferred (GB). I found a nifty file transfer time calculator on the Internet and computed the effective transfer rate for several different bandwidth options by subtracting the signal overhead and an average 5% Layer 4 overhead from the rated capacity. The table below shows the expected transfer times for different network bandwidths. Your mileage will vary with the Internet/internal network traffic at your site.
Transfer Time

| File Size (GB) | 1.544 Mbps T1/DS1 | 2.048 Mbps E1 | 10 Mbps Thin Ethernet | 44.736 Mbps T3/DS3 |
|---|---|---|---|---|
| 1 | 1h 44m 35s | 1h 16m 02s | 0h 14m 19s | 0h 03m 09s |
| 5 | 8h 42m 57s | 6h 20m 10s | 1h 11m 37s | 0h 15m 45s |
| 10 | 17h 25m 55s | 12h 40m 20s | 02h 23m 15s | 00h 31m 31s |
| 15 | 26h 08m 52s | 19h 00m 31s | 03h 34m 52s | 00h 47m 16s |
| 20 | 34h 51m 50s | 25h 20m 41s | 04h 46m 30s | 01h 03m 02s |
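The mapping from bandwidth to transfer time can also be estimated with a few lines of code. The sketch below is not the calculator used for the table and will not reproduce its figures exactly, because the calculator’s signal-overhead assumptions are not given here; the single `overhead` fraction and the decimal definition of a GB are my own simplifications.

```python
def transfer_time_seconds(size_gb, link_mbps, overhead=0.05):
    """Estimate transfer time, assuming decimal units and a flat overhead fraction."""
    bits = size_gb * 8 * 1_000_000_000
    effective_bps = link_mbps * 1_000_000 * (1 - overhead)
    return bits / effective_bps

def format_hms(seconds):
    """Format a duration in seconds as 'Hh MMm SSs'."""
    seconds = round(seconds)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"

links = {"T1/DS1": 1.544, "E1": 2.048, "Thin Ethernet": 10, "T3/DS3": 44.736}

for size_gb in (1, 5, 10):
    row = ", ".join(
        f"{name}: {format_hms(transfer_time_seconds(size_gb, mbps))}"
        for name, mbps in links.items()
    )
    print(f"{size_gb} GB -> {row}")
```

Changing `overhead` per link type is the obvious refinement if you know the framing overhead of your own connection.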
The timings for T1 correlate very closely with my experience of transferring VM’s between my machine and ec2 when I am the sole user. It takes me over half a day to transfer a 10 GB VM and this is really wearing me down.
The chronic problem is how to reduce the time taken to copy VM’s to a remote datacenter, either one owned by your company or a cloud provider, over your company’s private network or on the Internet. Network bandwidth will always remain a gating issue for most of us who are developers.
scp, VSS for Windows VHD backup, disaster recovery
How to replicate Hyper-V VHDs for DR?
The author is looking for a block-level transfer tool like rsync on Windows. A respondent has suggested using Volume Shadow Copy Services (VSS). However, VSS needs space for shadow copy data and this becomes an issue if you have large VM’s to transfer and are short of space.
scp is a secure transfer tool, like rsync, that is used for performing remote copies of files, including VHD images, across the LAN or the Internet.
Microsoft recommends that customers implement a backup or disaster recovery solution within their WAN using the File Replication Services (FRS2) of the Distributed File System (DFS) of Windows 2003 Server R2, or later. This solution will perform well over the WAN if WAN acceleration is in use. However, if WAN acceleration is not in use, they should enable the Remote Differential Compression (RDC) protocol available in Windows 2003 Server R2, which optimizes file replication over limited-bandwidth networks.
Read about the use of rsync for vmdk, vhd backup and disaster recovery.
rsync vm, vhd for backup, disaster recovery, ec2
I use ftp to transfer large VM images of my code to a remote development team based in India, and rsync for copying and backing up code, configuration and data from ec2. I researched the web for best practices that have evolved for speeding up large VM transfers. It seems there are none today, unless you are transferring VM’s on your company’s WAN and they are using WAN acceleration to improve the transfer rate. However, I have found two models for using rsync with vmdk’s and vhd’s. Here’s a sample of use cases:
Cloud-centric usage
rsync is used for copying and backing up code, configuration and data from cloud-based services like Amazon ec2.
- Here is an excellent guide to Backup with rsync and ec2
- rsync, s3 and backups
- Migrate to aws/ec2
Traditional usage
rsync is used for backing up large VM’s to a remote store or for disaster recovery
Read about Backup, Disaster Recovery for Windows VM’s
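As a rough illustration of the traditional usage, the sketch below builds an rsync command line for pushing a single VM image to a remote store. The host name, paths, and function name are hypothetical, and the flag selection is one reasonable choice rather than a recommendation from the linked guides: `--inplace` updates the destination file in place instead of writing a full temporary copy, and `--partial` keeps a partially transferred file so an interrupted run can resume.

```python
import shlex

def rsync_command(source, destination, compress=True):
    """Build an rsync argument list for copying a large VM image."""
    cmd = ["rsync", "--archive", "--partial", "--inplace", "--progress"]
    if compress:
        # Compress data in transit; less useful for already-gzipped images.
        cmd.append("--compress")
    cmd += [source, destination]
    return cmd

# Hypothetical source file and backup host, for illustration only.
cmd = rsync_command("vms/w2k3.vhd", "backup@example.com:/srv/vm-backups/")
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` on a machine where rsync and SSH access to the destination are already set up.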