Entrepreneurs and Startups: 10 Lessons in Entrepreneurship
Om Malik’s post “The Essential Startup Reader: 10 Lessons in Entrepreneurship” is a collection of insightful essays about what makes entrepreneurs and startups tick. It is a must-read.
How “capitalism” forces virtualization downstream
Over the last decade we have systematically added a layer of indirection at every interface in the stack. These days we call this virtualization!
On a NetApp Filer we had
> raid disk group > volume
The problem was that you could not expand or shrink RAID groups on the fly, and you could not easily move data between different RAID groups. So we got a layer of indirection, or virtualization:
> raid disk group > aggregate > flex volume
Since an aggregate was logical instead of physical, it could be expanded or shrunk without changing the volume, and you could move data around.
On a USB Disk
If we look inside the disk itself, especially USB flash devices, we went from
cylinder, heads, sector > logical table > device abstraction
Again, this allowed logical sectors to be rotated across different physical cells, so that no single cell was rewritten more times than its rated lifetime allows.
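The logical table can be pictured with a toy model. This is only a sketch of the idea, not how any real flash controller works: the class name `WearLevelingMap` and its fields are made up for illustration, and it assumes there are at least as many physical cells as logical sectors in use.

```python
class WearLevelingMap:
    """Toy logical-to-physical map (hypothetical, for illustration only)."""

    def __init__(self, physical_cells):
        self.write_counts = [0] * physical_cells    # writes absorbed by each cell
        self.logical_to_physical = {}               # the layer of indirection
        self.free_cells = set(range(physical_cells))
        self.data = {}

    def write(self, logical_sector, payload):
        # Release the cell that currently backs this sector, if any,
        # so it competes with the other free cells.
        old = self.logical_to_physical.pop(logical_sector, None)
        if old is not None:
            self.data.pop(old, None)
            self.free_cells.add(old)
        # Steer the write to the least-worn free cell (ties go to the lowest id).
        target = min(self.free_cells, key=lambda c: (self.write_counts[c], c))
        self.free_cells.remove(target)
        self.write_counts[target] += 1
        self.data[target] = payload
        self.logical_to_physical[logical_sector] = target

    def read(self, logical_sector):
        return self.data[self.logical_to_physical[logical_sector]]


flash = WearLevelingMap(physical_cells=4)
for version in range(8):
    flash.write(0, version)        # rewrite the same logical sector 8 times
print(flash.write_counts)          # the wear is spread over all four cells
```

The caller keeps writing to logical sector 0, yet the writes land on a different physical cell each time, which is the whole point of the extra layer.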
In a SAN
We put a switch between the RAID groups and the computer. The switch puts a layer of indirection between the blocks and the computer.
You knew all that. So what does it have to do with capitalism? My simplistic definition of capitalism is that the system will remove all inefficiencies in a chain, and whoever removes them stands to benefit economically. Or, said another way: money finds its way into the right pockets!
So look at the stack today:
chips > motherboard, network, storage, bios > hypervisor > OS > Security, Backup etc > Business, Productivity Apps
Every layer presents an interface to the layer above. Each layer is also owned by different companies in the ecosystem. Each of those companies is under pressure to maximize its revenue. Tasked with this difficult challenge, you look at the layer above, see what is selling, and ask whether you can add it to your layer. This happens naturally over time: Intel added virtualization support, Phoenix BIOS is adding the hypervisor, operating systems are trying to add backup and security … The cycle goes on …
Virtualization will always be “innovated” in a higher layer of the stack and commoditized by the lower layers.
The higher layer in the stack finds a lot of new functionality and benefit by making the interface to a lower layer “logical”. They take this to market, until at some point the lower layer realizes that this is their API and that they should move virtualization into their own layer. The pressure to do this is extreme, and the time frame to monetize it is really small:
- Imagine the tussle between VMW and the storage vendors. VMW introduces logical disks with cloning, but storage vendors want to offer logical LUNs, volumes and disk files, as this moves the cloning functionality from the hypervisor to the storage.
- Imagine: Western Digital or Seagate could create multiple disks (vhd/vmdk files) on a single physical disk and then offer the capability to grow, shrink and move data between them. They could even add networking to the disk controller, so that different disks can connect to each other. They can do that once processing power and memory reach a price point at which they can be embedded directly into the component or lower layer. Which is effectively what happened to computing.
- VMW introduces a logical network switch; Cisco jumps in with the Nexus 1000V.
For a consumer this is a good thing, but money and value are shifting down the stack across different companies, which have to co-exist in the ecosystem (Cisco, Intel, EMC, VMW), yet guard their innovation from becoming commoditized.
EMC FAST (Fully Automated Storage Tiering) for storage savings
Chuck Hollis, VP Global Marketing, CTO, EMC, describes FAST across three blog posts. Several customers used the technology in beta during 2009.
The premise
When you analyze the vast majority of application I/O profiles, you’ll realize that a small amount of data is responsible for the majority of I/Os; the rest, which is almost all of the data, is infrequently accessed.
The principle
Watch how the data is being accessed, and dynamically place the most frequently accessed data, usually a small amount, on flash drives, while keeping the vast majority of infrequently accessed data on big, slow SATA drives.
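As a rough illustration of the principle, and emphatically not EMC's actual algorithm, placement by access frequency can be sketched as a ranking problem. The function name `place_by_access`, the block ids and the `flash_slots` parameter are all hypothetical.

```python
from collections import Counter


def place_by_access(access_log, flash_slots):
    """Split block ids into (flash, sata) sets by how often each was accessed.

    access_log  -- iterable of block ids, one entry per I/O (hypothetical input)
    flash_slots -- how many blocks fit on the fast tier (hypothetical parameter)
    """
    counts = Counter(access_log)
    # Hottest blocks first; ties are broken by block id to keep the result stable.
    ranked = sorted(counts, key=lambda block: (-counts[block], block))
    flash = set(ranked[:flash_slots])
    sata = set(ranked[flash_slots:])
    return flash, sata


# Two blocks take most of the I/O; three others are touched once each.
log = ["a"] * 50 + ["b"] * 30 + ["c", "d", "e"]
flash, sata = place_by_access(log, flash_slots=2)
print(sorted(flash), sorted(sata))
```

A real tiering engine would re-evaluate this continuously and weigh the cost of moving data; the sketch only shows the "watch, then place" step.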
The storage savings solution
| Technique | What it does |
| --- | --- |
| FAST | Place the right information on the right media based on frequency of access. |
| Thin | Thin (virtual) provisioning allocates physical storage when it is actually used, rather than when it is provisioned. |
| Small | Compression, single-instancing and data deduplication technologies eliminate information redundancies. |
| Green | A significant amount of enterprise information is used *very* infrequently. So infrequently, in fact, that the disk drives can be spun down, or at the least be made semi-idle. |
| Gone | Policy-based lifecycle management: archiving and deletion, and federation to the cloud through private and public cloud integration. The information can get shipped to a specialized service provider as an option. |
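The "Thin" row is the easiest of these to picture in code. Below is a minimal sketch, assuming a shared pool of free physical blocks; `ThinVolume` and its fields are invented for illustration and do not correspond to any vendor's implementation.

```python
class ThinVolume:
    """Toy thin-provisioned volume: physical blocks are taken on first write."""

    def __init__(self, provisioned_blocks, pool):
        self.provisioned_blocks = provisioned_blocks   # size promised to the host
        self.pool = pool                               # shared list of free physical block ids
        self.block_map = {}                            # logical -> physical, filled lazily
        self.store = {}

    def write(self, logical_block, payload):
        if not 0 <= logical_block < self.provisioned_blocks:
            raise IndexError("write beyond the provisioned size")
        if logical_block not in self.block_map:
            if not self.pool:
                raise RuntimeError("physical pool exhausted")
            # Allocation happens here, at first use, not at provisioning time.
            self.block_map[logical_block] = self.pool.pop()
        self.store[self.block_map[logical_block]] = payload

    def read(self, logical_block):
        physical = self.block_map.get(logical_block)
        # Blocks that were never written read back as zeros.
        return self.store[physical] if physical is not None else b"\x00"

    @property
    def allocated_blocks(self):
        return len(self.block_map)


pool = list(range(10))                 # only 10 physical blocks exist
volume = ThinVolume(1000, pool)        # but the host sees 1000
volume.write(5, b"x")
volume.write(900, b"z")
print(volume.allocated_blocks, len(pool))
```

The host is promised 1000 blocks while only the two it actually wrote consume physical space, which is where the savings come from (and also why the pool has to be monitored).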
… and life goes on!
One thing hasn’t changed, though. The information beast continues to grow.
VirtualBox branching / branched snapshots
An illustrated, step-by-step procedure for branching snapshots using VirtualBox 3.1.0 is available here.
Roman Kennke’s post on branching snapshots describes the feature and provides a how-to guide:
A typical use case would be to install an OS into a virtual disk, make that virtual disk read-only and use it as a base image for several branches.
- For example, in one branch I would do testing/debugging of stuff that I develop. There might be several branches I use for testing.
- Then I might need a branch in which I install a build environment for OpenJDK, which could in turn be used for several more sub-branches for OpenJDK6 builds and OpenJDK7 builds.
- In another branch off the base image I would run tax software. Etc
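The branching described in the list above boils down to copy-on-write layers over a read-only base. The sketch below models that idea only; it is not VirtualBox's on-disk format or API, and the `DiskLayer` class is hypothetical.

```python
class DiskLayer:
    """Toy differencing disk: stores only the blocks written in this branch."""

    def __init__(self, parent=None):
        self.parent = parent
        self.blocks = {}
        self.frozen = False

    def branch(self):
        # Once a layer has children it becomes read-only, like the base image.
        self.frozen = True
        return DiskLayer(parent=self)

    def write(self, block_number, data):
        if self.frozen:
            raise PermissionError("layer is read-only because it has branches")
        self.blocks[block_number] = data

    def read(self, block_number):
        # Walk up the chain until some layer has the block.
        layer = self
        while layer is not None:
            if block_number in layer.blocks:
                return layer.blocks[block_number]
            layer = layer.parent
        return None


base = DiskLayer()
base.write(0, "os")                  # install the OS once
testing = base.branch()              # branch for testing/debugging
build = base.branch()                # branch with a build environment
build.write(1, "toolchain")
jdk6 = build.branch()                # sub-branches of the build environment
jdk7 = build.branch()
jdk6.write(2, "openjdk6")
print(jdk6.read(0), jdk6.read(1), jdk6.read(2), jdk7.read(2))
```

Each branch sees the base OS plus its own changes, and nothing written in one branch leaks into a sibling.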
Top 7 requirements for infrastructure cloud providers in 2010
This is a summary of the post on the VMOps blog.
1) Inexpensive storage
The storage industry is built on the back of NAS and SAN, but for cloud providers, the overwhelming preference is for inexpensive local disk, or DAS solutions. … every cloud provider I talk with expects storage to be independent of the host physical server, redundant, and provide support for HA.
2) Open source hypervisor
Service providers know that if they plan to compete on price with Amazon, Rackspace and other cloud providers, VMware is not a good option. Perhaps because it is being used by Amazon, Xen seems to be the most popular hypervisor for infrastructure clouds among the service providers.
3) Integration with Billing and Provisioning Apps
… most hosting companies and MSPs have billing and user management approaches that they have built up over the years. Every one of the companies I’ve spoken with expects their cloud solution to plug into these existing systems.
4) Image-based pricing to support both Windows and Linux
Most service providers I talk to expect Linux to make up the majority of the images they run in the cloud, but they still need to make sure the cloud will support Windows, and all of the associated technology necessary to manage licenses.
5) Simplicity of administration by end-users
Plenty of end-users will leverage a cloud’s API to automatically provision and manage virtual machines, but that doesn’t change the need for a simple UI. Most hosting companies have a huge number of end-users who are used to working with control panels, and an infrastructure cloud needs to make life easy for these end-users.
6) Reliability
Over the next few years, many of the large providers of dedicated servers will be offering their customers the option to transition to virtual machines running on a computing cloud. For this to be successful, VMs need to offer better reliability than dedicated machines at a lower cost.
7) Turn-key solution
… service providers today can implement a completely integrated cloud stack on commodity hardware, and receive ongoing maintenance and upgrades over the years. Equally important, service providers can license software on a consumption basis, so upfront investment is negligible.
Incidentally, Mr. VMOps Product Manager, you may wish to provide just 3 more requirements to make this a Top 10 requirements list.
Cloud storage predictions for 2010
A detailed post by Sajai Krishnan, CEO of ParaScale, is on David Marshall’s VMBlog. The key ideas are summarized below:
The advent of cloud computing has given rise to several cloud storage vendors.
1) The cloud starts to get described
Vendors will begin to describe concrete features and benefits of their product offerings.
2) Commodity hardware starts to displace proprietary storage
While all storage vendors claim to use commodity hardware, in reality they are all essentially closed solutions qualified on two or three commodity boxes. Customers are locked into stovepipes with little ability to truly benefit from Moore’s law by selecting from the thousands of commodity servers available at any given point and at multiple points of purchase.
3) Server Virtualization will drive Private Cloud Storage adoption in the Enterprises
With server virtualization, organizations are free to take advantage of low-cost commodity hardware and aren’t tied to proprietary linkage of the OS and the hardware platform. The weak link today is the storage infrastructure behind virtualized servers.
4) A storage middle tier will emerge
The strategic importance of a low-cost, self-managing, petabyte-scale tier that provides a platform for analysis and integrated applications emerges in organizations with large stores of file data. Organizations that are investing heavily in new tier-1 storage and moving aged data to archive will experiment with a middle tier that leverages low-cost commodity hardware and provides read/write access. This middle tier will give administrators the opportunity to automate storage management and optimize for performance and cost, but at a much lower expense. It will also support large-scale analysis while eliminating related data migration and administrative tasks. The emerging middle tier will also provide an integration layer with service provider cloud offerings. The similar architectures enable "cloud bursting," the seamless ability for service providers to offer spillover capacity and compute to enterprises.
5) Opex, not Capex, will emerge as the most important criterion driving storage purchases
Maintenance costs on existing gear will be under heavy review with the emergence of commodity-based hardware storage options.
A sysadmin’s DAS to NetApp NAS migration experience
This is from Andy Leonard’s post.
I work for a relatively small, but growing, research non-profit. When last I measured it, our data use was growing at a compound rate of about 8% each month; in other words, we double our storage use every nine months or so. (As we’re in the midst of a P2V project where direct-attached storage is moving to our NetApps, we’re actually growing faster than that now, but that’s a temporary bump.) We already have multi-terabyte volumes – so, you do the math… the 16TB aggregate limit (of the 2020) is a real problem for sites like us.
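The "8% a month means doubling every nine months or so" claim is easy to check. The helper names below are my own, and the only inputs are the growth rate from the quote and a hypothetical starting size of 1 TB.

```python
import math


def doubling_time_months(monthly_growth_rate):
    """Months needed to double at a constant compound monthly growth rate."""
    return math.log(2) / math.log(1 + monthly_growth_rate)


def projected_size(initial_tb, monthly_growth_rate, months):
    """Size after compounding for the given number of months."""
    return initial_tb * (1 + monthly_growth_rate) ** months


print(round(doubling_time_months(0.08), 2))    # about 9.01 months
print(round(projected_size(1.0, 0.08, 9), 3))  # 1 TB grows to about 1.999 TB in 9 months
```

So the rule of thumb in the quote holds: at 8% compound monthly growth, storage use doubles in almost exactly nine months.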
Storage Math
It’s also worth noting that a 16TB aggregate is not a 16TB file system available to a server. 750GB SATA drives are right-sized to 621GB. Then, for RAID-DP, subtract two disks out of each RAID group. Next, there’s the 10% WAFL overhead. And don’t forget to translate from marketing GB to real GB (or GB to GiB, if you will). So that maximum-size 26-disk aggregate made up of 750GB drives winds up as 11.4TB. And, of course, don’t forget your snap reserves after that.
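The steps in that paragraph can be strung together as a small calculation. The post does not say how the 26 disks are split into RAID groups or exactly which unit convention sits behind the 11.4TB figure, so the sketch below assumes two RAID groups and treats the right-sized 621GB as decimal gigabytes; both are my assumptions, and the function name is made up.

```python
def usable_capacity_gib(disks, rightsized_gb, raid_groups, wafl_overhead=0.10):
    """Rough usable capacity in GiB, before snap reserves.

    raid_groups is an assumption (the source does not state the layout);
    RAID-DP costs two parity disks per RAID group.
    """
    data_disks = disks - 2 * raid_groups
    decimal_gb = data_disks * rightsized_gb * (1 - wafl_overhead)
    # Convert decimal gigabytes (10**9 bytes) to binary gibibytes (2**30 bytes).
    return decimal_gb * 10**9 / 2**30


gib = usable_capacity_gib(disks=26, rightsized_gb=621, raid_groups=2)
print(round(gib))            # roughly 11,450 GiB
print(round(gib / 1024, 2))  # about 11.18 TiB
```

Under those assumptions the result lands in the same neighbourhood as the figure quoted above; the exact number depends on the RAID group layout and on whether "TB" is read as 1000 or 1024 GiB.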
Backups
As you mention, backups could be a challenge for large volumes; here’s how we solve it: The 2020 in question was purchased as a SnapVault secondary. Backups go from our primary 3040s to it, and then go via NDMP to tape for off-site/DR purposes. The secondary tier gives us the extended backup window we need to get the data to tape and meet our DR requirements. (I actually think this is a pretty common setup in this day and age.)
Archiving
Of course, I’m not naive enough to think we can grow by adding drive shelves indefinitely (just added another one last Friday…). My personal opinion is that we’ll ultimately move to an HSM system, especially since much of the storage is used for instrument data (mass spec, microscopy, etc.) that is often difficult for researchers to categorize immediately as to its value. The thought is to let the HSM algorithms find the appropriate tier for the data automatically.