Programmer Thoughts

By John Dickinson

A Vision of the Future of Storage

May 22, 2011

Businesses are beginning to embrace “The Cloud” as a cost-effective way to offload internal compute and storage needs. However, many businesses still keep their data in-house. Much of this data is useful only when it can be computed on, and the cost of transferring it to remote compute centers can be prohibitive. As a result, businesses that store data in-house must also host compute resources in-house.

Therefore, in an effort to save money, businesses have searched for ways to “burst into the cloud,” a marketing phrase that simply means temporarily using someone else’s public compute infrastructure to do some work and paying only for what is used. Unfortunately, storage is harder: the data lives in-house, and “bursting” doesn’t make sense for stored data.

There is an answer: federated storage.

Federated storage turns a single storage cluster into a network of storage clusters. You upload data to a storage endpoint, and metadata attached to that data provides rules for where it should be stored. For example, one rule could require that the data be stored in geographically distinct regions. Another could require that it be stored with at least two companies. Together, these rules give you both geographic and business redundancy: if an earthquake knocks out a data center or a company goes out of business, the data is still protected.
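To make metadata-driven placement rules a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the metadata keys (`replicas`, `min-regions`, `min-providers`), the `Cluster` and `PlacementRules` types, and the cluster and region names are illustrative assumptions, not part of any real storage API. It checks candidate sets of clusters against the rules and returns the first set that provides the requested geographic and business redundancy.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Cluster:
    """One storage cluster in the federation (hypothetical model)."""
    name: str
    provider: str
    region: str


@dataclass(frozen=True)
class PlacementRules:
    """Placement rules read from an object's metadata (hypothetical)."""
    replicas: int
    min_regions: int
    min_providers: int


def rules_from_metadata(meta: Dict[str, str]) -> PlacementRules:
    """Parse hypothetical metadata keys into rules; the defaults are arbitrary."""
    return PlacementRules(
        replicas=int(meta.get("replicas", "3")),
        min_regions=int(meta.get("min-regions", "1")),
        min_providers=int(meta.get("min-providers", "1")),
    )


def satisfies(placement: Sequence[Cluster], rules: PlacementRules) -> bool:
    """Check geographic (regions) and business (providers) redundancy."""
    regions = {c.region for c in placement}
    providers = {c.provider for c in placement}
    return len(regions) >= rules.min_regions and len(providers) >= rules.min_providers


def choose_placement(clusters: Sequence[Cluster], rules: PlacementRules) -> Optional[List[Cluster]]:
    """Return the first set of clusters that meets the rules, or None if none does."""
    for combo in combinations(clusters, rules.replicas):
        if satisfies(combo, rules):
            return list(combo)
    return None


if __name__ == "__main__":
    clusters = [
        Cluster("rax-chicago", "Rackspace", "north-america"),
        Cluster("rax-dallas", "Rackspace", "north-america"),
        Cluster("dell-south-asia", "Dell", "south-asia"),
    ]
    rules = rules_from_metadata({"replicas": "2", "min-regions": "2", "min-providers": "2"})
    placement = choose_placement(clusters, rules)
    print([c.name for c in placement] if placement else None)
    # prints ['rax-chicago', 'dell-south-asia']
```

The brute-force search over combinations is only reasonable for a handful of clusters; a real federation with many peers would need a smarter placement strategy.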

Storage providers can enter into peering relationships that allow them to offer a storage network to their customers. For example, imagine that Dell, Rackspace, Internap, and AT&T all have storage clusters, each with its own customers, and each with different capacity and locations available. If these companies entered into a federated storage agreement, they could share customers, capacity, and geographic redundancy.

This federated storage network is similar to how airlines organize today. American Airlines and British Airways serve different parts of the world, but both belong to the oneworld alliance, which allows passengers to earn rewards with one airline even when flying with its partners.

In the same way, smaller companies like Internap and Rackspace can partner with larger companies like AT&T and Dell. A Rackspace customer could then choose to store data both in one of Rackspace’s data centers and in geographic regions that Rackspace does not serve. Perhaps Dell builds a data center in South Asia. Rackspace customers would then be able to store data redundantly both in Rackspace’s Chicago data center and in Dell’s data center on the other side of the world.

Other storage federations could also arise. Users would then be able to choose where to store their data based on the combined offering of all the providers in a partnership rather than on one company’s merits. Companies benefit by gaining access to infrastructure without large capital expenditures. Customers benefit by having more storage options.

As another benefit to customers and businesses alike, a storage federation implies that data is portable. Although data tends to be sticky, customers would be able to move their data from one company or federation to another using the same API that the companies use to manage the federation itself. At the provider level, data could be temporarily moved to another system during planned downtime or unplanned outages.

However, if data is portable between companies, then the companies must compete on something other than simply having the infrastructure. A smaller company could join a federation and suddenly give its customers access to a previously unavailable network of storage. Companies would need to differentiate on service, user interface, price, or myriad other factors besides the availability of the storage itself. Of course, each federation would be governed by contracts established by the cooperating companies, so the specifics of any one federation would be unique.

Federated storage networks become very interesting when the associated compute resources are taken into account. Storage rules could consider the compute resources available in particular locations or with particular providers. Using the companies from the example above, AT&T may offer a large pool of available compute, while Dell may offer particular types of compute, such as GPU vector processing. A Rackspace customer, then, could store the backups of a Rackspace-hosted web application in Rackspace’s storage infrastructure, while storing the application’s generated data with a rule requiring that a copy be kept in AT&T’s storage network, close to AT&T’s large pool of compute resources. Or perhaps a Dell customer could require that data be stored only within Internap’s storage infrastructure because Internap meets some particular government regulation.
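The compute-aware and regulatory rules described here can be sketched the same way. This is again a hypothetical illustration: the `Site` and `Rule` types, the `compute_units` measure of nearby spare compute, the `example-regulation` certification label, and the site names are all assumptions made for the example, not features of any real system. The sketch keeps a copy at the customer’s home site when the rule allows it, adds a copy at the eligible site with the most spare compute when the rule asks for one, and otherwise falls back to the first site that satisfies the rule.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Site:
    """A storage site in the federation (hypothetical model)."""
    name: str
    provider: str
    compute_units: int  # hypothetical measure of spare compute near this site
    certifications: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Rule:
    """Placement constraints attached to an object; None means no constraint."""
    allowed_providers: Optional[FrozenSet[str]] = None
    required_certification: Optional[str] = None
    min_compute_units: int = 0  # if > 0, one copy must sit near at least this much compute


def eligible(site: Site, rule: Rule) -> bool:
    """Return True if the site meets the provider and certification constraints."""
    if rule.allowed_providers is not None and site.provider not in rule.allowed_providers:
        return False
    if rule.required_certification is not None and rule.required_certification not in site.certifications:
        return False
    return True


def place(sites: List[Site], home: Site, rule: Rule) -> Optional[List[str]]:
    """Return the names of sites that should hold a copy, or None if the rule can't be met."""
    candidates = [s for s in sites if eligible(s, rule)]
    if not candidates:
        return None
    chosen: List[Site] = []
    if home in candidates:
        chosen.append(home)  # keep a copy with the customer's own provider when allowed
    if rule.min_compute_units > 0:
        best = max(candidates, key=lambda s: s.compute_units)
        if best.compute_units < rule.min_compute_units:
            return None
        if best not in chosen:
            chosen.append(best)  # extra copy next to the largest eligible pool of compute
    if not chosen:
        chosen.append(candidates[0])  # home site not allowed: use the first eligible site
    return [s.name for s in chosen]


if __name__ == "__main__":
    rax = Site("rackspace-chicago", "Rackspace", 100)
    att = Site("att-east", "AT&T", 5000)
    dell = Site("dell-south-asia", "Dell", 800)
    inap = Site("internap-atlanta", "Internap", 200, frozenset({"example-regulation"}))
    sites = [rax, att, dell, inap]

    # Backups stay in Rackspace's storage.
    print(place(sites, rax, Rule(allowed_providers=frozenset({"Rackspace"}))))
    # prints ['rackspace-chicago']
    # Generated data also gets a copy near AT&T's compute.
    print(place(sites, rax, Rule(min_compute_units=1000)))
    # prints ['rackspace-chicago', 'att-east']
    # A Dell customer restricted to Internap by a (hypothetical) regulation.
    print(place(sites, dell, Rule(allowed_providers=frozenset({"Internap"}),
                                  required_certification="example-regulation")))
    # prints ['internap-atlanta']
```

Keeping the rule as plain data, separate from the placement logic, reflects the idea that customers express intent through metadata while each federation decides how to satisfy it.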

Federated storage allows data to break free of a single company. Customers gain more control over their data by choosing where it is stored. Businesses gain the ability to store and process their data in the most cost-effective way possible. Users are no longer limited to their own infrastructure. Federated storage makes truly unlimited data storage possible.

I’m a developer on OpenStack’s object storage system, and I believe that OpenStack is uniquely positioned to achieve this vision.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

The thoughts expressed here are my own and do not necessarily represent those of my employer.