Programmer Thoughts

By John Dickinson

Storage Systems Overview

February 19, 2012

Storage requirements are getting huge. Data is incredibly sticky: it doesn’t move or ever get smaller. These two realities make your choice of a storage system vitally important. However, there is no storage system that provides a single solution for all use cases. You must pair the right use case with the right system.

When the word “storage” is mentioned, some people think of different types of hardware like hard drives (both spinning media and solid state) and tape drives. Some think of databases. Some think of deployment patterns like NAS, DAS, or SAN. Others may think of specific vendors. NetApp, Isilon, and EMC all offer storage products.

Types of Storage

However, there are only three types of storage: block, file, and object. Each type offers its own advantages and has its own use cases.

Block Storage

Block storage gives you access to the “bare metal”. There is no concept of “files” at this level. There are just evenly sized blocks of data. Generally, using block storage offers the best performance, but it is quite low-level. Database servers can often take advantage of block storage systems. An example of a common block storage system is a SAN.
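To make “evenly sized blocks” concrete, here is a minimal sketch in Python. The `ToyBlockDevice` class and the 512-byte block size are assumptions made up for illustration; a real block device is exposed by the operating system or a SAN, not by a Python object.

```python
BLOCK_SIZE = 512  # bytes per block; an assumed size for illustration


class ToyBlockDevice:
    """A hypothetical in-memory stand-in for a block device."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self._data = bytearray(num_blocks * BLOCK_SIZE)

    def _check(self, index):
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")

    def read_block(self, index):
        # Blocks are addressed by number only; there are no file names.
        self._check(index)
        start = index * BLOCK_SIZE
        return bytes(self._data[start:start + BLOCK_SIZE])

    def write_block(self, index, payload):
        # Every write covers exactly one full block.
        self._check(index)
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        start = index * BLOCK_SIZE
        self._data[start:start + BLOCK_SIZE] = payload


device = ToyBlockDevice(num_blocks=8)
device.write_block(3, b"x" * BLOCK_SIZE)
```

Anything resembling a file or a directory would have to be built on top of these numbered blocks, which is exactly what a file system does.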

File Storage

File storage provides access to a file system. This is the most familiar kind of storage: it’s what we interact with most on a daily basis. Users of file storage have access to files and can read or write either the whole file or a part of it. File systems are what operating systems provide on all of our personal computers. In a shared environment, file storage is often seen as a network drive.
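Reading or writing “a part of” a file looks roughly like this in Python. The helper names `overwrite_range` and `read_range` are my own, and the example uses a temporary file so it can run anywhere.

```python
import os
import tempfile


def overwrite_range(path, offset, data):
    """Overwrite bytes in place, starting at the given offset."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def read_range(path, offset, length):
    """Read only the requested byte range from the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Create a small file, then change only its last five bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello world")

overwrite_range(path, 6, b"there")
result = read_range(path, 0, 11)
os.remove(path)
```

Only the bytes at the chosen offset are rewritten; the rest of the file is left untouched.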

Object Storage

Object storage is probably the least familiar type of storage to most people. Object storage doesn’t provide access to raw blocks of data. It doesn’t offer file-based access. Object storage provides access to whole objects, or blobs of data, and generally does so with an API specific to that system. Unlike file storage, object storage generally does not let you write to one part of an object. Objects must be updated as a whole unit. Three of the most common commercial object storage systems are Amazon’s S3, EMC’s Atmos, and Rackspace’s Cloud Files. Object storage excels at storing content that can grow without bound. Perfect use cases include backups, archiving, and static web content like images and scripts. One of the main advantages of object storage systems is their ability to reliably store a large amount of data at relatively low cost.
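The “whole unit” rule is easiest to see in a toy example. The `ToyObjectStore` class below is a hypothetical in-memory sketch, not the API of S3, Atmos, or Cloud Files; real systems expose similar put and get operations over HTTP.

```python
class ToyObjectStore:
    """A hypothetical in-memory object store with whole-object operations."""

    def __init__(self):
        self._objects = {}

    def put(self, container, name, data):
        # A put always replaces the entire object.
        self._objects[(container, name)] = bytes(data)

    def get(self, container, name):
        # A get always returns the entire object.
        return self._objects[(container, name)]


store = ToyObjectStore()
store.put("backups", "notes.txt", b"hello world")

# There is no "write at offset" call. To change part of an object,
# fetch it, modify the copy, and upload the whole thing again.
current = store.get("backups", "notes.txt")
updated = current[:6] + b"there"
store.put("backups", "notes.txt", updated)
```

That read-modify-write cycle is cheap for small objects and expensive for large ones, which is one reason object storage suits content that is written once and read many times.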

Each type of storage has advantages and disadvantages. Trade-offs come when you try to grow your storage or layer the storage abstractions. What happens when you try to grow your storage system beyond a few dozen terabytes? What about beyond a petabyte or beyond fifty petabytes? Systems at this scale must make trade-offs in some areas.

CAP Theorem

In computing, the CAP theorem states that a distributed system can provide at most two of three guarantees: consistency, availability, and tolerance of network failures (partitions). For example, a system can be consistent (i.e., all reads get the most current data) and handle network failures, but it must sacrifice availability to do so. Or a system can choose to handle network failures and have perfect availability, but it must sacrifice consistency to do so. Distributed systems must always handle network failures, so in practice they must choose to sacrifice either availability or consistency.
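The trade-off can be sketched with a toy two-replica cluster. Everything here (the `ToyCluster` class, the `"CP"` and `"AP"` mode names, the `partitioned` flag) is an assumption invented for illustration and is far simpler than any real distributed system.

```python
class Replica:
    def __init__(self):
        self.value = None


class ToyCluster:
    """Two replicas, a and b. Writes arrive at a; reads below go to b."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False  # True means a cannot reach b

    def write(self, value):
        if not self.partitioned:
            # Healthy network: both replicas are updated together.
            self.a.value = value
            self.b.value = value
            return True
        if self.mode == "CP":
            # Stay consistent by refusing the write (unavailable).
            return False
        # AP: stay available by accepting the write on a alone.
        self.a.value = value
        return True

    def read_from_b(self):
        return self.b.value


cp = ToyCluster("CP")
cp.write("v1")
cp.partitioned = True
cp_accepted = cp.write("v2")  # refused during the partition

ap = ToyCluster("AP")
ap.write("v1")
ap.partitioned = True
ap_accepted = ap.write("v2")  # accepted, but b is now stale
```

During the partition, the CP cluster turns the write away and both replicas still agree, while the AP cluster accepts it and a reader talking to replica b sees the old value.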

Storage systems become distributed as they grow. OpenStack Swift (the basis for Rackspace’s Cloud Files product) chooses to sacrifice consistency for availability and network failure tolerance. This choice allows the system to scale to enormous levels and provide massive uptime, but it also means that in certain scenarios, some data may not be updated throughout the entire system. For example, a container listing may not be up-to-date immediately after writing an object. Swift will queue the container listing update and allow the object write to succeed. This sort of consistency model is called “eventual consistency”.
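The container listing example can be modeled in a few lines. This is a sketch of the idea only, not Swift’s actual implementation; the `ToyEventualStore` class and its `run_updater` method are hypothetical names.

```python
from collections import deque


class ToyEventualStore:
    """Object writes succeed at once; listing updates are queued."""

    def __init__(self):
        self.objects = {}
        self.listing = set()
        self.pending = deque()  # listing updates not yet applied

    def put(self, name, data):
        # The object write succeeds without waiting for the listing.
        self.objects[name] = data
        self.pending.append(name)

    def list_container(self):
        return sorted(self.listing)

    def run_updater(self):
        # A background process would normally drain this queue.
        while self.pending:
            self.listing.add(self.pending.popleft())


store = ToyEventualStore()
store.put("photo.jpg", b"...")
listing_before = store.list_container()  # object not listed yet
store.run_updater()
listing_after = store.list_container()   # listing has caught up
```

The object is readable as soon as the write returns, but the listing only reflects it once the queued update has been applied.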

Object Storage Use Cases

Since object storage is the least familiar to people, I’d like to review some use cases for this type of system. The most common use cases are backups, archiving, and web content. These use cases are fairly straightforward and easy to understand.

One of the most exciting use cases for object storage is as the back end for a storage appliance. Since the storage appliance presents the object storage system as local or network storage, the distinction between on-site and off-premise storage goes away. Those using a storage appliance backed by an object storage system can start to think about “data to be stored” and not have to worry about where the data is stored or managing separate backups.

Document storage is another interesting use case for an object storage system. At first glance, this seems to be the same use case as archiving, but document management systems will go beyond simple archiving and also add in policies for each document describing permissions and retention.

Similarly, an object storage system can be used for disaster recovery. Since a system like Swift can store large amounts of data cheaply and reliably, it makes sense for a business to store a copy of its data in a remote Swift cluster. Generally, an object storage system works well for this use case since the DR data will be large and needs to be updated and restored using high concurrency.

Object storage is also good for storing large data sets. Scientific research gathers enormous amounts of data as humanity looks at the biggest and smallest things in our universe. This data is oftentimes impossible to replace and becomes the basis for further research for decades to come. Object storage systems can reliably store this information in a cost-effective manner.

Object storage systems can also be used as the basis for other storage systems. For example, an object storage system could be used as the basis for a block storage system. Each “block” in the block storage system would be represented as an object in the object storage system. Layering abstractions like this will allow your block system to grow to a very large scale, but it will have certain performance penalties. A block system built on top of an object system will have much higher latency than a traditional block storage system.
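A rough sketch of that layering, with each block stored as one object, might look like the following. The `ObjectBackedBlockDevice` class, the key naming scheme, and the 4096-byte block size are all assumptions for illustration, and a plain dict stands in for the object store.

```python
BLOCK_SIZE = 4096  # bytes per block; an assumed size for illustration


class ObjectBackedBlockDevice:
    """A hypothetical block device whose blocks live in an object store."""

    def __init__(self, object_store, volume):
        self.store = object_store  # a dict standing in for an object store
        self.volume = volume

    def _key(self, index):
        # One object per block, named after the volume and block number.
        return f"{self.volume}/block-{index:08d}"

    def write_block(self, index, payload):
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        # In a real system this would be a network request per block.
        self.store[self._key(index)] = bytes(payload)

    def read_block(self, index):
        # Blocks that were never written read back as zeros.
        return self.store.get(self._key(index), bytes(BLOCK_SIZE))


backing = {}
volume = ObjectBackedBlockDevice(backing, "vol1")
volume.write_block(2, b"a" * BLOCK_SIZE)
```

Every block read or write becomes a full object request, so the latency of the object store is paid on each operation. That is where the performance penalty comes from.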

Choose the Right System

Block storage, file storage, and object storage each excel in different areas. Understanding the storage landscape and the advantages and costs of each type of storage allows savvy users to choose the right system for their use case.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

The thoughts expressed here are my own and do not necessarily represent those of my employer.