Programmer Thoughts2014-05-26T16:30:03+00:00/openstackJohn Dickinsonme@not.mnThoughts after the OpenStack Juno Summit2014-05-26T00:00:00+00:00/openstack/openstack-juno-summit-thoughts<p>A week out of the <a href="https://www.openstack.org/summit/openstack-summit-atlanta-2014/">OpenStack Juno summit</a>, it’s time to reflect, focus, and move forward in the community. I’ve been involved in OpenStack since it began—I was on the Cloud Files team at Rackspace that originally wrote Swift—and I’ve seen the community and ecosystem grow dramatically over the last four years. What began as a group of about 20 has now become thousands.</p>
<p>There is no end to articles by tech journalists and pundits declaring that OpenStack will be successful, if only <em>XYZ</em>, where <em>XYZ</em> is whatever the author really likes. Of course these articles lead to response articles, blogs, and Twitter snipes that basically say, “Yes, but <em>ABC</em>”. And nobody is all wrong, nobody is all right, and nobody really understands all the moving parts. All the voices put together leave me with two distinct impressions. First, OpenStack has hundreds of voices pulling it in all directions. Yet in spite of all the discordant voices (or perhaps because of them), there is a great sense of progress. Second, OpenStack is an idea that really captures the imagination, and what we’ve seen so far is just the beginning.</p>
<p>As someone who has been to more OpenStack events than most, this OpenStack summit felt different. We’ve marked a turning point in the community. Ryan Floyd (disclaimer: Ryan is an investor in <a href="http://swiftstack.com">SwiftStack</a>, my employer) recently wrote about this turning point as the <a href="http://ryanfloyd.org/the-end-of-the-beginning-state-of-openstack/">end of the beginning</a>. Others have written that it is the <a href="http://cloudborat.wordpress.com/2014/05/07/beginning-of-the-end-for-openstack/">beginning of the end</a>. See? Discordant voices.</p>
<p>I tend to agree with Ryan’s view. When OpenStack started, it consisted of two distinct groups: developers and biz devs. This makeup allowed for technical progress and corporate partnerships between unlikely allies, both of which we’ve seen many times in the last four years. One group missing from the original makeup of OpenStack is product managers. As such, we’ve never seen a cohesive “single product” focus within the community.</p>
<p>There has been a vacuum in the OpenStack community of actual products around OpenStack. This OpenStack summit clearly demonstrated to me that the product gap is being filled by the community, and that gives me a lot of hope for the future. This is why I agree with Ryan that OpenStack is moving from the “beginning” and into something more substantial.</p>
<p>So I’m hopeful. OpenStack has captured the imagination of thousands. Four years on, it looks like our hard work may someday soon be an overnight success. As a community we’ve got a lot to be proud of.</p>
<p>But we need to be cautious too. Our “success” thus far doesn’t mean we’ll continue to be successful. We need to be very cognizant of the fact that we are still a very small community. We can’t be enamored by our own success in our own small bubble. I believe OpenStack is on the leading edge of the industry, but that implies there are a lot of people who still don’t know what OpenStack is or how it can benefit them.</p>
<p>We’ve got some serious challenges to overcome in the months and years ahead. From a technology perspective, we need to figure out how to deliver quality software that provides value beyond “glue layer to proprietary system”.</p>
<p>We also need to look to the health of the contributor community and figure out how to lower barriers to participating. We need to figure out how to accept patches from one-time contributors, and we need to figure out how to incorporate feedback from non-developers.</p>
<p>On the community side, we’ve got a massive amount of education to do. We’ve got a ton of work to do in making OpenStack projects easy to install and maintain. We’ve got to figure out how to keep OpenStack a “thing” that means something when end-users come across it.</p>
<p>We’re making improvements on all of these, but we’ve got more to do. We can’t get comfortable with where we are. We must remember that OpenStack is only useful when it’s actually deployed and running in production, therefore we must always keep deployers in mind.</p>
<p>I’m excited to see how the community has grown, and I’m excited by the potential of the future.</p>
OpenStack Swift on Raspberry Pi2013-03-24T00:00:00+00:00/openstack/swift-on-pi<p>Every attendee at PyCon 2013 got a free Raspberry Pi. So, naturally, the first
thing I did was set up OpenStack Swift on it.</p>
<p>With thanks to North Coast Brewing for Old Rasputin and
<a href="http://www.thumbtack.com">Thumbtack</a> for the beer mug, I give you a single
annotated script you can use to install OpenStack Swift, the large scale
distributed storage engine, onto a low-powered credit-card-sized computer.</p>
<p><img src="http://d.not.mn/swift_on_pi_start.jpg" alt="start" /></p>
<p>The current version of this script can be found in <a href="https://github.com/notmyname/swift_on_pi">my github account</a>.</p>
<div class="highlight"><pre><code class="bash"> <span class="c">#!/bin/bash</span>
<span class="c"># This annotated script sets up a limited deployment of OpenStack Swift</span>
<span class="c"># onto a Raspberry Pi. It sets up a one-replica, one-server environment</span>
<span class="c"># appropriate for external testing. It assumes there is a user called "pi"</span>
<span class="c"># and that user has sudo access (this is the default on a Raspberry Pi).</span>
<span class="nb">set</span> -e
<span class="c"># install requirements</span>
<span class="c"># I assume you've already done an `apt-get update && apt-get upgrade`</span>
sudo apt-get install python-software-properties curl gcc git memcached <span class="se">\</span>
python-coverage python-dev python-nose python-setuptools <span class="se">\</span>
python-simplejson python-xattr sqlite3 xfsprogs python-eventlet <span class="se">\</span>
python-greenlet python-pastedeploy python-netifaces python-pip <span class="se">\</span>
python-sphinx
sudo pip install mock tox dnspython
<span class="c"># build loopback drive</span>
sudo mkdir -p /srv
sudo truncate -s 1GB /srv/swift-disk
sudo mkfs.xfs -f -i <span class="nv">size</span><span class="o">=</span>512 /srv/swift-disk
<span class="c"># update /etc/fstab</span>
grep <span class="s1">'/srv/swift-disk'</span> /etc/fstab
<span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> <span class="o">=</span> 1 <span class="o">]</span>; <span class="k">then</span>
<span class="k"> </span>sudo tee -a /etc/fstab >/dev/null <span class="s"><<EOF</span>
<span class="s"> /srv/swift-disk /mnt/sdb1 xfs loop,noatime,nodiratime,nobarrier,inode64,logbufs=8 0 0</span>
<span class="s"> EOF</span>
<span class="k">fi</span>
<span class="k"> </span>sudo mkdir -p /mnt/sdb1/1
sudo chown -R pi:pi /mnt/sdb1/1
sudo ln -fs /mnt/sdb1/1 /srv/1
<span class="c"># create /etc/swift and /var/run/swift before chowning them (set -e is active)</span>
sudo mkdir -p /etc/swift /var/run/swift
sudo chown -R pi:pi /etc/swift /srv/1/ /var/run/swift
<span class="c"># update /etc/rc.local</span>
grep <span class="s1">'su - pi /home/pi/bin/startmain'</span> /etc/rc.local
<span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> <span class="o">=</span> 1 <span class="o">]</span>; <span class="k">then</span>
<span class="k"> </span>sudo tee -a /etc/rc.local >/dev/null <span class="s"><<EOF</span>
<span class="s"> mkdir -p /var/cache/swift</span>
<span class="s"> chown pi:pi /var/cache/swift*</span>
<span class="s"> mkdir -p /var/run/swift</span>
<span class="s"> chown pi:pi /var/run/swift</span>
<span class="s"> su - pi /home/pi/bin/startmain</span>
<span class="s"> EOF</span>
<span class="k">fi</span>
<span class="k"> </span>sudo tee /etc/rsyncd.conf >/dev/null <span class="s"><<EOF</span>
<span class="s"> uid = pi</span>
<span class="s"> gid = pi</span>
<span class="s"> log file = /var/log/rsyncd.log</span>
<span class="s"> pid file = /var/run/rsyncd.pid</span>
<span class="s"> address = 127.0.0.1</span>
<span class="s"> [account6012]</span>
<span class="s"> max connections = 25</span>
<span class="s"> path = /srv/1/node/</span>
<span class="s"> read only = false</span>
<span class="s"> lock file = /var/lock/account6012.lock</span>
<span class="s"> [container6011]</span>
<span class="s"> max connections = 25</span>
<span class="s"> path = /srv/1/node/</span>
<span class="s"> read only = false</span>
<span class="s"> lock file = /var/lock/container6011.lock</span>
<span class="s"> [object6010]</span>
<span class="s"> max connections = 25</span>
<span class="s"> path = /srv/1/node/</span>
<span class="s"> read only = false</span>
<span class="s"> lock file = /var/lock/object6010.lock</span>
<span class="s"> EOF</span>
sudo tee /etc/rsyslog.d/10-swift.conf >/dev/null <span class="s"><<EOF</span>
<span class="s"> # Uncomment the following to have a log containing all logs together</span>
<span class="s"> local1,local2,local3,local4,local5.* /var/log/swift/all.log</span>
<span class="s"> # Uncomment the following to have hourly proxy logs for stats processing</span>
<span class="s"> $template HourlyProxyLog,"/var/log/swift/hourly/%\$YEAR%%\$MONTH%%\$DAY%%\$HOUR%"</span>
<span class="s"> local1.*;local1.!notice ?HourlyProxyLog</span>
<span class="s"> local1.*;local1.!notice /var/log/swift/proxy.log</span>
<span class="s"> local1.notice /var/log/swift/proxy.error</span>
<span class="s"> local1.* ~</span>
<span class="s"> local2.*;local2.!notice /var/log/swift/storage1.log</span>
<span class="s"> local2.notice /var/log/swift/storage1.error</span>
<span class="s"> local2.* ~</span>
<span class="s"> EOF</span>
sudo mkdir -p /var/log/swift/hourly
sudo chmod -R g+w /var/log/swift
<span class="nb">set</span> +e
<span class="nb">cd</span> <span class="o">&&</span> git clone git://github.com/openstack/python-swiftclient.git
<span class="nb">set</span> -e
<span class="nb">cd</span> ~/python-swiftclient; git pull origin master <span class="o">&&</span> sudo python ./setup.py develop
<span class="nb">set</span> +e
<span class="nb">cd</span> <span class="o">&&</span> git clone git://github.com/openstack/swift.git
<span class="nb">set</span> -e
<span class="nb">cd</span> ~/swift; git pull origin master <span class="o">&&</span> sudo python ./setup.py develop
<span class="nb">cd</span> <span class="o">&&</span> mkdir -p ~/bin
sudo mkdir -p /etc/swift
sudo chown pi:pi /etc/swift
cat >/etc/swift/proxy-server.conf <span class="s"><<EOF</span>
<span class="s"> [DEFAULT]</span>
<span class="s"> bind_port = 8080</span>
<span class="s"> user = pi</span>
<span class="s"> log_facility = LOG_LOCAL1</span>
<span class="s"> log_level = DEBUG</span>
<span class="s"> eventlet_debug = true</span>
<span class="s"> [pipeline:main]</span>
<span class="s"> pipeline = catch_errors healthcheck proxy-logging cache slo ratelimit tempurl formpost tempauth staticweb container-quotas account-quotas proxy-logging proxy-server</span>
<span class="s"> [app:proxy-server]</span>
<span class="s"> use = egg:swift#proxy</span>
<span class="s"> allow_account_management = true</span>
<span class="s"> account_autocreate = true</span>
<span class="s"> [filter:tempauth]</span>
<span class="s"> use = egg:swift#tempauth</span>
<span class="s"> user_admin_admin = admin .admin .reseller_admin</span>
<span class="s"> user_test_tester = testing .admin</span>
<span class="s"> user_test2_tester2 = testing2 .admin</span>
<span class="s"> user_test4_tester4 = testing4 .admin</span>
<span class="s"> user_test_tester3 = testing3</span>
<span class="s"> user_demo_demo = demo .admin</span>
<span class="s"> [filter:catch_errors]</span>
<span class="s"> use = egg:swift#catch_errors</span>
<span class="s"> [filter:healthcheck]</span>
<span class="s"> use = egg:swift#healthcheck</span>
<span class="s"> [filter:cache]</span>
<span class="s"> use = egg:swift#memcache</span>
<span class="s"> [filter:proxy-logging]</span>
<span class="s"> use = egg:swift#proxy_logging</span>
<span class="s"> [filter:ratelimit]</span>
<span class="s"> use = egg:swift#ratelimit</span>
<span class="s"> [filter:domain_remap]</span>
<span class="s"> use = egg:swift#domain_remap</span>
<span class="s"> [filter:cname_lookup]</span>
<span class="s"> # Note: this middleware requires python-dnspython</span>
<span class="s"> use = egg:swift#cname_lookup</span>
<span class="s"> [filter:staticweb]</span>
<span class="s"> use = egg:swift#staticweb</span>
<span class="s"> [filter:formpost]</span>
<span class="s"> use = egg:swift#formpost</span>
<span class="s"> [filter:list-endpoints]</span>
<span class="s"> use = egg:swift#list_endpoints</span>
<span class="s"> [filter:bulk]</span>
<span class="s"> use = egg:swift#bulk</span>
<span class="s"> [filter:container-quotas]</span>
<span class="s"> use = egg:swift#container_quotas</span>
<span class="s"> [filter:account-quotas]</span>
<span class="s"> use = egg:swift#account_quotas</span>
<span class="s"> [filter:slo]</span>
<span class="s"> use = egg:swift#slo</span>
<span class="s"> [filter:tempurl]</span>
<span class="s"> use = egg:swift#tempurl</span>
<span class="s"> [filter:formpost]</span>
<span class="s"> use = egg:swift#formpost</span>
<span class="s"> EOF</span>
cat >/etc/swift/account-server.conf <span class="s"><<EOF</span>
<span class="s"> [DEFAULT]</span>
<span class="s"> devices = /srv/1/node/</span>
<span class="s"> bind_port = 6012</span>
<span class="s"> user = pi</span>
<span class="s"> log_facility = LOG_LOCAL2</span>
<span class="s"> recon_cache_path = /var/cache/swift</span>
<span class="s"> eventlet_debug = true</span>
<span class="s"> log_level = DEBUG</span>
<span class="s"> mount_check = false</span>
<span class="s"> disable_fallocate = true</span>
<span class="s"> [pipeline:main]</span>
<span class="s"> pipeline = recon account-server</span>
<span class="s"> [app:account-server]</span>
<span class="s"> use = egg:swift#account</span>
<span class="s"> [filter:recon]</span>
<span class="s"> use = egg:swift#recon</span>
<span class="s"> [account-replicator]</span>
<span class="s"> vm_test_mode = yes</span>
<span class="s"> [account-auditor]</span>
<span class="s"> [account-reaper]</span>
<span class="s"> EOF</span>
cat >/etc/swift/container-server.conf <span class="s"><<EOF</span>
<span class="s"> [DEFAULT]</span>
<span class="s"> devices = /srv/1/node/</span>
<span class="s"> bind_port = 6011</span>
<span class="s"> user = pi</span>
<span class="s"> log_facility = LOG_LOCAL2</span>
<span class="s"> recon_cache_path = /var/cache/swift</span>
<span class="s"> eventlet_debug = true</span>
<span class="s"> log_level = DEBUG</span>
<span class="s"> mount_check = false</span>
<span class="s"> disable_fallocate = true</span>
<span class="s"> [pipeline:main]</span>
<span class="s"> pipeline = recon container-server</span>
<span class="s"> [app:container-server]</span>
<span class="s"> use = egg:swift#container</span>
<span class="s"> [filter:recon]</span>
<span class="s"> use = egg:swift#recon</span>
<span class="s"> [container-replicator]</span>
<span class="s"> vm_test_mode = yes</span>
<span class="s"> [container-updater]</span>
<span class="s"> [container-auditor]</span>
<span class="s"> [container-sync]</span>
<span class="s"> EOF</span>
cat >/etc/swift/object-server.conf <span class="s"><<EOF</span>
<span class="s"> [DEFAULT]</span>
<span class="s"> devices = /srv/1/node/</span>
<span class="s"> bind_port = 6010</span>
<span class="s"> user = pi</span>
<span class="s"> log_facility = LOG_LOCAL2</span>
<span class="s"> recon_cache_path = /var/cache/swift</span>
<span class="s"> eventlet_debug = true</span>
<span class="s"> log_level = DEBUG</span>
<span class="s"> mount_check = false</span>
<span class="s"> disable_fallocate = true</span>
<span class="s"> [pipeline:main]</span>
<span class="s"> pipeline = recon object-server</span>
<span class="s"> [app:object-server]</span>
<span class="s"> use = egg:swift#object</span>
<span class="s"> [filter:recon]</span>
<span class="s"> use = egg:swift#recon</span>
<span class="s"> [object-replicator]</span>
<span class="s"> vm_test_mode = yes</span>
<span class="s"> [object-updater]</span>
<span class="s"> [object-auditor]</span>
<span class="s"> EOF</span>
<span class="c"># when setting up the hash_path_suffix, it is important to make it unique</span>
<span class="c"># and keep it a secret</span>
<span class="nv">SUFF</span><span class="o">=</span><span class="sb">`</span>python -c <span class="s1">'import uuid; print uuid.uuid4().hex'</span><span class="sb">`</span>
cat <span class="s"><<EOF >/etc/swift/swift.conf</span>
<span class="s"> [swift-hash]</span>
<span class="s"> swift_hash_path_suffix = $SUFF</span>
<span class="s"> [swift-constraints]</span>
<span class="s"> #max_file_size = 5368709122</span>
<span class="s"> # Note: Since the Raspberry Pi has such limited storage space,</span>
<span class="s"> # the maximum size of a single object has been set to 500MB.</span>
<span class="s"> max_file_size = 524288000</span>
<span class="s"> #max_meta_name_length = 128</span>
<span class="s"> #max_meta_value_length = 256</span>
<span class="s"> #max_meta_count = 90</span>
<span class="s"> #max_meta_overall_size = 4096</span>
<span class="s"> #max_object_name_length = 1024</span>
<span class="s"> #container_listing_limit = 10000</span>
<span class="s"> #account_listing_limit = 10000</span>
<span class="s"> #max_account_name_length = 256</span>
<span class="s"> #max_container_name_length = 256</span>
<span class="s"> EOF</span>
cat <span class="s"><<EOF >/home/pi/bin/remakerings</span>
<span class="s"> #!/bin/bash</span>
<span class="s"> cd /etc/swift</span>
<span class="s"> rm -f *.builder *.ring.gz backups/*.builder backups/*.ring.gz</span>
<span class="s"> swift-ring-builder object.builder create 8 1 0</span>
<span class="s"> swift-ring-builder object.builder add r1z1-127.0.0.1:6010/d1 1</span>
<span class="s"> swift-ring-builder object.builder rebalance</span>
<span class="s"> swift-ring-builder container.builder create 8 1 0</span>
<span class="s"> swift-ring-builder container.builder add r1z1-127.0.0.1:6011/d1 1</span>
<span class="s"> swift-ring-builder container.builder rebalance</span>
<span class="s"> swift-ring-builder account.builder create 8 1 0</span>
<span class="s"> swift-ring-builder account.builder add r1z1-127.0.0.1:6012/d1 1</span>
<span class="s"> swift-ring-builder account.builder rebalance</span>
<span class="s"> EOF</span>
cat <span class="s"><<EOF >/home/pi/bin/resetswift</span>
<span class="s"> #!/bin/bash</span>
<span class="s"> swift-init all stop</span>
<span class="s"> sudo umount /srv/swift-disk</span>
<span class="s"> sudo mkdir -p /srv</span>
<span class="s"> sudo truncate -s 1GB /srv/swift-disk</span>
<span class="s"> sudo mkfs.xfs -f -i size=512 /srv/swift-disk</span>
<span class="s"> sudo mount -a</span>
<span class="s"> sudo mkdir -p /mnt/sdb1/1</span>
<span class="s"> sudo chown -R pi:pi /mnt/sdb1/*</span>
<span class="s"> sudo rm -rf /var/log/swift</span>
<span class="s"> sudo mkdir -p /var/log/swift/hourly</span>
<span class="s"> sudo mkdir /var/cache/swift</span>
<span class="s"> sudo chown -R pi:pi /var/cache/swift</span>
<span class="s"> find /var/cache/swift* -type f -name *.recon -exec rm -f {} \;</span>
<span class="s"> sudo service rsyslog restart</span>
<span class="s"> sudo service memcached restart</span>
<span class="s"> EOF</span>
cat <span class="s"><<EOF >/home/pi/bin/startmain</span>
<span class="s"> #!/bin/bash</span>
<span class="s"> if [ ! -d /var/run/swift ]; then</span>
<span class="s"> sudo mkdir -p /var/run/swift</span>
<span class="s"> sudo chown -R pi:pi /var/run/swift</span>
<span class="s"> fi</span>
<span class="s"> swift-init main start</span>
<span class="s"> EOF</span>
chmod +x /home/pi/bin/*
cat <span class="s"><<EOF</span>
<span class="s"> ===========================================</span>
<span class="s"> Install completed.</span>
<span class="s"> You can now call \`resetswift\` and \`startmain\` to clean everything and start</span>
<span class="s"> the Swift server processes.</span>
<span class="s"> To test, try the following:</span>
<span class="s"> export PIIP=<IP address of your Raspberry Pi></span>
<span class="s"> curl -i -H "X-Auth-User: test:tester" -H "X-Auth-Key: testing" \\</span>
<span class="s"> http://\${PIIP}:8080/auth/v1.0/</span>
<span class="s"> EOF</span>
</code></pre></div>
<p>This was pretty fun for me. I hope you liked it too.</p>
<p><img src="http://d.not.mn/swift_on_pi_end.jpg" alt="end" /></p>
Swift Tech Overview2012-04-22T00:00:00+00:00/openstack/swift-tech-overview<p><a href="http://www.openstack.org/projects/storage/">OpenStack Object Storage</a>, called swift, is a distributed, fault-tolerant, eventually consistent object storage system. In this post, I’d like to go into some detail about what that means.</p>
<h4 id="distributed">Distributed</h4>
<p>Swift is a distributed system. It is designed to be run on a cluster of computers rather than on a single machine. Swift is composed of three major parts: the proxy, storage servers, and consistency servers.</p>
<h5 id="proxy">Proxy</h5>
<p>The proxy server is a server process that provides the swift API. As the only system in the swift cluster that communicates with clients, the proxy is responsible for coordinating with the storage servers and replying to the client with appropriate messages. The proxy is an HTTP server that implements swift’s REST-ful API. All messages to and from the proxy use standard HTTP verbs and response codes. This allows developers building clients to interact with swift in a simple, familiar way.</p>
<p>Swift provides data durability by writing multiple complete replicas of the data stored in the system. The proxy is what coordinates the read and write requests from clients and implements the read and write guarantees of the system. When a client sends a write request, the proxy ensures that the object has been successfully written to disk on the storage nodes before responding with a code indicating success.</p>
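<p>With three replicas, for example, the proxy can acknowledge a write as soon as a majority of the storage nodes report success. Here is a minimal sketch of that majority rule (illustrative only; the function names are made up and this is not swift’s actual code):</p>

```python
def quorum_size(replica_count):
    """Smallest number of successful backend writes that
    constitutes a majority of the replicas."""
    return replica_count // 2 + 1

def write_succeeded(statuses, replica_count=3):
    """The proxy reports success to the client only if a quorum
    of storage nodes returned a 2xx response to the write."""
    good = sum(1 for s in statuses if 200 <= s < 300)
    return good >= quorum_size(replica_count)

# Two of three object servers succeeded: the PUT succeeds.
print(write_succeeded([201, 201, 500]))  # True
# Only one succeeded: the PUT fails.
print(write_succeeded([201, 503, 500]))  # False
```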
<h5 id="storage-servers">Storage Servers</h5>
<p>The swift storage servers provide the on-disk storage for the cluster. There are three types of storage servers in swift: account, container, and object. Each of these servers provides an internal REST-ful API. The account and container servers provide namespace partitioning and listing functionality. Accounts and containers are implemented as sqlite databases on disk, and like all entities in swift, they are replicated to multiple availability zones within the swift cluster.</p>
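<p>To illustrate how a listing can be backed by a small sqlite database, here is a toy model (the real container server schema tracks much more, such as timestamps and delete markers):</p>

```python
import sqlite3

# Toy model of a container database: one row per object.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE object (name TEXT PRIMARY KEY, size INTEGER)')
for name, size in [('photos/cat.jpg', 48213),
                   ('photos/dog.jpg', 51984),
                   ('backups/2012-04-22.tar', 1048576)]:
    db.execute('INSERT INTO object VALUES (?, ?)', (name, size))

# A container GET is essentially an ordered listing query,
# optionally bounded by a prefix (and, in swift, a marker).
rows = db.execute(
    "SELECT name, size FROM object "
    "WHERE name LIKE 'photos/%' ORDER BY name").fetchall()
print(rows)
```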
<p>Swift is designed for multi-tenancy. Users are generally given access to a single swift account within a cluster, and they have complete control over that unique namespace. The account server implements this functionality. Users can set metadata on their account, and swift aggregates usage information here. Additionally, the account server provides a listing of the containers within an account.</p>
<p>Swift users may segment their namespace into individual containers. Although containers cannot be nested, they are conceptually similar to directories or folders in a file system. Like accounts, users may set metadata on individual containers, and containers provide a listing of each object they contain. There is no limit to the number of containers that a user may create within a swift account, and the containers do not have globally-unique naming requirements.</p>
<p>Object servers provide the on-disk storage for objects stored within swift. Each object in swift is stored as a single file on disk, and object metadata is stored in the file’s extended attributes. This simple design allows the object’s data and metadata to be stored together and replicated as a single unit.</p>
<h5 id="consistency-servers">Consistency Servers</h5>
<p>Storing data on disk and providing a REST-ful API to it is not a hard problem to solve. The hard part is handling failures. Swift’s consistency servers are responsible for finding and correcting errors caused by both data corruption and hardware failures.</p>
<p>Auditors run in the background on every swift server and continually scan the disks to ensure that the data stored on disk has not suffered any bit-rot or file system corruption. If an error is found, the corrupted object is moved to a quarantine area, and replication is responsible for replacing the data with a known good copy.</p>
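<p>The heart of the object auditor’s check can be sketched as re-hashing the on-disk bytes and comparing against the checksum recorded when the object was written (a simplified illustration; the function name here is made up):</p>

```python
import hashlib

def audit(data, recorded_md5):
    """Return True if the bytes still match the checksum recorded
    at write time. A mismatch means the object should be quarantined
    and replication should restore a known good copy."""
    return hashlib.md5(data).hexdigest() == recorded_md5

original = b'some object payload'
recorded = hashlib.md5(original).hexdigest()

print(audit(original, recorded))                    # True: object is healthy
print(audit(b'some object p\x00yload', recorded))   # False: bit-rot detected
```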
<p>Updaters ensure that account and container listings are correct. The object updater is responsible for keeping the object listings in the containers correct, and the container updater is responsible for keeping the account listings up-to-date. Additionally, the object updater updates the object count and bytes used in the container metadata, and the container updater updates the object count, container count, and bytes used in the account metadata.</p>
<p>Replicators ensure that the data stored in the cluster is where it should be and that enough copies of the data exist in the system. Generally, the replicators are responsible for repairing any corruption or degraded durability in the cluster.</p>
<h4 id="fault-tolerant">Fault-tolerant</h4>
<p>The combination of swift’s pieces allows a swift cluster to be highly fault-tolerant. Swift implements the concept of availability zones within a single geographic region, and data can be written to hand-off nodes if primary nodes are not available. This allows swift to survive hardware failures up to and including the loss of an entire availability zone with no impact to the end-user.</p>
<p>An interesting consequence of this design is that upgrades and cluster resizes can be easily performed on a production cluster with zero end-user downtime. Swift provides both forward and backwards compatibility of its API, so a swift cluster can be running multiple versions of the swift software at the same time, as is common while the software is being upgraded. Similarly, during resizes, temporarily stale information about where data lives is simply treated as another failure. Processes like replication ensure that the data will be moved to its correct location.</p>
<h4 id="eventually-consistent">Eventually Consistent</h4>
<p>Swift achieves high scalability by relaxing constraints on consistency. While swift provides read-your-writes consistency for new objects, listings and aggregate metadata (like usage information) may not be immediately accurate. Similarly, reading an object that has been overwritten with new data may return an older version of the object data. However, swift provides the ability for the client to request the most up-to-date version at the cost of request latency.</p>
<h4 id="example-request-flow">Example Request Flow</h4>
<p>When an object PUT request is made to swift, the proxy server determines the correct storage nodes responsible for the data (based on a hash of the object name) and sends the object data to those object servers concurrently. If one of the primary storage nodes is unavailable, the proxy will choose an appropriate hand-off node to write data to. If a majority of the object servers respond with a success, then the proxy returns success to the client.</p>
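<p>The “hash of the object name” lookup can be sketched roughly like swift’s ring: hash the full path, keep the top bits as a partition number, and map that partition to a fixed set of devices. This is a simplified model, not the real ring code, and the device names are hypothetical; the real swift also mixes a secret suffix (the <code>swift_hash_path_suffix</code> seen in the Raspberry Pi script above) into the hash, and spreads replicas across zones and regions.</p>

```python
import hashlib

PART_POWER = 8                        # 2**8 = 256 partitions
REPLICAS = 3
DEVICES = ['d1', 'd2', 'd3', 'd4']    # hypothetical device names

def partition(account, container, obj):
    """Map an object path to a partition by hashing it and
    keeping the top PART_POWER bits of the digest."""
    path = '/%s/%s/%s' % (account, container, obj)
    digest = hashlib.md5(path.encode()).digest()
    return int.from_bytes(digest[:4], 'big') >> (32 - PART_POWER)

def primary_nodes(part):
    """Toy placement: pick REPLICAS distinct devices for a partition."""
    return [DEVICES[(part + i) % len(DEVICES)] for i in range(REPLICAS)]

part = partition('AUTH_test', 'photos', 'cat.jpg')
print(part, primary_nodes(part))
```

<p>The useful property is that the mapping is deterministic: every proxy computes the same partition, and therefore the same storage nodes, from the object name alone, with no central lookup table.</p>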
<p>Similarly, when an object GET request is made, the proxy determines which three storage nodes have the data and then requests the data from each node in turn. The proxy will return the object data from the first storage node to respond successfully. </p>
<h4 id="client-data-designs">Client Data Designs</h4>
<p>Using any storage system effectively means understanding the characteristics of the system and the guarantees that the system provides. Swift is optimized for high concurrency rather than single-stream throughput. The aggregate throughput of a swift cluster is much higher than what is available for a single request stream. A swift client can take advantage of this by distributing data across multiple containers within an account. For example, backups may be stored by day or week in a container that includes that information in its name. Or a photo-sharing application may store images across many containers by using a prefix of the hash of the photo in the container names.</p>
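<p>The photo-sharing idea above can be sketched as deriving the container name from a prefix of the photo name’s hash; the naming scheme here is illustrative, not prescribed by swift:</p>

```python
import hashlib

def container_for(photo_name, prefix_len=2):
    """Spread objects across 16**prefix_len containers by using a
    short prefix of the name's hash as part of the container name."""
    prefix = hashlib.md5(photo_name.encode()).hexdigest()[:prefix_len]
    return 'photos_%s' % prefix

for name in ('cat.jpg', 'dog.jpg', 'fish.jpg'):
    print(name, '->', container_for(name))
```

<p>Because the hash is deterministic, the application can always recompute which container holds a given photo, while writes and listings are spread across many containers instead of hammering one.</p>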
<h4 id="summary">Summary</h4>
<p>Swift’s design provides robust software that can run effectively on unreliable (read: cheap) hardware. Modular processes allow deployers to optimize clusters based on client use cases. Fault-tolerance allows clusters to be effectively managed by a limited operations staff.</p>
<p>Swift is production-ready code that has been running at scale powering <a href="http://www.rackspace.com/cloud/cloud_hosting_products/files/">Rackspace Cloud Files</a> for two years. It is being deployed around the world at large and small scale by public cloud service providers and for private, internal needs. Swift is <a href="http://github.com/openstack/swift">100% open source</a> released under the Apache 2.0 license. For more information, you can read the <a href="http://swift.openstack.org">technical docs</a>, the <a href="http://docs.openstack.org/trunk/openstack-object-storage/admin/content/">admin guide</a>, or the <a href="http://docs.openstack.org/api/openstack-object-storage/1.0/content/">API guide</a>. To get started building applications for swift, you can use either the <a href="https://github.com/openstack/swift/blob/master/swift/common/client.py">stand-alone Python module included in swift’s code</a> or any of Rackspace’s <a href="http://github.com/rackspace/">Cloud Files language bindings</a>. If you have further questions, ask on the <a href="https://lists.launchpad.net/openstack/">Openstack mailing list</a> or in #openstack on freenode.</p>
Swift State of the Project Spring 20122012-04-05T00:00:00+00:00/openstack/swift-state-of-the-project-spring-2012<p>The last six months of <a href="http://openstack.org">OpenStack</a> swift development have been the most active six-month period for the project since the code was first put into production. The developer community has grown, the code has improved, and adoption has increased. The past six months have covered the Openstack “Essex” release cycle. During this time, swift has made five releases: 1.4.4 through 1.4.8.</p>
<h4 id="where-we-are">Where We Are</h4>
<p>The easiest way to get an overview of swift’s evolution is to look at the version control logs.</p>
<p>Swift has had 125 non-merge commits:</p>
<pre><code>git shortlog -nes --no-merges 1.4.3..1.4.8 | awk '{SUM+=$1} END {print SUM}'
</code></pre>
<p><a href="http://tlohg.com">Greg Holt</a> has been the most prolific commiter:</p>
<pre><code>git shortlog -nes --no-merges 1.4.3..1.4.8 | head -1
</code></pre>
<p>Swift has had contributions from <a href="http://www.rackspace.com/">Rackspace</a>, <a href="http://www.sdsc.edu/">SDSC</a>, <a href="http://www.redhat.com/">RedHat</a>, <a href="http://nebula.com/">Nebula</a>, <a href="http://www.hp.com/">HP</a>, <a href="http://www.swiftstack.com/">SwiftStack</a>, <a href="http://www.internap.com/">Internap</a>, <a href="http://www.memset.com/">Memset</a>, <a href="http://cern.ch">CERN</a> and others.</p>
<p>The three largest commits in the last six months have been for the formpost middleware, man pages, and the expiring objects feature:</p>
<pre><code>formpost 7fc1721d7d5290a6af278f9b6844cd3b96b7c7c3
(11 files changed, 3359 insertions(+), 16 deletions(-))
man pages 0b0785e984d9164c1d1cd84f05dd9909bb7d37a8
(27 files changed, 3148 insertions(+), 0 deletions(-))
expiring objects 872420efdb8e6e945cd2fe06994136b8c2ee153a
(20 files changed, 2043 insertions(+), 53 deletions(-))
</code></pre>
<p>But looking at VCS logs doesn’t tell the whole story. What is in these commits?</p>
<p>Several important new features have been added to swift. Swift now supports expiring objects, HTML form POSTs with temporary signed URLs, and the Openstack auth 2.0 API in the swift CLI. Other new features include new config options, optional functionality in middleware, and more ops tools.</p>
<p>Expiring objects allow a swift user to set an expiry time or a TTL on an object, after which the object is no longer accessible and will be deleted from the system. This feature enables new use cases for swift. For example, it could be used by a document management system with data retention requirements.</p>
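<p>As a sketch of how a client uses this feature: the expiry is communicated through request headers, following the <code>X-Delete-After</code> / <code>X-Delete-At</code> convention. The helper below is illustrative, not part of swift itself.</p>

```python
def expiry_headers(ttl_seconds=None, delete_at=None):
    """Headers that mark an object as expiring (illustrative sketch).

    X-Delete-After takes a TTL in seconds; X-Delete-At takes an absolute
    Unix timestamp. Once the time passes, swift stops serving the object
    and eventually removes it from disk.
    """
    if ttl_seconds is not None:
        return {"X-Delete-After": str(int(ttl_seconds))}
    if delete_at is not None:
        return {"X-Delete-At": str(int(delete_at))}
    raise ValueError("need ttl_seconds or delete_at")
```

A client would attach these headers to the object PUT (or a later POST) to schedule the deletion.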
<p>The new formpost and tempurl middleware modules allow a swift user to create a URL with write access and then use that URL as the target of an HTML form POST. This feature is aimed at a control panel use case. Since swift uses an auth method based on information in request headers, browsers typically can’t access swift directly. With these two new middleware modules, someone building a swift control panel can have the browser directly upload content into the swift cluster. Since the requests are going directly to swift and don’t have to be proxied through the control panel web servers for auth, the control panel deployer only has to scale infrastructure based on the control panel usage, not swift usage.</p>
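<p>The tempurl half of this pair works by signing a URL that the browser can use directly. A rough sketch of the signing step, assuming an HMAC-SHA1 over the method, expiry time, and object path (the secret key is shared between the user and the cluster):</p>

```python
import hmac
from hashlib import sha1

def make_temp_url(path, key, method="PUT", expires=0):
    """Sketch of tempurl signing (assumed scheme): HMAC-SHA1 over the
    method, an absolute Unix expiry timestamp, and the object path,
    joined by newlines. The signed query string is appended to the
    object path handed out to the browser."""
    body = "%s\n%s\n%s" % (method, expires, path)
    sig = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return "%s?temp_url_sig=%s&temp_url_expires=%s" % (path, sig, expires)
```

The proxy recomputes the same HMAC on each request and rejects the request if the signature doesn't match or the expiry has passed.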
<p>In addition to new features, many bugs have been squashed as well. Swift developers have found and fixed memory leaks, improved data corruption detection, improved replication, and improved the way rings are built.</p>
<p>Swift’s documentation has also been greatly improved in the last six months. Thanks to Marcelo Martins, an ops engineer at Rackspace, swift now has a full set of <a href="https://github.com/openstack/swift/tree/master/doc/manpages">man pages</a>. Additionally, swift’s self-auditing tool (swift-recon) now has <a href="http://swift.openstack.org/admin_guide.html#cluster-telemetry-and-monitoring">full documentation</a>.</p>
<p>Beyond the code, swift’s community has grown quite a bit. In addition to many private deployments, several companies have announced public deployments or their internal usage of swift. <a href="http://blog.softlayer.com/2012/softlayer-openstack-swift-softlayer-object-storage/">Softlayer</a>, <a href="http://haylix.com/cloud-storage/">Haylix</a>, and <a href="http://aptira.com/services/openstack">Aptira</a> have all announced public clouds that use swift. Wikimedia Foundation has <a href="http://blog.wikimedia.org/2012/02/09/scaling-media-storage-at-wikimedia-with-swift/">announced</a> that all thumbnails on Wikipedia are now served from a swift cluster, and they are migrating all of their media files to a swift back end.</p>
<p>Swift now has fifty-nine contributors listed in the <a href="https://github.com/openstack/swift/blob/master/AUTHORS">AUTHORS file</a>. Twenty-seven have been added in the last six months. This is incredible growth (nearly 50%), and many of these new contributors come from companies that had not previously contributed to swift. This growth speaks to the increasing rate of adoption of swift and builds a strong developer base that will ensure swift’s success in the future.</p>
<h4 id="where-were-going">Where We’re Going</h4>
<p>However, swift is by no means “finished” or “complete”. There are always bugs to fix and edge cases that can be handled better. There are new features and use cases that can and should be solved. Some examples include solving multi-site deployments and keeping very large containers performant. Both of these improvements will allow swift to grow beyond its current use case, but they involve tremendous complexity to implement well. Serious attempts to solve these issues are unlikely until they become pain points for swift deployers. As one of the swift developers said, “Swift has solved all the easy problems. All we have left are the really hard problems.”</p>
<p>The biggest challenges facing swift are not technical; they are about the developer community. Expect the swift community to continue to grow. More companies are deploying swift. More developers will be contributing to swift. A larger developer community will of course bring new challenges, but much can be learned from other Openstack projects like nova. Bringing more developers to swift will allow swift to become more robust and more adaptable to a wider variety of use cases.</p>
<p>The next six months for swift should bring more community education and a larger ecosystem. More companies will deploy swift, and their unique experiences will allow swift to become more robust and feature-filled. Swift’s future is bright as both public and private clouds continue to grow.</p>
<p>Storage is important. Everyone has data, and it’s always growing. You should have ownership of everything that touches your data. OpenStack <a href="http://wiki.openstack.org/Open">gives you</a> that power.</p>
Democratization of Data2011-10-28T00:00:00+00:00/openstack/democratization-of-data<p>You should have control over your data. If you want to host your own data, you should be able to. If you want to pay someone else to host your data, you should be able to interact with their systems in the same way you interact with your local system. You should be able to change hosting providers without changing your interface. You should be able to pull your data from a hosting provider to host it locally. You should be able to host your data across many providers seamlessly. You should be able to move your compute needs to your data storage. You should be able to separate your concerns over distributing your data from who is distributing your data.</p>
<p>You can do three things with data: store it, compute it, and deliver it. You should be in control of where your data is stored, how you compute on it, and how you distribute it. This is the promise of <a href="http://openstack.org">Openstack</a>: a common infrastructure that puts you in control of your data. This promise is the democratization of data.</p>
<p>Ultimately, the democratization of data comes down to storage. Openstack provides a high-quality object storage system called <a href="http://swift.openstack.org">swift</a>. Swift is ideal for unstructured data that can grow without bound. Backups and static web content are perfect examples of good use cases for swift.</p>
<p>Computing on your data is provided by another Openstack project: <a href="http://nova.openstack.org">nova</a>. Nova enables the management of large numbers of dynamic virtual machines. It is directly comparable to AWS EC2 and Rackspace Cloud Servers.</p>
<p>Openstack currently lacks integration with content delivery networks. Such integration should facilitate simple distribution and management of data with existing CDN providers (or perhaps even allow you to run your own CDN).</p>
<p>These three pillars, along with various complementary projects like identity management, queueing, and block storage, provide a foundation upon which we can build the future.</p>
Swift State of the Project Fall 20112011-10-13T00:00:00+00:00/openstack/swift-state-of-the-project<p><a href="http://swift.openstack.org">Swift</a> has been running in production at Rackspace for over a year with near 100% uptime. Rackspace’s swift clusters store billions of objects and petabytes of data. Several companies, including Internap, KT, SDSC, HP, and others, have also deployed and are running swift clusters in production. These clusters range in size from fairly small to several petabytes. Other organizations, including CERN, are evaluating swift for production use. Overall, swift is a success.</p>
<p>Swift’s active developers are all currently Rackspace employees. Other companies have talked about features and promised to contribute code, but so far, no major patches have been forthcoming. Unfortunately, this means that the day-in, day-out priorities for swift come from Rackspace. While many needs can be anticipated and met simply by looking at Rackspace’s use cases, some key areas of development are missed. For example, Rackspace does not often deploy new swift clusters, so automatic deployment tools are neglected. Similarly, Rackspace’s needs focus on a general use case for a broad set of customers. Other clusters may have more specific needs based on different use cases. Until swift developers see these other use cases, swift is not likely to optimize for them.</p>
<p>Although other companies are not currently contributing swift code, many companies are active in the community. Piston, Nebula, Voxel, HP, and others are actively engaging the developer and user communities. They sponsor biweekly meetups, engage on the mailing lists, contribute in IRC, participate in the design summits, and generally talk about what they are doing and what needs to be done. For this, I am grateful. It seems that swift currently meets the needs of these groups. I hope that as they grow and use swift more, they will see areas to improve the software and contribute those improvements back to the community.</p>
<p>As we move forward with swift development, certain fundamental things must be preserved, protected, and encouraged. We must maintain a healthy project. We must ensure good feedback channels with users. We must encourage other companies to continue to participate and even submit patches. We must do what we can to encourage and support an active ecosystem of tools for swift. The universe of end-user tools, automation software, and monitoring systems all factor in to a decision to use swift or not. If we fail in these fundamental areas, we might as well pack up and go do something else.</p>
<p>With these concerns in mind, I see three realms of future swift development. Realm one is improving swift by fixing bugs and adding features. Realm two of swift development is data-compute locality. Most (if not all) data processing tasks can be improved by reducing the latency between where the data is stored and where it is processed. Realm three moves beyond data-compute locality by a single swift deployer and solves data federation.</p>
<p>We are currently working on realm one: improve swift by fixing bugs and adding features. The main goals are around very large and very small clusters. This is generally an ongoing task, and even when large and small deployments are better served, there will always be bug fixes and smaller feature improvements. Some features will be large, and some will be small. The work here mostly focuses on filling out the feature gaps in swift for specific use cases.</p>
<p>Realm two is waiting on nova stability. After large nova clusters are running in production, we can start to explore what it will take to unify the clusters. The goal is to bring compute to “near” the data in a network sense. The closest “near” can be is local to the same server, but it could perhaps be more simply solved by only being in the same cabinet or availability zone. Since data is “sticky” and hard to move, oftentimes bringing the compute to the data is more realistic. I do not foresee swift ever merging with nova; rather, I would like to see swift and nova cooperate in such a way that swift’s ring can be used as a scheduler for nova VMs. Currently nova is in a state of flux and needs to focus on maturity before large problems like swift integration are tackled. I expect nova-swift integration to be on hold for about another 12 months while nova matures.</p>
<p>Realm three is the ultimate goal. Federating compute is a fairly simple concept to understand. “Bursting into the cloud” is common enough to have become a marketing phrase. Federating storage still needs to be defined even before it can be understood. I believe it involves datasets distributed and replicated across many storage providers and dynamically balancing access to them. This is something I’ve <a href="http://programmerthoughts.com/openstack/future-vision-of-storage/">talked about</a> in a previous blog post.</p>
<p>Solving these problems will take a lot of work and a lot of time. As we move from one realm to the next, we must not consider work to be “done” in the previous realms. We must always listen to feedback and continue to polish the system as a whole.</p>
<p>Swift has been actively developed for a little over two years now. It was revealed to the world about one year ago and has made tremendous progress since. I’m quite proud to have been a part of the project. We have all learned a lot and had a lot of fun. Swift is in a great place: openstack momentum is growing, more users are deploying swift, and the vast majority of the feedback we hear is positive. Swift’s first two years have been a success. As we remember the fundamental things and work together as part of an active community, swift’s future will be even brighter than its past.</p>
Setting Permissions (ACLs) on Openstack Swift containers2011-08-04T00:00:00+00:00/openstack/swift-permissions<p>I frequently see people in the #openstack IRC channel on freenode asking about how to set up ACLs in swift. Here’s a short tutorial.</p>
<p>First, set up two accounts. How to do this is specific to your auth system. For this example, I’ll use the default <code>tempauth</code> that ships with swift.</p>
<p>In your proxy server config file, under the tempauth section, add the accounts:</p>
<pre><code>user_test_tester = testing .admin
user_test_tester2 = testing2
</code></pre>
<p>The first user (“tester”) has admin privileges on the account (“.admin”). The second user (“tester2”) is in the test account, but will only have access to what the first user grants him. The two accounts don’t need to have the same tempauth account (the “test” part).</p>
<p>Auth the first user and create a container. Then set read permissions on that container for the second user:</p>
<pre><code>$ curl -i -H "X-Auth-User: test:tester" -H "X-Auth-Key: testing" \
http://swift/auth/v1.0
$ curl -i -XPUT -H "X-Auth-Token: token1" http://swift/v1/AUTH_test/container
$ curl -i -XPOST -H "X-Auth-Token: token1" -H "X-Container-Read: test:tester2" \
http://swift/v1/AUTH_test/container
</code></pre>
<p>Note that in the last curl command the proper value for the ACL is <code>&lt;account&gt;:&lt;user&gt;</code>.</p>
<p>Now, auth the second account. Note that the second account cannot list the containers or do anything but read what’s in the container called “container”.</p>
<pre><code>$ curl -i -H "X-Auth-User: test:tester2" -H "X-Auth-Key: testing2" \
http://swift/auth/v1.0
$ curl -i -H "X-Auth-Token: token2" http://swift/v1/AUTH_test/
$ curl -i -H "X-Auth-Token: token2" http://swift/v1/AUTH_test/container/
$ curl -i -XPUT --data-binary 1234 -H "X-Auth-Token: token2" \
http://swift/v1/AUTH_test/container/foo
</code></pre>
<p>If one desires, adding the X-Container-Write header to a container will similarly grant write access.</p>
A Vision of the Future of Storage2011-05-22T00:00:00+00:00/openstack/future-vision-of-storage<p>Businesses are beginning to embrace “The Cloud” as a cost-effective way to offload internal compute and storage needs. However, many businesses still keep data in-house. Unfortunately, most of this data is useful when one can compute on it, and transfer costs to move the data to the compute centers can be cost-prohibitive. This implies that by storing data in-house, businesses also must host compute resources in-house.</p>
<p>Therefore, in an effort to save money, businesses have searched for ways to “burst into the cloud”–a marketing phrase which simply means temporarily using someone else’s public compute infrastructure to do some work and only paying for what is used. Unfortunately, this presents difficulty when it comes to storage. The data lives in-house. “Bursting” doesn’t make sense for storage.</p>
<p>There is an answer: federated storage.</p>
<p>Federated storage is where the storage cluster becomes a network of storage clusters. Upload data to a storage endpoint. Metadata on your data provides rules as to where the data should be stored. For example, rules can be set to require that the data is stored in geographically distinct regions. Other rules can be set to require that data is stored with at least two companies. This gives you both geographic and business redundancy. If an earthquake knocks out a data center or a company goes out of business, the data is protected.</p>
<p>Storage providers can enter into peering relationships that allow them to offer a storage network to their customers. For example, imagine that Dell, Rackspace, Internap, and AT&amp;T all have storage clusters, each with their own customers. Each company has different capacity and locations available. If these companies entered into a federated storage agreement, they could share customers, capacity, and geographic redundancy.</p>
<p>This federated storage network is similar to how airlines organize today. American Airlines and British Airways serve different parts of the world, but they are both part of the Oneworld alliance that allows passengers to collect rewards with one company even when flying with partner airlines.</p>
<p>In the same way, smaller companies like Internap and Rackspace can partner with larger companies like AT&amp;T and Dell. A Rackspace customer can choose to store data both in one of Rackspace’s data centers and also in geographic regions that Rackspace does not serve. Perhaps Dell builds a data center in South Asia. Rackspace customers would then be able to store data redundantly in both Rackspace’s data center in Chicago and also in Dell’s data center on the other side of the world.</p>
<p>Other storage federations could also arise. Users then are able to choose where to store their data based on the entire offering of the partnership of providers rather than on one company’s merits. Companies benefit by having access to infrastructure without the need for expensive cap-ex costs. Customers benefit by having more storage options.</p>
<p>As another benefit to customers and businesses alike, a storage federation also implies that data is portable. Although data is sticky, customers would be able to move their data from one company or federation to another using the same API that the companies within the federation use to manage the federation itself. At a provider level, storage could be temporarily moved to another system for things like planned downtime or unplanned outages.</p>
<p>However, if data is portable between companies, this implies that the companies must compete on something other than simply having the infrastructure. A smaller company could join a federation and suddenly allow their customers access to a previously unavailable network of storage. Companies would need to differentiate on service, user interface, price, or myriad other things other than the availability of the storage itself. Of course, these federations would be governed by contracts established by the cooperating companies, and, so, the specifics of any one federation would be unique.</p>
<p>Federated storage networks become very interesting when the associated compute resources are accounted for. Rules for storage could take into account the available compute resources in particular locations or with particular providers. Using the companies from the example above, AT&amp;T may provide a large number of available compute resources, but Dell may provide particular types of compute–GPU vector processing, for example. A Rackspace customer, then, can keep backups of their Rackspace-hosted web application in Rackspace’s storage infrastructure, while the generated data is stored with a rule requiring that a copy be made available in AT&amp;T’s storage network near AT&amp;T’s large pool of compute resources. Or, perhaps a Dell customer could require that data be made available only within Internap’s storage infrastructure because of some particular government regulation Internap meets.</p>
<p>Federated storage allows data to break free of a single company. Customers gain more control over their data by being able to choose where their data is stored. Businesses gain the ability to store and process their data in the most cost-effective way possible. Users are no longer limited to their own infrastructure. Federated storage allows truly unlimited data.</p>
<p>I’m a developer on <a href="http://openstack.org">Openstack’s</a> <a href="http://swift.openstack.org">object storage system</a>, and I believe that Openstack is uniquely positioned to achieve this vision.</p>
The Story of an Openstack Feature2010-12-10T00:00:00+00:00/openstack/the-story-of-an-openstack-feature<p><a href="http://openstack.org">Openstack</a> is a fairly large open-source project with a set of core developers. Anyone can submit patches for bugfixes or new features, but sometimes the process can be a little mysterious, especially for larger features or for developers that haven’t contributed to open-source projects before.</p>
<p>For the <a href="http://openstack.org/projects/storage/" title="swift">swift project (Openstack storage)</a>, we have a mature codebase running in a production environment. Any patches that are accepted must not have adverse effects for the scalability or performance of the system as a whole.</p>
<p>One of the features currently being developed for swift is large object support. The feature has gone through many iterations in both design and code, but perhaps the most important development came at the Bexar design summit. As the developers on the project, we knew that files larger than 5GB were important, but we did not have a good use case. We did not want to develop a solution for large files that did not meet the needs of the users who would actually want the feature. At the design summit, we were able to talk with Openstack users at NASA who had specific uses in mind for large objects. A <a href="https://blueprints.launchpad.net/swift/+spec/bexar-client-side-chunking" title="launchpad blueprint">launchpad blueprint</a> was developed, and the existing coding work was refocused to meet the needs of the users.</p>
<p><strong>Update:</strong> The large object feature has now been <a href="https://code.launchpad.net/~gholt/swift/lobjects4/+merge/43596">approved</a> and lives in the swift trunk.</p>
<p>There are several ways to implement large object support. First, and most simply, is to raise the object size limit constant. The constant determines how big a file can be, but relying on it has limits, and raising it has some nasty side effects. Since an object is stored on one physical drive (per replica), one can only raise the object limit constant to the size of the smallest drive in the cluster. If a cluster is filled with 2TB drives, this means that the largest object can only be 2TB<sup id="fnref:note"><a href="#fn:note" rel="footnote">1</a></sup>. Additionally, since objects are spread evenly throughout the cluster, the balance of the fullness of each drive in the cluster is related to the ratio of the max object size to the size of the drives in the cluster. At scale, each drive in a swift cluster will differ from the average fullness of all drives in the cluster by an amount proportional to the max object size.</p>
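<p>To make the first constraint concrete, a back-of-the-envelope calculation (the drive size and fullness target are assumed numbers; see the footnote on fullness):</p>

```python
TB = 10 ** 12
smallest_drive = 2 * TB      # smallest drive in a hypothetical cluster
fullness_target = 0.80       # keep drives roughly 80% full in practice

# One replica of an object lives entirely on a single drive, so the max
# object size constant can never exceed the smallest drive...
hard_cap = smallest_drive

# ...and in practice it must be lower still once headroom is reserved,
# since the cluster is effectively full when its first drive fills up.
practical_cap = int(smallest_drive * fullness_target)
```

So with 2TB drives, a single-object limit of even 1.6TB is optimistic; the larger the ratio of max object size to drive size, the more unevenly the drives fill.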
<p>If simply raising the max object size constant won’t work, another way to support large objects is to split the object into chunks and tell the system to treat groups of objects as one large object. The naive implementation is to split the objects as they are streamed into the system. As an object streams into swift, swift could split it after a certain number of bytes and then write the next bytes to a different location in the cluster. This implementation has the advantage of not requiring the user to know anything about object chunks or having a final “commit” step to finalize the large object write. This implementation was actually written, but it was rejected for its enormous complexity and various failure condition edge cases. One of its biggest disadvantages is that it does not allow users to upload parts of a large upload in parallel. Asking a user to upload terabytes of data as one upload simply isn’t practical.</p>
<p>After talking to swift users at the Bexar design summit, especially those from NASA, we realized that a large object solution implemented with client-side chunking would be sufficient to meet the needs of our users and offer some advantages without the disadvantages of the server-side chunking implementation.</p>
<p>A client-side chunking implementation of large objects requires users to upload the chunks of the large objects as normal objects, but with a unique prefix. For example, one could upload three 5GB files (obj/1, obj/2, and obj/3). Then the user creates a manifest object that defines the prefix of the objects (“obj/” in this example). Now, the user can upload the chunks concurrently, but if the manifest file is downloaded, the system will stream the concatenated chunks to the client. This allows for great flexibility for the user and still allows the system to support very large objects. With the <a href="https://code.launchpad.net/~gholt/swift/lobjects4/+merge/41020" title="launchpad merge proposal">current proposed implementation</a>, the only limit of large files is the size of the cluster itself. If the operators can deploy servers faster than the user can upload the data, the object size is truly unlimited. This is much better than the similar feature in S3 that was <a href="http://aws.typepad.com/aws/2010/12/amazon-s3-object-size-limit.html" title="S3 large objects">announced today</a>. This feature in swift will allow a manifest file to be created for existing content. Additionally, a manifest can be created for a large object, and content can be added to that large object at a later time without updating the manifest. Possible applications beyond simply storing large single objects include streaming all data in a single container to a client as one large object, appending to objects, maintaining symlinks to files, and pseudo pause-and-resume for uploads.</p>
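<p>The mechanics can be sketched with an in-memory model (the names and sizes here are illustrative; real chunks would be multi-gigabyte objects in a container):</p>

```python
def split_into_chunks(data, chunk_size, prefix="obj/"):
    """Client-side chunking sketch: each slice is uploaded as a normal
    object whose name shares a common prefix. Zero-padded numbering
    keeps lexical (name) order equal to byte order."""
    return {"%s%08d" % (prefix, i // chunk_size): data[i:i + chunk_size]
            for i in range(0, len(data), chunk_size)}

def read_via_manifest(store, prefix="obj/"):
    """What a GET on the manifest object does conceptually: stream the
    concatenation of every object matching the manifest's prefix,
    in name order."""
    return b"".join(store[name] for name in sorted(store)
                    if name.startswith(prefix))
```

Because the manifest only records a prefix, new chunks can be added later (the append case above) without touching the manifest itself.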
<p>This is a feature that will be included in the swift codebase very soon, and is something we are very excited about. We think it balances the needs of the system (scalability and performance) with the requirements of the users. This feature would not have been implemented nearly as well without input from the community. The conversations we had with people at the design summit were invaluable to the design of this feature.</p>
<p>As always, patches are welcome in swift. If you have bug fixes or an idea for new features, we welcome contributions. Talk to us; submit your code; give us your use cases–we want swift to be the best it can be. The swift code is hosted on <a href="http://launchpad.net/swift">Launchpad</a>, and the developers can be found on IRC in #openstack on irc.freenode.net.</p>
<p>Community input in the Openstack project is vital. I’m excited about where the project has been, but even more excited to see where the community takes it in the future.</p>
<div class="footnotes">
<ol>
<li id="fn:note">
<p>Actually, it would be less than 2TB. A swift cluster is full when the first hard drive in the cluster is full. Therefore, it is wise to limit the fullness of the drives to about 80% of their capacity.<a href="#fnref:note" rel="reference">↩</a></p>
</li>
</ol>
</div>
Features I Would Like to See in Swift2010-11-09T00:00:00+00:00/openstack/features-i-would-like-to-see-in-swift<p><a href="http://swift.openstack.org">Swift</a> is a great way to store large amounts of data cheaply. This week I’m at the <a href="http://summit.openstack.org">OpenStack design summit</a>, and I’ve been thinking of features I would like to see added to swift.</p>
<ul>
<li>
<p><strong>WebDAV support:</strong> WebDAV support would allow swift users to mount public containers as network drives in any modern operating system. It could probably be implemented as WSGI middleware, and would therefore be an optional feature for any swift deployment.</p>
</li>
<li>
<p><strong>Auto-compression:</strong> I would like to see swift support dynamic compression and decompression of responses based on the Accept header in the request. This feature too could be implemented as WSGI middleware and be optional for any swift deployment. One concern, however, is the extra CPU cycles required for the compression and decompression. The swift proxy servers can see high CPU load under heavy traffic conditions.</p>
</li>
<li>
<p><strong>Object versioning:</strong> A more complicated feature, object versioning would allow old objects to be accessed after newer data has overwritten them. New semantics for accessing old versions (or even enabling/disabling this feature) would have to be created, and many questions relating to failure scenarios would need to be answered.</p>
</li>
<li>
<p><strong>stop_marker query parameter in listings:</strong> Currently, account and container GETs can be filtered with a marker query parameter. The marker parameter will cause the listing to return values that are greater than the marker. In a container of 100 items a marker equal to item 50 will return items 51 through 100. However, if one wants to only fetch items less than item 60, the listing has to be filtered by the client. A stop_marker query parameter would return anything less than or equal to the parameter value and would be able to be used in conjunction with the other query parameters. This would be a relatively simple feature to add without any obvious (to me) risks to the system as a whole. <strong>Update</strong>: This feature is now supported in swift: <a href="https://code.launchpad.net/~notmyname/swift/end_marker">https://code.launchpad.net/~notmyname/swift/end_marker</a>.</p>
</li>
</ul>
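<p>The auto-compression idea above lends itself to a small sketch: a minimal (and deliberately naive) WSGI middleware that gzips a response when the client advertises support. A real version would negotiate content types, stream rather than buffer, and skip already-compressed data.</p>

```python
import gzip

class GzipMiddleware:
    """Illustrative auto-compression middleware (not part of swift).

    Buffers the wrapped app's response and gzips it when the request's
    Accept-Encoding includes gzip; otherwise passes through untouched.
    """

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if "gzip" not in environ.get("HTTP_ACCEPT_ENCODING", ""):
            return self.app(environ, start_response)
        captured = {}

        def capture(status, headers, exc_info=None):
            # Capture the inner app's status/headers instead of sending
            # them, so we can rewrite them after compressing. (A full
            # implementation would also return a write callable.)
            captured["status"] = status
            captured["headers"] = headers

        body = b"".join(self.app(environ, capture))
        compressed = gzip.compress(body)
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() != "content-length"]
        headers += [("Content-Encoding", "gzip"),
                    ("Content-Length", str(len(compressed)))]
        start_response(captured["status"], headers)
        return [compressed]
```

The CPU-cost concern is visible even here: every byte of the response passes through <code>gzip.compress</code> on the proxy.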
<p>If anyone would like to add any of these features to swift, please <a href="http://launchpad.net/swift">grab the code</a> and submit your patch.</p>
Swift (OpenStack Object Storage) Overview2010-11-06T00:00:00+00:00/openstack/swift-openstack-object-storage-overview<p><strong>What is it?</strong></p>
<p>Swift is a highly scalable redundant unstructured data store designed to store large amounts of data cheaply. “Highly scalable” means that it can scale to thousands of machines with tens of thousands of hard drives. Swift is designed to be horizontally scalable–there is no single point of failure. In most large-scale deployments, swift should become more performant as the cluster grows larger. In the <a href="http://en.wikipedia.org/wiki/CAP_theorem">CAP theorem</a>, swift sacrifices C for A and P. Most operations happen synchronously, but consistency is sacrificed in failure scenarios.</p>
<p>“Redundant” means that swift stores multiple copies of each entity in the system. Each copy is stored in physically distinct availability zones, so common failures like hard drive failures or network issues are highly unlikely to cause data loss or downtime.</p>
<p>“Unstructured data store” means that swift simply stores bits. Swift is not a database. Swift is not a block-level storage system. Swift stores blobs of data. Swift offers namespace groupings within accounts as containers, but no other relation between objects is stored.</p>
<p>For more information on the internal workings of swift, see <a href="http://swift.openstack.org/overview_architecture.html">http://swift.openstack.org/overview_architecture.html</a>.</p>
<p><strong>What can it do?</strong></p>
<p>Although swift is a key-value store, it is optimized for highly available reads and writes. This makes it ideal for storing backups and static web content. Swift is well-suited to storing and serving server backups, VM snapshots, database backups, image libraries, scripts and stylesheets, or any other static content that needs to be accessed frequently.</p>
<p>Also, because swift guarantees that objects will be available for reading as soon as they are successfully written, swift can be used to store content that changes frequently.</p>
<p><strong>How does one use swift?</strong></p>
<p>Swift has a ReST-ful API. All communication with swift is done over HTTP, using the HTTP verbs to signal the requested action. A swift storage URL looks like</p>
<p><code>swift.example.com/v1/account/container/object</code></p>
<p>Swift’s URLs have four basic parts. Using the example above, these parts are:</p>
<ul>
<li>Base: <code>swift.example.com/v1/</code></li>
<li>Account: <code>account</code>. An account is determined by the auth server when the account is created. The devauth server that ships with swift creates URLs of the form AUTH_uuid.</li>
<li>Container: <code>container</code>. Containers are namespaces used to group objects within an account.</li>
<li>Object: <code>object</code>. Objects are where the actual data is stored in swift. Object names may contain /, so pseudo-nested directories are possible.</li>
</ul>
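<p>As a hypothetical illustration (the URL and account name are the placeholders from the example above), the four parts can be split out of a storage URL like so; note that because object names may contain <code>/</code>, everything after the container is kept as the object name:</p>

```python
def parse_storage_url(url):
    """Return (base, account, container, object) for a swift storage URL."""
    # Strip any scheme prefix so the split is uniform.
    if "://" in url:
        url = url.split("://", 1)[1]
    # maxsplit=4 keeps any "/" inside the object name intact.
    host, version, account, container, obj = url.split("/", 4)
    base = "%s/%s/" % (host, version)
    return base, account, container, obj

parts = parse_storage_url("swift.example.com/v1/account/container/object")
# parts == ("swift.example.com/v1/", "account", "container", "object")
```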
<p>One may get a list of all containers in an account with a <code>GET</code> on the account:
<code>GET http://swift.example.com/v1/account/</code></p>
<p>One may create new containers with a <code>PUT</code> to the container:
<code>PUT http://swift.example.com/v1/account/new_container</code></p>
<p>One may list all objects in a container with a <code>GET</code> on the container:
<code>GET http://swift.example.com/v1/account/container/</code></p>
<p>One may create new objects with a <code>PUT</code> on the object:
<code>PUT http://swift.example.com/v1/account/container/new_object</code></p>
<p>Additionally, one may use <code>POST</code> to change metadata on containers and objects.</p>
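<p>A minimal sketch of these request shapes, using Python’s <code>urllib.request.Request</code> for illustration; the host and account are placeholders, and the requests are only constructed here, not sent:</p>

```python
from urllib.request import Request

BASE = "http://swift.example.com/v1/account"  # placeholder storage URL

# GET on the account lists its containers.
list_containers = Request(BASE + "/", method="GET")

# PUT on a container path creates the container.
create_container = Request(BASE + "/new_container", method="PUT")

# GET on a container lists its objects.
list_objects = Request(BASE + "/container/", method="GET")

# PUT on an object path stores the request body as the object's data.
upload_object = Request(BASE + "/container/new_object",
                        data=b"some bytes", method="PUT")

# POST changes metadata on an existing container or object.
set_metadata = Request(BASE + "/container/new_object",
                       headers={"X-Object-Meta-Color": "blue"},
                       method="POST")
```

Sending any of these would be a matter of passing the request to <code>urllib.request.urlopen</code> (along with the auth token your auth server issued).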
<p><strong>Get it</strong></p>
<p>Swift is completely open-source, released under the Apache 2.0 license. Find it at <a href="https://launchpad.net/swift">https://launchpad.net/swift</a>. Current documentation is found at <a href="http://swift.openstack.org">http://swift.openstack.org</a>. Patches are welcome.</p>
Server-side Object Copy in OpenStack storage2010-07-24T00:00:00+00:00/openstack/server-side-object-copy-in-openstack-storage<p><a href="http://openstack.org">OpenStack storage</a> (codenamed <a href="http://launchpad.net/swift">swift</a>) supports server-side object copy.</p>
<p>Suppose you upload a file with the wrong object name or you needed to move some objects to another container. Without a server-side copy feature, you would need to reupload the same content and delete the existing object. With server-side object copy, you can save the step of re-uploading the content and thus also save the associated bandwidth charges, if any were to apply.</p>
<p>There are two ways to copy an existing object to another object in swift. One, do a PUT to the new object (the target) location, but add the “X-Copy-From” header to designate the source of the data. The header value should be the container and object name of the source object in the form of “/container/object”.</p>
<p>The second way to do an object copy is similar. This time, do a COPY to the existing object, and include the “Destination” header to specify the target of the copy. The header value is the container and new object name in the form of “/container/object”.</p>
<p>With both of these methods, the destination container must exist before attempting the copy.</p>
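<p>The two equivalent copy requests can be sketched like this (placeholder host, account, and object names; the requests are built but not sent):</p>

```python
from urllib.request import Request

# Method one: PUT to the target location, with X-Copy-From
# naming the source as "/container/object".
put_copy = Request("http://swift.example.com/v1/account/new_container/obj",
                   headers={"X-Copy-From": "/container/obj"},
                   method="PUT")

# Method two: COPY on the existing (source) object, with Destination
# naming the target as "/container/object".
copy_req = Request("http://swift.example.com/v1/account/container/obj",
                   headers={"Destination": "/new_container/obj"},
                   method="COPY")
```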
<p>If you want to move objects rather than copy them, send a DELETE request to the source object after the copy. A move simply becomes a COPY + DELETE.</p>
<p>All metadata is preserved during the object copy. Note that you can set metadata on the request to copy the object (either the PUT or the COPY) and the metadata will overwrite any conflicting keys on the target (new) object. One interesting use case is to copy an object to itself and set the content type to a new value. This is the only way to change the content type of an existing object.</p>
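<p>The copy-to-itself trick for changing a content type might look like the following sketch (object name is a placeholder; the request is constructed, not sent). The Content-Type set on the copy request overrides the source object’s value, while all other metadata is carried over:</p>

```python
from urllib.request import Request

url = "http://swift.example.com/v1/account/container/report.html"

# Copy the object onto itself, supplying a new Content-Type.
change_type = Request(url,
                      headers={"X-Copy-From": "/container/report.html",
                               "Content-Type": "text/html"},
                      method="PUT")
```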