<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

 <title>Programmer Thoughts</title>
 <link href="/openstackposts.xml" rel="self"/>
 <link href="/"/>
 <updated>2012-02-21T20:34:58+00:00</updated>
 <id>/openstack</id>
 <author>
   <name>John Dickinson</name>
   <email>john@johnandkaren.com</email>
 </author>

 
 <entry>
   <title>Democratization of Data</title>
   <link href="/openstack/democratization-of-data"/>
   <updated>2011-10-28T00:00:00+00:00</updated>
   <id>/openstack/democratization-of-data</id>
   <content type="html">&lt;p&gt;You should have control over your data. If you want to host your own data, you should be able to. If you want to pay someone else to host your data, you should be able to interact with their systems in the same way you interact with your local system. You should be able to change hosting providers without changing your interface. You should be able to pull your data from a hosting provider to host it locally. You should be able to host your data across many providers seamlessly. You should be able to move your compute needs to your data storage. You should be able to separate your concerns over distributing your data from who is distributing your data.&lt;/p&gt;

&lt;p&gt;You should be in control of where your data lives, how you compute on it, and how you distribute it. This is the promise of &lt;a href=&quot;http://openstack.org&quot;&gt;Openstack&lt;/a&gt;: a common infrastructure that puts you in control of your data. This promise is the democratization of data.&lt;/p&gt;

&lt;p&gt;Ultimately, the democratization of data comes down to storage. Openstack provides a high-quality object storage system called &lt;a href=&quot;http://swift.openstack.org&quot;&gt;swift&lt;/a&gt;. Swift is ideal for unstructured data that can grow without bound. Backups and static web content are perfect examples of good use cases for swift.&lt;/p&gt;

&lt;p&gt;Computing on your data is provided by another Openstack project: &lt;a href=&quot;http://nova.openstack.org&quot;&gt;nova&lt;/a&gt;. Nova enables the management of large numbers of dynamic virtual machines. It is directly comparable to AWS EC2 and Rackspace Cloud Servers.&lt;/p&gt;

&lt;p&gt;Openstack currently lacks integration with content delivery networks. Such integration should facilitate simple distribution and management of data with existing CDN providers (or perhaps even allow you to run your own CDN).&lt;/p&gt;

&lt;p&gt;These three pillars, along with various complementary projects like identity management, queueing, and block storage, provide a foundation upon which we can build the future.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Swift State of the Project Fall 2011</title>
   <link href="/openstack/swift-state-of-the-project"/>
   <updated>2011-10-13T00:00:00+00:00</updated>
   <id>/openstack/swift-state-of-the-project</id>
   <content type="html">&lt;p&gt;&lt;a href=&quot;http://swift.openstack.org&quot;&gt;Swift&lt;/a&gt; has been running in production at Rackspace for over a year with near 100% uptime. Rackspace&amp;rsquo;s swift clusters store billions of objects and petabytes of data. Several companies, including Internap, KT, SDSC, HP, and others, have also deployed and are running swift clusters in production. These clusters range in size from fairly small to several petabytes. Other organizations, including CERN, are evaluating swift for production use. Overall, swift is a success.&lt;/p&gt;

&lt;p&gt;Swift&amp;rsquo;s active developers are all currently Rackspace employees. Other companies have talked about features and promised to contribute code, but so far, no major patches have been forthcoming. Unfortunately this means that the day-in day-out needs for swift come from Rackspace. While many needs can be anticipated and met simply by looking at Rackspace&amp;rsquo;s needs, some key areas of development are missed. For example, Rackspace does not often deploy new swift clusters, therefore automatic deployment tools are neglected. Similarly, Rackspace&amp;rsquo;s needs focus on a general use case for a broad set of customers. Other clusters may have more specific needs based on different use cases. Until swift developers see these other use cases, swift is not likely to optimize for them.&lt;/p&gt;

&lt;p&gt;Although other companies are not currently contributing swift code, many companies are active in the community. Piston, Nebula, Voxel, HP, and others are actively engaging the developer and user communities. They sponsor biweekly meetups, engage on the mailing lists, contribute in IRC, participate in the design summits, and generally talk about what they are doing and what needs to be done. For this, I am grateful. It seems that swift currently meets the needs of these groups. I hope that as they grow and use swift more, they will see areas to improve the software and contribute those improvements back to the community.&lt;/p&gt;

&lt;p&gt;As we move forward with swift development, certain fundamental things must be preserved, protected, and encouraged. We must maintain a healthy project. We must ensure good feedback channels with users. We must encourage other companies to continue to participate and even submit patches. We must do what we can to encourage and support an active ecosystem of tools for swift. The universe of end-user tools, automation software, and monitoring systems all factor into a decision to use swift or not. If we fail in these fundamental areas, we might as well pack up and go do something else.&lt;/p&gt;

&lt;p&gt;With these concerns in mind, I see three realms of future swift development. Realm one is improving swift by fixing bugs and adding features. Realm two of swift development is data-compute locality. Most (if not all) data processing tasks can be improved by reducing the latency between where the data is stored and where it is processed. Realm three moves beyond data-compute locality by a single swift deployer and solves data federation.&lt;/p&gt;

&lt;p&gt;We are currently working on realm one: improve swift by fixing bugs and adding features. The main goals are around very large and very small clusters. This is generally an ongoing task, and even when large and small deployments are better served, there will always be bug fixes and smaller feature improvements. Some features will be large, and some will be small. The work here mostly focuses on filling out the feature gaps in swift for specific use cases.&lt;/p&gt;

&lt;p&gt;Realm two is waiting on nova stability. After large nova clusters are running in production, we can start to explore what it will take to unify the clusters. The goal is to bring compute to &amp;ldquo;near&amp;rdquo; the data in a network sense. The closest &amp;ldquo;near&amp;rdquo; can be is local to the same server, but it could perhaps be more simply solved by only being in the same cabinet or availability zone. Since data is &amp;ldquo;sticky&amp;rdquo; and hard to move, oftentimes bringing the compute to the data is more realistic. I do not foresee swift ever merging with nova; rather, I would like to see swift and nova cooperate in such a way that swift&amp;rsquo;s ring can be used as a scheduler for nova VMs. Currently nova is in a state of flux and needs to focus on maturity before large problems like swift integration are tackled. I expect nova-swift integration to be on hold for about another 12 months while nova matures.&lt;/p&gt;

&lt;p&gt;Realm three is the ultimate goal. Federating compute is a fairly simple concept to understand. &amp;ldquo;Bursting into the cloud&amp;rdquo; is common enough to have become a marketing phrase. Federating storage still needs to be defined even before it can be understood. I believe it involves datasets distributed and replicated across many storage providers and dynamically balancing access to them. This is something I&amp;rsquo;ve &lt;a href=&quot;http://programmerthoughts.com/openstack/future-vision-of-storage/&quot;&gt;talked about&lt;/a&gt; in a previous blog post.&lt;/p&gt;

&lt;p&gt;Solving these problems will take a lot of work and a lot of time. As we move from one realm to the next, we must not consider work to be &amp;ldquo;done&amp;rdquo; in the previous realms. We must always listen to feedback and continue to polish the system as a whole.&lt;/p&gt;

&lt;p&gt;Swift has been actively developed for a little over two years now. It was revealed to the world about one year ago and has made tremendous progress since. I&amp;rsquo;m quite proud to have been a part of the project. We have all learned a lot and had a lot of fun. Swift is in a great place: openstack momentum is growing, more users are deploying swift, and the vast majority of the feedback we hear is positive. Swift&amp;rsquo;s first two years have been a success. As we remember the fundamental things and work together as part of an active community, swift&amp;rsquo;s future will be even brighter than its past.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Setting Permissions (ACLs) on Openstack Swift containers</title>
   <link href="/openstack/swift-permissions"/>
   <updated>2011-08-04T00:00:00+00:00</updated>
   <id>/openstack/swift-permissions</id>
   <content type="html">&lt;p&gt;I frequently see people in the #openstack IRC channel on freenode asking about how to set up ACLs in swift. Here&amp;rsquo;s a short tutorial.&lt;/p&gt;

&lt;p&gt;First, set up two accounts. How to do this is specific to your auth system. For this example, I&amp;rsquo;ll use the default &lt;code&gt;tempauth&lt;/code&gt; that ships with swift.&lt;/p&gt;

&lt;p&gt;In your proxy server config file, under the tempauth section, add the accounts:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;user_test_tester = testing .admin
user_test_tester2 = testing2
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first user (&amp;ldquo;tester&amp;rdquo;) has admin privileges on the account (&amp;ldquo;.admin&amp;rdquo;). The second user (&amp;ldquo;tester2&amp;rdquo;) is in the test account, but will only have access to what the first user grants him. The two users don&amp;rsquo;t need to be in the same tempauth account (the &amp;ldquo;test&amp;rdquo; part).&lt;/p&gt;

&lt;p&gt;Auth the first user and create a container. Then add read permissions on that container for the second user. In the commands below, &lt;code&gt;token1&lt;/code&gt; stands for the &lt;code&gt;X-Auth-Token&lt;/code&gt; value returned by the auth request:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ curl -i -H &quot;X-Auth-User: test:tester&quot; -H &quot;X-Auth-Key: testing&quot; \
    http://swift/auth/v1.0
$ curl -i -XPUT -H &quot;X-Auth-Token: token1&quot; http://swift/v1/AUTH_test/container
$ curl -i -XPOST -H &quot;X-Auth-Token: token1&quot; -H &quot;X-Container-Read: test:tester2&quot; \
    http://swift/v1/AUTH_test/container
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that in the last curl command the proper value for the ACL is &lt;code&gt;&amp;lt;account&amp;gt;:&amp;lt;user&amp;gt;&lt;/code&gt;.&lt;/p&gt;
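
&lt;p&gt;To confirm that the ACL was stored, one option is to read the container&amp;rsquo;s headers back. This is a small sketch that reuses the placeholder host (&lt;code&gt;swift&lt;/code&gt;) and token (&lt;code&gt;token1&lt;/code&gt;) from above; the response should include an &lt;code&gt;X-Container-Read&lt;/code&gt; header carrying the value that was set:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# HEAD the container as the owner and look for X-Container-Read in the response
$ curl -I -H &quot;X-Auth-Token: token1&quot; http://swift/v1/AUTH_test/container
&lt;/code&gt;&lt;/pre&gt;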

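&lt;p&gt;Now, auth the second user (&lt;code&gt;token2&lt;/code&gt; below stands for the token returned for that user). Note that the second user cannot list the account&amp;rsquo;s containers or do anything but read what&amp;rsquo;s in the container called &amp;ldquo;container&amp;rdquo;. Of the requests below, only the container listing succeeds; the account listing and the object PUT are refused.&lt;/p&gt;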

&lt;pre&gt;&lt;code&gt;$ curl -i -H &quot;X-Auth-User: test:tester2&quot; -H &quot;X-Auth-Key: testing2&quot; \
    http://swift/auth/v1.0
$ curl -i -H &quot;X-Auth-Token: token2&quot; http://swift/v1/AUTH_test/
$ curl -i -H &quot;X-Auth-Token: token2&quot; http://swift/v1/AUTH_test/container/
$ curl -i -XPUT --data-binary 1234 -H &quot;X-Auth-Token: token2&quot; \
    http://swift/v1/AUTH_test/container/foo
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If one desires, adding the X-Container-Write header to a container will similarly grant write access.&lt;/p&gt;
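
&lt;p&gt;As a sketch of that last step, using the same placeholder host and tokens as above, the owner could grant write access like this, after which the PUT that was refused earlier should be accepted:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# As the owner, grant write access on the container to test:tester2
$ curl -i -XPOST -H &quot;X-Auth-Token: token1&quot; -H &quot;X-Container-Write: test:tester2&quot; \
    http://swift/v1/AUTH_test/container
# As the second user, retry the upload
$ curl -i -XPUT --data-binary 1234 -H &quot;X-Auth-Token: token2&quot; \
    http://swift/v1/AUTH_test/container/foo
&lt;/code&gt;&lt;/pre&gt;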
</content>
 </entry>
 
 <entry>
   <title>A Vision of the Future of Storage</title>
   <link href="/openstack/future-vision-of-storage"/>
   <updated>2011-05-22T00:00:00+00:00</updated>
   <id>/openstack/future-vision-of-storage</id>
   <content type="html">&lt;p&gt;Businesses are beginning to embrace &amp;ldquo;The Cloud&amp;rdquo; as a cost-effective way to offload internal compute and storage needs. However, many businesses still keep data in-house. Unfortunately, most of this data is useful when one can compute on it, and transfer costs to move the data to the compute centers can be cost-prohibitive. This implies that by storing data in-house, businesses also must host compute resources in-house.&lt;/p&gt;

&lt;p&gt;Therefore, in an effort to save money, businesses have searched for ways to &amp;ldquo;burst into the cloud&amp;rdquo;&amp;ndash;a marketing phrase which simply means temporarily using someone else&amp;rsquo;s public compute infrastructure to do some work and only paying for what is used. Unfortunately, this presents difficulty when it comes to storage. The data lives in-house. &amp;ldquo;Bursting&amp;rdquo; doesn&amp;rsquo;t make sense for storage.&lt;/p&gt;

&lt;p&gt;There is an answer: federated storage.&lt;/p&gt;

&lt;p&gt;Federated storage is where the storage cluster becomes a network of storage clusters. Upload data to a storage endpoint. Metadata on your data provides rules as to where the data should be stored. For example, rules can be set to require that the data is stored in geographically distinct regions. Other rules can be set to require that data is stored with at least two companies. This gives you both geographic and business redundancy. If an earthquake knocks out a data center or a company goes out of business, the data is protected.&lt;/p&gt;

&lt;p&gt;Storage providers can enter into peering relationships that allow them to offer a storage network to their customers. For example, imagine that Dell, Rackspace, Internap, and ATT all have storage clusters, each with their own customers. Each company has different capacity and locations available. If these companies entered into a federated storage agreement, they could share customers, capacity, and geographic redundancy.&lt;/p&gt;

&lt;p&gt;This federated storage network is similar to how airlines organize today. American Airlines and British Airways service different parts of the world, but they are both part of the Oneworld alliance that allows passengers to collect rewards with one company even when using the partner airlines.&lt;/p&gt;

&lt;p&gt;In the same way, smaller companies like Internap and Rackspace can partner with larger companies like ATT and Dell. A Rackspace customer can choose to store data both in one of Rackspace&amp;rsquo;s data centers and also in geographic regions that Rackspace does not serve. Perhaps Dell builds a data center in South Asia. Rackspace customers would then be able to store data redundantly in both Rackspace&amp;rsquo;s data center in Chicago and also in Dell&amp;rsquo;s data center on the other side of the world.&lt;/p&gt;

&lt;p&gt;Other storage federations could also arise. Users then are able to choose where to store their data based on the entire offering of the partnership of providers rather than on one company&amp;rsquo;s merits. Companies benefit by having access to infrastructure without the need for expensive cap-ex costs. Customers benefit by having more storage options.&lt;/p&gt;

&lt;p&gt;As another benefit to customers and businesses alike, a storage federation also implies that data is portable. Although data is sticky, customers would be able to move their data from one company or federation to another using the same API that the companies within the federation use to manage the federation itself. At a provider level, storage could be temporarily moved to another system for things like planned downtime or unplanned outages.&lt;/p&gt;

&lt;p&gt;However, if data is portable between companies, this implies that the companies must compete on something other than simply having the infrastructure. A smaller company could join a federation and suddenly allow their customers access to a previously unavailable network of storage. Companies would need to differentiate on service, user interface, price, or myriad other things other than the availability of the storage itself. Of course, these federations would be governed by contracts established by the cooperating companies, and, so, the specifics of any one federation would be unique.&lt;/p&gt;

&lt;p&gt;Federated storage networks become very interesting when the associated compute resources are accounted for. Rules for storage could take into account the available compute resources in particular locations or with particular providers. Using the companies from the example above, ATT may provide a large number of available compute resources, but Dell may provide particular types of compute&amp;ndash;GPU vector processing, for example. A Rackspace customer, then, can choose to store backups of their Rackspace-hosted web application in Rackspace&amp;rsquo;s storage infrastructure, while the generated data is stored with a rule requiring that a copy be made available in ATT&amp;rsquo;s storage network, near ATT&amp;rsquo;s large pool of compute resources. Or, perhaps a Dell customer could require that data be made available only within Internap&amp;rsquo;s storage infrastructure because of some particular government regulation Internap meets.&lt;/p&gt;

&lt;p&gt;Federated storage allows data to break free of a single company. Customers gain more control over their data by being able to choose where their data is stored. Businesses gain the ability to store and process their data in the most cost-effective way possible. Users are no longer limited to their own infrastructure. Federated storage allows truly unlimited data.&lt;/p&gt;

&lt;p&gt;I&amp;rsquo;m a developer on &lt;a href=&quot;http://openstack.org&quot;&gt;Openstack&amp;rsquo;s&lt;/a&gt; &lt;a href=&quot;http://swift.openstack.org&quot;&gt;object storage system&lt;/a&gt;, and I believe that Openstack is uniquely positioned to achieve this vision.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>The Story of an Openstack Feature</title>
   <link href="/openstack/the-story-of-an-openstack-feature"/>
   <updated>2010-12-10T00:00:00+00:00</updated>
   <id>/openstack/the-story-of-an-openstack-feature</id>
   <content type="html">&lt;p&gt;&lt;a href=&quot;http://openstack.org&quot;&gt;Openstack&lt;/a&gt; is a fairly large open-source project with a set of core developers. Anyone can submit patches for bugfixes or new features, but sometimes the process can be a little mysterious, especially for larger features or for developers that haven&amp;rsquo;t contributed to open-source projects before.&lt;/p&gt;

&lt;p&gt;For the &lt;a href=&quot;http://openstack.org/projects/storage/&quot; title=&quot;swift&quot;&gt;swift project (Openstack storage)&lt;/a&gt;, we have a mature codebase running in a production environment. Any patches that are accepted must not have adverse effects on the scalability or performance of the system as a whole.&lt;/p&gt;

&lt;p&gt;One of the features currently being developed for swift is large object support. The feature has gone through many iterations in both design and code, but perhaps the most important development came at the Bexar design summit. As the developers on the project, we knew that files larger than 5GB were important, but we did not have a good use case. We did not want to develop a solution for large files that did not meet the needs of the users who would actually want the feature. At the design summit, we were able to talk with Openstack users at NASA who had specific uses in mind for large objects. A &lt;a href=&quot;https://blueprints.launchpad.net/swift/+spec/bexar-client-side-chunking&quot; title=&quot;launchpad blueprint&quot;&gt;launchpad blueprint&lt;/a&gt; was developed, and the existing coding work was refocused to meet the needs of the users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; The large object feature has now been &lt;a href=&quot;https://code.launchpad.net/~gholt/swift/lobjects4/+merge/43596&quot;&gt;approved&lt;/a&gt; and lives in the swift trunk.&lt;/p&gt;

&lt;p&gt;There are several ways to implement large object support. First, and most simply, is to raise the object size limit constant. The constant determines how big a file can be, but relying on it has limits, and raising it has some nasty side effects. Since an object is stored on one physical drive (per replica), one can only raise the object limit constant to the size of the smallest drive in the cluster. If a cluster is filled with 2TB drives, this means that the largest object can only be 2TB&lt;sup id=&quot;fnref:note&quot;&gt;&lt;a href=&quot;#fn:note&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Additionally, since objects are spread evenly throughout the cluster, the balance of the fullness of each drive in the cluster is related to the ratio of the max object size to the size of the drives in the cluster. At scale, each drive in a swift cluster will differ from the average fullness of all drives in the cluster by an amount proportional to the max object size.&lt;/p&gt;

&lt;p&gt;If simply raising the max object size constant won&amp;rsquo;t work, another way to support large objects is to split the object into chunks and tell the system to treat groups of objects as one large object. The naive implementation is to split the objects as they are streamed into the system. As an object is loading into swift, swift could split the object after a certain number of bytes and then write the next bytes to a different location in the cluster. This implementation has the advantage of not requiring the user to know anything about object chunks or having a final &amp;ldquo;commit&amp;rdquo; step to finalize the large object write. This implementation was actually written, but it was rejected for its enormous complexity and various failure condition edge cases. One of its biggest disadvantages is that it does not allow for users to upload parts of the large uploads in parallel. Asking a user to upload terabytes of data as one upload simply isn&amp;rsquo;t practical.&lt;/p&gt;

&lt;p&gt;After talking to swift users at the Bexar design summit, especially those from NASA, we realized that a large object solution that is implemented with client-side chunking would be sufficient to meet the needs of our users and offer some advantages without the disadvantages of the server-side chunking implementation.&lt;/p&gt;

&lt;p&gt;A client-side chunking implementation of large objects requires users to upload the chunks of the large objects as normal objects, but with a unique prefix. For example, one could upload three 5GB files (obj/1, obj/2, and obj/3). Then the user creates a manifest object that defines the prefix of the objects (&amp;ldquo;obj/&amp;rdquo; in this example). Now, the user can upload the chunks concurrently, and when the manifest file is downloaded, the system will stream the concatenated chunks to the client. This allows for great flexibility for the user and still allows the system to support very large objects. With the &lt;a href=&quot;https://code.launchpad.net/~gholt/swift/lobjects4/+merge/41020&quot; title=&quot;launchpad merge proposal&quot;&gt;current proposed implementation&lt;/a&gt;, the only limit on large files is the size of the cluster itself. If the operators can deploy servers faster than the user can upload the data, the object size is truly unlimited. This is much better than the similar feature in S3 that was &lt;a href=&quot;http://aws.typepad.com/aws/2010/12/amazon-s3-object-size-limit.html&quot; title=&quot;S3 large objects&quot;&gt;announced today&lt;/a&gt;. This feature in swift will allow a manifest file to be created for existing content. Additionally, a manifest can be created for a large object, and content can be added to that large object at a later time without updating the manifest. Possible applications beyond simply storing large single objects include streaming all data in a single container to a client as one large object, appending to objects, maintaining symlinks to files, and an upload pseudo pause and resume.&lt;/p&gt;
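
&lt;p&gt;As a rough sketch of that workflow with curl: the host, token, container name (&lt;code&gt;bigdata&lt;/code&gt;), and local file names below are placeholders, and the &lt;code&gt;X-Object-Manifest&lt;/code&gt; header is the form this feature took once it shipped in swift, which may differ in detail from the proposal linked above.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Upload the chunks as ordinary objects that share the prefix obj/
# (these uploads are independent, so they can run in parallel)
$ curl -i -T chunk1 -H &quot;X-Auth-Token: token1&quot; http://swift/v1/AUTH_test/bigdata/obj/1
$ curl -i -T chunk2 -H &quot;X-Auth-Token: token1&quot; http://swift/v1/AUTH_test/bigdata/obj/2
$ curl -i -T chunk3 -H &quot;X-Auth-Token: token1&quot; http://swift/v1/AUTH_test/bigdata/obj/3
# Create an empty manifest object that names the container and prefix
$ curl -i -XPUT -H &quot;X-Auth-Token: token1&quot; -H &quot;Content-Length: 0&quot; \
    -H &quot;X-Object-Manifest: bigdata/obj/&quot; \
    http://swift/v1/AUTH_test/bigdata/large_object
# Download the manifest to receive the concatenated chunks
$ curl -H &quot;X-Auth-Token: token1&quot; -o large_object.out \
    http://swift/v1/AUTH_test/bigdata/large_object
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the manifest only names a prefix, uploading another chunk under &lt;code&gt;obj/&lt;/code&gt; later extends the large object without touching the manifest.&lt;/p&gt;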

&lt;p&gt;This is a feature that will be included in the swift codebase very soon, and is something we are very excited about. We think it balances the needs of the system (scalability and performance) with the requirements of the users. This feature would not have been implemented nearly as well without input from the community. The conversations we had with people at the design summit were invaluable to the design of this feature.&lt;/p&gt;

&lt;p&gt;As always, patches are welcome in swift. If you have bug fixes or an idea for new features, we welcome contributions. Talk to us; submit your code; give us your use cases&amp;ndash;we want swift to be the best it can be. The swift code is hosted on &lt;a href=&quot;http://launchpad.net/swift&quot;&gt;Launchpad&lt;/a&gt;, and the developers can be found on IRC in #openstack on irc.freenode.net.&lt;/p&gt;

&lt;p&gt;Community input in the Openstack project is vital. I&amp;rsquo;m excited about where the project has been, but even more excited to see where the community takes it in the future.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:note&quot;&gt;
      &lt;p&gt;Actually, it would be less than 2TB. A swift cluster is full when the first hard drive in the cluster is full. Therefore, it is wise to limit the fullness of the drives to about 80% of their capacity.&lt;a href=&quot;#fnref:note&quot; rel=&quot;reference&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</content>
 </entry>
 
 <entry>
   <title>Features I Would Like to See in Swift</title>
   <link href="/openstack/features-i-would-like-to-see-in-swift"/>
   <updated>2010-11-09T00:00:00+00:00</updated>
   <id>/openstack/features-i-would-like-to-see-in-swift</id>
   <content type="html">&lt;p&gt;&lt;a href=&quot;http://swift.openstack.org&quot;&gt;Swift&lt;/a&gt; is a great way to store large amounts of data cheaply. This week I&amp;rsquo;m at the &lt;a href=&quot;http://summit.openstack.org&quot;&gt;OpenStack design summit&lt;/a&gt;, and I&amp;rsquo;ve been thinking of features I would like to see added to swift.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;WebDAV support:&lt;/strong&gt; WebDAV support would allow swift users to mount public containers as network drives in any modern operating system. It could probably be implemented as WSGI middleware, and would therefore be an optional feature for any swift deployment.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Auto-compression:&lt;/strong&gt; I would like to see swift support dynamic compression and decompression of responses based on the Accept-Encoding header in the request. This feature too could be implemented as WSGI middleware and be optional for any swift deployment. One concern, however, is the extra CPU cycles required for the compression and decompression. The swift proxy servers can see high CPU load under heavy traffic conditions.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Object versioning:&lt;/strong&gt; A more complicated feature, object versioning would allow old objects to be accessed after newer data has overwritten them. New semantics for accessing old versions (or even enabling/disabling this feature) would have to be created, and many questions relating to failure scenarios would need to be answered.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;stop_marker query parameter in listings:&lt;/strong&gt; Currently, account and container GETs can be filtered with a marker query parameter. The marker parameter will cause the listing to return values that are greater than the marker. In a container of 100 items a marker equal to item 50 will return items 51 through 100. However, if one wants to only fetch items less than item 60, the listing has to be filtered by the client. A stop_marker query parameter would return anything less than or equal to the parameter value and would be able to be used in conjunction with the other query parameters. This would be a relatively simple feature to add without any obvious (to me) risks to the system as a whole. &lt;strong&gt;Update&lt;/strong&gt;: This feature is now supported in swift: &lt;a href=&quot;https://code.launchpad.net/~notmyname/swift/end_marker&quot;&gt;https://code.launchpad.net/~notmyname/swift/end_marker&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
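
&lt;p&gt;A sketch of a bounded listing with curl, assuming a placeholder host and token and hypothetical object names like &lt;code&gt;item050&lt;/code&gt;. The parameter is spelled &lt;code&gt;end_marker&lt;/code&gt; here, following the branch name linked above; whether the upper bound itself is included in the results is worth checking against your version of swift.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# -G sends the -d values as query parameters on a GET request
$ curl -G -H &quot;X-Auth-Token: token1&quot; -d marker=item050 -d end_marker=item060 \
    http://swift/v1/AUTH_test/container
&lt;/code&gt;&lt;/pre&gt;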

&lt;p&gt;If anyone would like to add any of these features to swift, please &lt;a href=&quot;http://launchpad.net/swift&quot;&gt;grab the code&lt;/a&gt; and submit your patch.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Swift (OpenStack Object Storage) Overview</title>
   <link href="/openstack/swift-openstack-object-storage-overview"/>
   <updated>2010-11-06T00:00:00+00:00</updated>
   <id>/openstack/swift-openstack-object-storage-overview</id>
   <content type="html">&lt;p&gt;&lt;strong&gt;What is it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Swift is a highly scalable redundant unstructured data store designed to store large amounts of data cheaply. &amp;ldquo;Highly scalable&amp;rdquo; means that it can scale to thousands of machines with tens of thousands of hard drives. Swift is designed to be horizontally scalable&amp;ndash;there is no single point of failure. In most large-scale deployments, swift should become more performant as the cluster grows larger. In the &lt;a href=&quot;http://en.wikipedia.org/wiki/CAP_theorem&quot;&gt;CAP theorem&lt;/a&gt;, swift sacrifices C for A and P. Most operations happen synchronously, but consistency is sacrificed in failure scenarios.&lt;/p&gt;

&lt;p&gt;&amp;ldquo;Redundant&amp;rdquo; means that swift stores multiple copies of each entity in the system. Each copy is stored in a physically distinct availability zone, so common failures like hard drive failures or network issues are highly unlikely to cause data loss or downtime.&lt;/p&gt;

&lt;p&gt;&amp;ldquo;Unstructured data store&amp;rdquo; means that swift simply stores bits. Swift is not a database. Swift is not a block-level storage system. Swift stores blobs of data. Swift offers namespace groupings within accounts as containers, but no other relation between objects is stored.&lt;/p&gt;

&lt;p&gt;For more information on the internal workings of swift, see &lt;a href=&quot;http://swift.openstack.org/overview_architecture.html&quot;&gt;http://swift.openstack.org/overview_architecture.html&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What can it do?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Although swift is a key-value store, it is optimized for highly available reads and writes. This makes it ideal for storing backups and static web content. Swift is well-suited to storing and serving server backups, VM snapshots, database backups, image libraries, scripts and stylesheets, or any other static content that needs to be accessed frequently.&lt;/p&gt;

&lt;p&gt;Also, because swift guarantees that objects will be available for reading as soon as they are successfully written, swift can be used to store content that changes frequently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does one use swift?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Swift has a ReST-ful API. All communication with swift is done over HTTP, using the HTTP verbs to signal the requested action. A swift storage URL looks like&lt;/p&gt;

&lt;p&gt;&lt;code&gt;swift.example.com/v1/account/container/object&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Swift&amp;rsquo;s URLs have four basic parts. Using the example above, these parts are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Base: &lt;code&gt;swift.example.com/v1/&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Account: &lt;code&gt;account&lt;/code&gt;. An account is determined by the auth server when the account is created. The devauth server that ships with swift creates account names of the form AUTH_uuid.&lt;/li&gt;
  &lt;li&gt;Container: &lt;code&gt;container&lt;/code&gt;. Containers are namespaces used to group objects within an account.&lt;/li&gt;
  &lt;li&gt;Object: &lt;code&gt;object&lt;/code&gt;. Objects are where the actual data is stored in swift. Object names may contain /, so pseudo-nested directories are possible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One may get a list of all containers in an account with a &lt;code&gt;GET&lt;/code&gt; on the account:
&lt;code&gt;GET http://swift.example.com/v1/account/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;One may create new containers with a &lt;code&gt;PUT&lt;/code&gt; to the container:
&lt;code&gt;PUT http://swift.example.com/v1/account/new_container&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;One may list all objects in a container with a &lt;code&gt;GET&lt;/code&gt; on the container:
&lt;code&gt;GET http://swift.example.com/v1/account/container/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;One may create new objects with a &lt;code&gt;PUT&lt;/code&gt; on the object:
&lt;code&gt;PUT http://swift.example.com/v1/account/container/new_object&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Additionally, one may use &lt;code&gt;POST&lt;/code&gt; to change metadata on containers and objects.&lt;/p&gt;
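
&lt;p&gt;The requests above can be sketched with curl. The host and account come from the example URL; the &lt;code&gt;X-Auth-Token&lt;/code&gt; value, the local file name, and the metadata key are placeholders chosen for illustration, and the token is assumed to come from whatever auth system the cluster uses.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# List the containers in the account
$ curl -i -H &quot;X-Auth-Token: token&quot; http://swift.example.com/v1/account/
# Create a container
$ curl -i -XPUT -H &quot;X-Auth-Token: token&quot; http://swift.example.com/v1/account/new_container
# Upload a local file as a new object (-T sends it with PUT)
$ curl -i -T local_file -H &quot;X-Auth-Token: token&quot; \
    http://swift.example.com/v1/account/new_container/new_object
# List the objects in the container
$ curl -i -H &quot;X-Auth-Token: token&quot; http://swift.example.com/v1/account/new_container/
# Set metadata on the object
$ curl -i -XPOST -H &quot;X-Auth-Token: token&quot; -H &quot;X-Object-Meta-Color: blue&quot; \
    http://swift.example.com/v1/account/new_container/new_object
&lt;/code&gt;&lt;/pre&gt;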

&lt;p&gt;&lt;strong&gt;Get it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Swift is completely open source, released under the Apache 2.0 license. Find it at &lt;a href=&quot;https://launchpad.net/swift&quot;&gt;https://launchpad.net/swift&lt;/a&gt;. Current documentation is found at &lt;a href=&quot;http://swift.openstack.org&quot;&gt;http://swift.openstack.org&lt;/a&gt;. Patches are welcome.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Server-side Object Copy in OpenStack storage</title>
   <link href="/openstack/server-side-object-copy-in-openstack-storage"/>
   <updated>2010-07-24T00:00:00+00:00</updated>
   <id>/openstack/server-side-object-copy-in-openstack-storage</id>
   <content type="html">&lt;p&gt;&lt;a href=&quot;http://openstack.org&quot;&gt;OpenStack storage&lt;/a&gt; (codenamed &lt;a href=&quot;http://launchpad.net/swift&quot;&gt;swift&lt;/a&gt;) supports server-side object copy.&lt;/p&gt;

&lt;p&gt;Suppose you upload a file with the wrong object name or you need to move some objects to another container. Without a server-side copy feature, you would need to reupload the same content and delete the existing object. With server-side object copy, you can save the step of re-uploading the content and thus also save the associated bandwidth charges, if any were to apply.&lt;/p&gt;

&lt;p&gt;There are two ways to copy an existing object to another object in swift. One, do a PUT to the new object (the target) location, but add the &amp;ldquo;X-Copy-From&amp;rdquo; header to designate the source of the data. The header value should be the container and object name of the source object in the form of &amp;ldquo;/container/object&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;The second way to do an object copy is similar. This time, do a COPY to the existing object, and include the &amp;ldquo;Destination&amp;rdquo; header to specify the target of the copy. The header value is the container and new object name in the form of &amp;ldquo;/container/object&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;With both of these methods, the destination container must exist before attempting the copy.&lt;/p&gt;
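
&lt;p&gt;Both methods can be sketched with curl. The host (&lt;code&gt;swift.example.com&lt;/code&gt;), account, token, and object names below are placeholders, not values from a real cluster.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Method 1: PUT to the new location and name the source with X-Copy-From
$ curl -i -XPUT -H &quot;X-Auth-Token: token&quot; -H &quot;Content-Length: 0&quot; \
    -H &quot;X-Copy-From: /container/old_object&quot; \
    http://swift.example.com/v1/account/container/new_object
# Method 2: COPY the existing object and name the target with Destination
$ curl -i -XCOPY -H &quot;X-Auth-Token: token&quot; \
    -H &quot;Destination: /container/new_object&quot; \
    http://swift.example.com/v1/account/container/old_object
&lt;/code&gt;&lt;/pre&gt;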

&lt;p&gt;If you want to move an object rather than copy it, you also need to send a DELETE request to the old object once the copy has succeeded. A move simply becomes a COPY + DELETE.&lt;/p&gt;

&lt;p&gt;All metadata is preserved during the object copy. Note that you can set metadata on the request to copy the object (either the PUT or the COPY) and the metadata will overwrite any conflicting keys on the target (new) object. One interesting use case is to copy an object to itself and set the content type to a new value. This is the only way to change the content type of an existing object.&lt;/p&gt;
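
&lt;p&gt;As a sketch of that last use case, again with a placeholder host, account, and token, copying an object onto itself while sending a new &lt;code&gt;Content-Type&lt;/code&gt; header might look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Source and target are the same object; only the content type changes
$ curl -i -XPUT -H &quot;X-Auth-Token: token&quot; -H &quot;Content-Length: 0&quot; \
    -H &quot;X-Copy-From: /container/object&quot; -H &quot;Content-Type: text/plain&quot; \
    http://swift.example.com/v1/account/container/object
&lt;/code&gt;&lt;/pre&gt;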
</content>
 </entry>
 

</feed>
