Programmer Thoughts

By John Dickinson

Nested folders in Cloud Files

January 22, 2010

Cloud storage systems like Rackspace’s Cloud Files and Amazon’s S3 are great for storing large amounts of information. A common misconception is that these storage systems behave like traditional file systems, complete with byte-level manipulation and nested folders. It is the second of these that I want to talk about: how to simulate a nested directory (or folder) structure in Rackspace’s Cloud Files.

Cloud Files and S3 are better understood as storage systems, not file systems. Each has three basic parts: accounts, containers (buckets in S3), and objects. In Cloud Files, these three parts can be easily seen in the URL referencing an object. The URL one uses for the ReST API is of the form […]. Containers are large-scale groupings of objects, operating at a higher level, conceptually, than folders. If objects were books, containers might be genres. Containers cannot be nested; that is, one cannot put a container inside of another container.

However, it is fairly easy to simulate a directory structure with objects. These “virtual directories” are not directories, per se, but object name prefixes over which one can iterate. An example should make this concept easy to understand. Suppose I wanted to store books in Cloud Files. From my analogy above, I can use the genre of the book as my container name. The object name will be of the form “author/title”. This way, I can list all books by a particular author (within a genre).
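As a rough illustration of treating name prefixes as directories, here is a small sketch in plain Python. It does not talk to Cloud Files at all: `object_name` and `titles_by_author` are hypothetical helpers for illustration, and the sample names come from the books used later in this post. It only shows that once names follow the “author/title” pattern, grouping by author is a matter of splitting on the first slash.

```python
from itertools import groupby


def object_name(author, title):
    """Build an object name of the form "author/title" (hypothetical helper)."""
    return f"{author}/{title}"


def titles_by_author(names):
    """Group "author/title" object names by their author prefix.

    Names without a slash (such as directory markers) are skipped.
    """
    book_names = sorted(name for name in names if "/" in name)
    return {
        author: [name.partition("/")[2] for name in group]
        for author, group in groupby(book_names, key=lambda name: name.partition("/")[0])
    }


names = [
    object_name("poe", "the_pit_and_the_pendulum"),
    object_name("poe", "the_masque_of_the_red_death"),
    "poe",  # a directory marker; ignored by titles_by_author
]
print(titles_by_author(names))
# {'poe': ['the_masque_of_the_red_death', 'the_pit_and_the_pendulum']}
```

Sorting first matters here: `groupby` only merges adjacent items, and all names sharing an author prefix sort next to each other.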

Let’s load the following books into Cloud Files:

First, I will create two containers, horror and comics. Next, I will name my files according to the pattern I laid out above. I will have the files “poe/the_pit_and_the_pendulum”, “poe/the_masque_of_the_red_death”, “larson/the_far_side_gallery”, etc. Then I will upload these files to their appropriate container. As a final step, I need to upload “directory marker” files, one for each virtual directory and named after it (here, “poe” and “larson”). These are empty (zero-sized) files with a content-type of “application/directory”.
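The upload plan above can be sketched as a pure function that works out, per container, which book objects and which directory markers to create. This is only a planning sketch: `plan_uploads` is a hypothetical helper that makes no network calls, and it assumes one level of virtual directories, with each marker named after its directory (e.g. “poe”) and stored as a zero-byte body with content-type application/directory.

```python
from collections import defaultdict

DIRECTORY_CONTENT_TYPE = "application/directory"


def plan_uploads(books):
    """Work out what to upload for a list of (genre, author, title) tuples.

    Returns {container: {"objects": [...], "markers": [...]}} where each
    marker is (name, content_type, body): a zero-byte object named after
    its virtual directory. Assumes only one level of virtual directories.
    """
    objects = defaultdict(list)
    directories = defaultdict(set)
    for genre, author, title in books:
        objects[genre].append(f"{author}/{title}")  # object name "author/title"
        directories[genre].add(author)  # the virtual directory it lives in
    return {
        genre: {
            "objects": sorted(objects[genre]),
            "markers": [(d, DIRECTORY_CONTENT_TYPE, b"") for d in sorted(directories[genre])],
        }
        for genre in objects
    }


books = [
    ("horror", "poe", "the_pit_and_the_pendulum"),
    ("horror", "poe", "the_masque_of_the_red_death"),
    ("comics", "larson", "the_far_side_gallery"),
]
for container, plan in sorted(plan_uploads(books).items()):
    print(container, plan["objects"], [marker[0] for marker in plan["markers"]])
```

Each planned marker would then be uploaded with whatever client one uses, setting the content type before writing the empty body. Keeping the markers in a set means an author with many books still gets exactly one marker.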

Update: Since the original writing of this post, Cloud Files has added support for a “delimiter” query parameter on directory listings that eliminates the need to maintain directory marker objects. From the Cloud Files developer guide:

You can also use a delimiter parameter to represent a nested directory hierarchy without the need for the directory marker objects. You can use any single character as a delimiter. The listings can return virtual directories - they are virtual in that they don’t actually represent real objects. Like the directory markers, though, they will have a content-type of application/directory and be in a subdir section of JSON and XML results.
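To make the delimiter behaviour concrete, here is a small in-memory model of a listing with a prefix and a single-character delimiter. It is a simplified reading of the behaviour quoted above, not the Cloud Files implementation: `list_with_delimiter` is a hypothetical function, and it returns plain names rather than the JSON or XML subdir entries the real service produces.

```python
def list_with_delimiter(names, prefix="", delimiter="/"):
    """Simulate a container listing with prefix and delimiter parameters.

    Returns (objects, subdirs): names directly under `prefix`, and the
    virtual directories one level below it (each ending in the delimiter).
    """
    if len(delimiter) != 1:
        raise ValueError("delimiter must be a single character")
    objects, subdirs = [], []
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.append(name)  # a real object at this level
        else:
            subdir = prefix + rest[: cut + 1]  # collapse deeper names into one entry
            if not subdirs or subdirs[-1] != subdir:
                subdirs.append(subdir)  # sorted input keeps duplicates adjacent
    return objects, subdirs


# Sample names only; no directory marker objects are needed.
names = [
    "poe/the_pit_and_the_pendulum",
    "poe/the_masque_of_the_red_death",
    "larson/the_far_side_gallery",
]
print(list_with_delimiter(names))
print(list_with_delimiter(names, prefix="poe/"))
```

The first call sees only the virtual directories “larson/” and “poe/”; the second descends into “poe/” and sees its two objects.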

The following gets technical. For those wishing to use this feature of Cloud Files and not wanting to program, I recommend using a third-party tool like Cyberduck. This program handles virtual nested directories completely transparently.

Now, to take advantage of these “virtual directories”, I can do container listings and pass an appropriate path value. In the Python language bindings, this would look similar to the following:

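    import cloudfiles
    cf_connection = cloudfiles.get_connection('username', 'api_key')  # placeholder credentials
    container = cf_connection.get_container('horror')
    books_by_poe = container.get_objects(path='poe')  # objects in the "poe" virtual directory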

The path parameter on the get_objects call returns all objects in the virtual directory named by the given value. In this case, it returns the two books in the virtual “poe” directory. Similarly, if I had given the value “grahame-smith”, I would have found his adaptation of the classic love story.
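The path behaviour can likewise be modelled locally. The sketch below assumes that `path` returns only objects exactly one level inside the named virtual directory, so deeper objects and the directory marker itself are left out. `list_path` is a hypothetical helper, not part of the Python bindings, and the nested “poe/collected/extra” name is made up for the demonstration.

```python
def list_path(names, path):
    """Model a `path` listing: objects directly inside the virtual directory `path`."""
    prefix = path.rstrip("/") + "/" if path else ""
    return sorted(
        name for name in names
        if name.startswith(prefix) and "/" not in name[len(prefix):]
    )


names = [
    "poe",  # directory marker
    "poe/the_pit_and_the_pendulum",
    "poe/the_masque_of_the_red_death",
    "poe/collected/extra",  # hypothetical deeper object, excluded from path='poe'
]
print(list_path(names, "poe"))
print(list_path(names, ""))  # top level: only the marker
```

Under this model, reaching “poe/collected/extra” would take a second listing with path “poe/collected”, which is why directory markers were needed at each level before the delimiter parameter existed.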

In my example, I’ve used two genre containers and virtual directories only one level deep. I could just as easily put everything into one container and nest the authors under a genre virtual directory. An object name would then look like “comics/larson/the_far_side_gallery”. The only limitation to using this feature in Cloud Files is keeping the length of the object name (including all virtual directories) within the maximum allowed (1024 characters).
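For the single-container layout, a sketch like the following builds nested object names and, if one sticks with directory markers rather than the delimiter listing, lists the marker needed at each level. `nested_name` and `marker_names` are hypothetical helpers, and the length check simply applies the 1024-character figure given above by counting Python characters.

```python
MAX_OBJECT_NAME_LENGTH = 1024  # limit quoted in the text above


def nested_name(*parts):
    """Join virtual-directory components into one object name, enforcing the length limit."""
    if any(not part or "/" in part for part in parts):
        raise ValueError("each component must be non-empty and contain no '/'")
    name = "/".join(parts)
    if len(name) > MAX_OBJECT_NAME_LENGTH:
        raise ValueError(f"object name is {len(name)} characters; the limit is {MAX_OBJECT_NAME_LENGTH}")
    return name


def marker_names(name):
    """Directory-marker names for every virtual directory above `name`."""
    parts = name.split("/")[:-1]  # drop the final component (the object itself)
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


name = nested_name("comics", "larson", "the_far_side_gallery")
print(name)
print(marker_names(name))
```

Checking the length when the name is built, rather than waiting for the upload to fail, keeps deep hierarchies from producing surprises late.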

For more detailed information on how to implement virtual directories, see the Cloud Files developer guide. The relevant information is found in the “Pseudo hierarchical folders/directories” section.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

The thoughts expressed here are my own and do not necessarily represent those of my employer.