Programmer Thoughts

By John Dickinson

Quickly Uploading Data To Cloud Files, Part 2

June 27, 2011

While revisiting an old blog post on how to quickly upload data to Cloud Files, I figured an improved example would be useful.

Below is an updated script that calls the Cloud Files API directly and uses eventlet for concurrency. The code is available on GitHub.

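```python
#!/usr/bin/env python3
# Upload every .dat file in test_data/ to the Cloud Files container
# named on the command line, using eventlet for concurrency.

import eventlet
eventlet.monkey_patch()  # make socket/ssl cooperative before other imports

import http.client
import os
import sys
from urllib.parse import quote, urlparse

from cf_auth import username, apikey  # local module holding credentials

container_name = sys.argv[1]

# auth: trade the username and API key for a token and storage URL
conn = http.client.HTTPSConnection('auth.api.rackspacecloud.com')
conn.request('GET', '/auth',
             headers={'x-auth-user': username, 'x-auth-key': apikey})
resp = conn.getresponse()
resp.read()
conn.close()
if resp.status // 100 != 2:
    sys.exit('auth failed: %d %s' % (resp.status, resp.reason))
AUTH_TOKEN = resp.getheader('x-auth-token')
storage_url = urlparse(resp.getheader('x-storage-url'))
CONNECTION_ENDPOINT = storage_url.netloc  # host[:port]

SEND_HEADERS = {'X-Auth-Token': AUTH_TOKEN, 'Content-Type': 'text/plain'}
CONTAINER_PATH = storage_url.path + '/' + quote(container_name)

# create the container (a PUT on an existing container is harmless)
conn = http.client.HTTPSConnection(CONNECTION_ENDPOINT)
conn.request('PUT', CONTAINER_PATH, headers=SEND_HEADERS)
conn.getresponse().read()
conn.close()


def run(filename_list):
    """Upload each file in filename_list over a single reused connection."""
    conn = http.client.HTTPSConnection(CONNECTION_ENDPOINT)
    for filename in filename_list:
        headers = dict(SEND_HEADERS)
        # an explicit length avoids chunked transfer encoding
        headers['Content-Length'] = str(os.path.getsize(filename))
        with open(filename, 'rb') as f:
            conn.request('PUT', CONTAINER_PATH + '/' + quote(filename),
                         body=f, headers=headers)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection can be reused
        if resp.status // 100 != 2:
            print('%s failed: %d %s' % (filename, resp.status, resp.reason),
                  file=sys.stderr)
    conn.close()


data_list = ['test_data/%s' % x for x in os.listdir('test_data')
             if x.endswith('.dat')]
concurrency = 20
pool = eventlet.GreenPool(size=concurrency)
# hand each green thread a chunk of up to `concurrency` files; stepping by
# the chunk size keeps the final partial chunk instead of dropping it
for start in range(0, len(data_list), concurrency):
    pool.spawn(run, data_list[start:start + concurrency])
pool.waitall()
```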
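The last few lines of the script split the file list into chunks of `concurrency` files and give each chunk to its own green thread. That splitting step is easy to get subtly wrong: computing the number of chunks with integer division (`len(data_list) / concurrency` under Python 2's division rules) silently drops any leftover files. The network-free sketch below compares the two approaches; `chunked()`, `chunked_floor()`, and the sample file list are hypothetical names used only for illustration.

```python
# A minimal, network-free sketch of the chunking step. chunked(),
# chunked_floor() and the sample file names are hypothetical.


def chunked(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    # Stepping by `size` keeps the final, shorter chunk.
    return [items[start:start + size] for start in range(0, len(items), size)]


def chunked_floor(items, size):
    """The floor-division approach: any remainder is silently lost."""
    return [items[size * i:size * (i + 1)] for i in range(len(items) // size)]


files = ['test_data/%d.dat' % n for n in range(45)]

print([len(c) for c in chunked(files, 20)])        # [20, 20, 5]
print([len(c) for c in chunked_floor(files, 20)])  # [20, 20]
```

Each chunk then becomes one `pool.spawn(run, chunk)` call; the pool's `size` caps how many chunks are uploading at the same time, and `spawn` waits for a free slot when the pool is full.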

This code is dramatically faster than either of my two previous examples. In runs made while writing this post, the python-cloudfiles example completed in 9 minutes 11 seconds, the direct API example in 7 minutes 51 seconds, and the eventlet version above in only 1 minute 13 seconds. That is about 7.5 times faster than using the language bindings (551 s vs. 73 s) and about 6.5 times faster than the direct API example (471 s vs. 73 s), with very little tuning.
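Most of that speedup plausibly comes from overlapping network waits: an upload spends much of its time blocked on the socket, so a pool of workers can keep many requests in flight at once. The sketch below illustrates the effect without touching the network. It swaps in the standard library's `concurrent.futures.ThreadPoolExecutor` for eventlet's `GreenPool`, and `fake_upload()` is a hypothetical stand-in that only sleeps for an assumed per-request latency.

```python
# A sketch of why concurrency helps I/O-bound uploads. This swaps in the
# standard library's ThreadPoolExecutor for eventlet's GreenPool, and
# fake_upload() is a hypothetical stand-in that only sleeps to mimic
# network latency; nothing is actually sent.
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.05  # assumed seconds per upload (illustrative only)


def fake_upload(filename):
    time.sleep(LATENCY)  # the worker is idle here, like waiting on a socket
    return filename


def upload_all(filenames, workers):
    """Run fake_upload over filenames; return results and elapsed seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = list(pool.map(fake_upload, filenames))
    return done, time.monotonic() - start


files = ['test_data/%d.dat' % n for n in range(40)]
serial_done, serial_time = upload_all(files, workers=1)
parallel_done, parallel_time = upload_all(files, workers=20)
print('1 worker:   %.2f s' % serial_time)
print('20 workers: %.2f s' % parallel_time)
```

With 40 files and an assumed 0.05 s of latency each, the single-worker run needs at least 2 s, while 20 workers finish in roughly two rounds of sleeps plus thread overhead. Real uploads are also limited by bandwidth and server-side capacity, so the gain will likely level off as concurrency grows.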

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

The thoughts expressed here are my own and do not necessarily represent those of my employer.