Revisiting an old blog post on how to quickly upload data to Cloud Files, I figured an improved example would be useful.
Below is an updated script that calls the API directly and uses eventlet for concurrency. The code is available on GitHub.
#!/usr/bin/env python

import eventlet
# patch the standard library so blocking socket calls yield to other green threads
eventlet.monkey_patch()

import sys
import httplib
import os

from cf_auth import username, apikey

container_name = sys.argv[1]

# auth: exchange the username and API key for a token and a storage URL
conn = httplib.HTTPSConnection('auth.api.rackspacecloud.com')
conn.request('GET', '/auth',
             headers={'x-auth-user': username, 'x-auth-key': apikey})
resp = conn.getresponse()
AUTH_TOKEN = resp.getheader('x-auth-token')
URL = resp.getheader('x-storage-url')
CONNECTION_ENDPOINT = URL.split('/')[2]
conn.close()

SEND_HEADERS = {'X-Auth-Token': AUTH_TOKEN, 'Content-Type': 'text/plain'}
CONTAINER_PATH = '/' + '/'.join(URL.split('/')[3:]) + '/' + container_name

# create the container
conn = httplib.HTTPSConnection(CONNECTION_ENDPOINT)
conn.request('PUT', CONTAINER_PATH, headers=SEND_HEADERS)
conn.getresponse().read()
conn.close()


def run(filename_list):
    # each green thread uploads its batch over one reused connection
    conn = httplib.HTTPSConnection(CONNECTION_ENDPOINT)
    for filename in filename_list:
        with open(filename, 'rb') as f:
            conn.request('PUT', CONTAINER_PATH + '/' + filename, body=f,
                         headers=SEND_HEADERS)
            resp = conn.getresponse()
            # drain the response so the connection can be reused
            resp.read()
    conn.close()

data_list = ['test_data/%s' % x for x in os.listdir('test_data')
             if x.endswith('.dat')]
concurrency = 20
pool = eventlet.GreenPool(size=concurrency)
# hand out the files in batches; stepping by the batch size keeps the
# final partial batch, which dividing len(data_list) by it would skip
for start in xrange(0, len(data_list), concurrency):
    pool.spawn(run, data_list[start:start + concurrency])
pool.waitall()
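The batching step is easy to get subtly wrong: dividing the list length by the batch size and looping over that count skips any files left over when the total is not an exact multiple. Here is a minimal sketch of a batching helper that keeps the remainder. The name `chunked` and the file names are hypothetical, made up for illustration, and the snippet uses only the standard library so it runs without eventlet or a Cloud Files account.

```python
def chunked(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError('size must be at least 1')
    # Stepping by `size` includes the final, possibly shorter, batch.
    return [items[start:start + size] for start in range(0, len(items), size)]


# Hypothetical file names, for illustration only.
files = ['test_data/%d.dat' % n for n in range(45)]
batches = chunked(files, 20)

# Files that a loop over len(files) // 20 full batches would never reach.
dropped = len(files) - 20 * (len(files) // 20)

print([len(b) for b in batches])  # [20, 20, 5]
print(dropped)                    # 5
```

Each batch could then be passed to a worker such as `run`, one spawn per batch.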
This code is dramatically faster than either of my two previous examples. While writing this post, the python-cloudfiles example completed in 9 minutes 11 seconds. The direct API example completed in 7 minutes 51 seconds. The code above using eventlet completed in only 1 minute 13 seconds. That is about seven and a half times faster than using the language bindings, without much tweaking at all.
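As a quick check on those timings, the sketch below converts each run to seconds and divides by the eventlet run. The helper name `to_seconds` and the dictionary labels are my own; the times are the three quoted above.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


# Wall-clock times quoted in this post.
timings = {
    'python-cloudfiles': to_seconds(9, 11),
    'direct API': to_seconds(7, 51),
    'eventlet': to_seconds(1, 13),
}

baseline = float(timings['eventlet'])
# Ratio of each run to the eventlet run, rounded to one decimal place.
speedups = dict((name, round(t / baseline, 1)) for name, t in timings.items())

for name in sorted(speedups):
    print('%s: %.1fx the eventlet time' % (name, speedups[name]))
```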
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
The thoughts expressed here are my own and do not necessarily represent those of my employer.