<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>some thoughts &#187; python</title>
	<atom:link href="http://programmerthoughts.com/tags/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://programmerthoughts.com</link>
	<description></description>
	<lastBuildDate>Sun, 25 Jul 2010 16:58:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Compressed File Reader</title>
		<link>http://programmerthoughts.com/programming/compressed-file-reader/</link>
		<comments>http://programmerthoughts.com/programming/compressed-file-reader/#comments</comments>
		<pubDate>Fri, 16 Apr 2010 03:19:13 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[compress]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://programmerthoughts.com/?p=402</guid>
		<description><![CDATA[Recently, I needed to stream compressed data from an uncompressed file without buffering the entire file in memory. I wrote a class called CompressedFileReader that wraps a file object and provides a read method that generates gzip-compatible compressed data. I show a simple application of this class: uploading data to Cloud Files. The code is available in <a href="http://github.com/notmyname/python_scripts/blob/master/compressed_file_reader_test.py">my github account</a>.]]></description>
			<content:encoded><![CDATA[<p>Recently, I needed to stream compressed data from an uncompressed file without buffering the entire file in memory. Using Python&#8217;s gzip library would require me to create a new file on disk. The zlib module offers streaming, but its default output does not include the gzip header and trailer. I wanted something that would produce gzip-compatible output in a streaming fashion.</p>
<p>To solve this, I wrote a class that wraps a file object and provides a read method to generate the compressed data.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">struct</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">zlib</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> CompressedFileReader<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, file_obj, compresslevel=<span style="color: #ff4500;">9</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>._f = file_obj
        <span style="color: #008000;">self</span>._compressor = <span style="color: #dc143c;">zlib</span>.<span style="color: black;">compressobj</span><span style="color: black;">&#40;</span>compresslevel,
                                            <span style="color: #dc143c;">zlib</span>.<span style="color: black;">DEFLATED</span>,
                                            -<span style="color: #dc143c;">zlib</span>.<span style="color: black;">MAX_WBITS</span>,
                                            <span style="color: #dc143c;">zlib</span>.<span style="color: black;">DEF_MEM_LEVEL</span>,
                                            <span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">done</span> = <span style="color: #008000;">False</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">first</span> = <span style="color: #008000;">True</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">crc32</span> = <span style="color: #ff4500;">0</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">total_size</span> = <span style="color: #ff4500;">0</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> read<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #66cc66;">*</span>a, <span style="color: #66cc66;">**</span>kw<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">self</span>.<span style="color: black;">done</span>:
            <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">''</span>
        x = <span style="color: #008000;">self</span>._f.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>a, <span style="color: #66cc66;">**</span>kw<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> x:
            <span style="color: #008000;">self</span>.<span style="color: black;">crc32</span> = <span style="color: #dc143c;">zlib</span>.<span style="color: black;">crc32</span><span style="color: black;">&#40;</span>x, <span style="color: #008000;">self</span>.<span style="color: black;">crc32</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">&amp;</span> 0xffffffffL
            <span style="color: #008000;">self</span>.<span style="color: black;">total_size</span> += <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>x<span style="color: black;">&#41;</span>
            compressed = <span style="color: #008000;">self</span>._compressor.<span style="color: black;">compress</span><span style="color: black;">&#40;</span>x<span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> compressed:
                <span style="color: #808080; font-style: italic;"># zlib buffered the input; force some output so the caller does not see '' (EOF) early</span>
                compressed = <span style="color: #008000;">self</span>._compressor.<span style="color: black;">flush</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">zlib</span>.<span style="color: black;">Z_SYNC_FLUSH</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">else</span>:
            compressed = <span style="color: #008000;">self</span>._compressor.<span style="color: black;">flush</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">zlib</span>.<span style="color: black;">Z_FINISH</span><span style="color: black;">&#41;</span>
            crc32 = <span style="color: #dc143c;">struct</span>.<span style="color: black;">pack</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;&lt;L&quot;</span>, <span style="color: #008000;">self</span>.<span style="color: black;">crc32</span> <span style="color: #66cc66;">&amp;</span> 0xffffffffL<span style="color: black;">&#41;</span>
            size = <span style="color: #dc143c;">struct</span>.<span style="color: black;">pack</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;&lt;L&quot;</span>, <span style="color: #008000;">self</span>.<span style="color: black;">total_size</span> <span style="color: #66cc66;">&amp;</span> 0xffffffffL<span style="color: black;">&#41;</span>
            footer = crc32 + size
            compressed += footer
            <span style="color: #008000;">self</span>.<span style="color: black;">done</span> = <span style="color: #008000;">True</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">self</span>.<span style="color: black;">first</span>:
            <span style="color: #008000;">self</span>.<span style="color: black;">first</span> = <span style="color: #008000;">False</span>
            <span style="color: #808080; font-style: italic;"># 10-byte gzip header: magic bytes, deflate method, no flags, zero mtime, XFL=2, OS=unknown</span>
            header = <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\0</span>37<span style="color: #000099; font-weight: bold;">\2</span>13<span style="color: #000099; font-weight: bold;">\0</span>10<span style="color: #000099; font-weight: bold;">\0</span>00<span style="color: #000099; font-weight: bold;">\0</span>00<span style="color: #000099; font-weight: bold;">\0</span>00<span style="color: #000099; font-weight: bold;">\0</span>00<span style="color: #000099; font-weight: bold;">\0</span>00<span style="color: #000099; font-weight: bold;">\0</span>02<span style="color: #000099; font-weight: bold;">\3</span>77'</span>
            compressed = header + compressed
        <span style="color: #ff7700;font-weight:bold;">return</span> compressed</pre></td></tr></table></div>

<p>This code, with some simple tests and examples, is available on github: <a href="http://github.com/notmyname/python_scripts/blob/master/compressed_file_reader_test.py">http://github.com/notmyname/python_scripts/blob/master/compressed_file_reader_test.py</a></p>
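<p>The class above targets Python 2. As a rough sketch only, this is how the same idea might look on Python 3, followed by a round-trip check through the standard gzip module. The <code>read_all</code> helper and the sample data are illustrative assumptions and are not part of the script on github.</p>

```python
import gzip
import io
import zlib


class CompressedFileReader:
    """Python 3 adaptation of the reader above (a sketch, not the original)."""

    # Fixed 10-byte gzip header: magic, deflate, no flags, zero mtime,
    # XFL=2, unknown OS. Same bytes as the octal string in the original.
    HEADER = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\xff"

    def __init__(self, file_obj, compresslevel=9):
        self._f = file_obj
        # Negative wbits asks zlib for a raw deflate stream with no framing.
        self._compressor = zlib.compressobj(
            compresslevel, zlib.DEFLATED, -zlib.MAX_WBITS,
            zlib.DEF_MEM_LEVEL, 0)
        self.done = False
        self.first = True
        self.crc32 = 0
        self.total_size = 0

    def read(self, *a, **kw):
        if self.done:
            return b""
        x = self._f.read(*a, **kw)
        if x:
            self.crc32 = zlib.crc32(x, self.crc32)
            self.total_size += len(x)
            compressed = self._compressor.compress(x)
            if not compressed:
                # Avoid returning an empty chunk mid-stream, since callers
                # treat empty bytes as end of file.
                compressed = self._compressor.flush(zlib.Z_SYNC_FLUSH)
        else:
            compressed = self._compressor.flush(zlib.Z_FINISH)
            # gzip trailer: CRC-32, then input size modulo 2**32,
            # both little-endian.
            compressed += self.crc32.to_bytes(4, "little")
            compressed += (self.total_size % 2**32).to_bytes(4, "little")
            self.done = True
        if self.first:
            self.first = False
            compressed = self.HEADER + compressed
        return compressed


def read_all(reader, chunk_size=4096):
    """Hypothetical helper: drain a reader the way an uploader might."""
    chunks = []
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


original = b"some thoughts about python\n" * 1000
gz_bytes = read_all(CompressedFileReader(io.BytesIO(original)))
# The standard library should accept the stream as ordinary gzip data.
assert gzip.decompress(gz_bytes) == original
```

<p>The input is only ever held one chunk at a time, which is the point of the class. The joined output in <code>read_all</code> is kept in memory purely to make the check easy.</p>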
<p>One potential use case is streaming compressed data to a web service. For example, one could use this class to compress data as it is streamed to Cloud Files.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;">conn = cloudfiles.<span style="color: black;">get_connection</span><span style="color: black;">&#40;</span>username, apikey<span style="color: black;">&#41;</span>
container = conn.<span style="color: black;">create_container</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'some_container'</span><span style="color: black;">&#41;</span>
test_object = container.<span style="color: black;">create_object</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'file.gz'</span><span style="color: black;">&#41;</span>
test_object.<span style="color: black;">content_type</span> = <span style="color: #483d8b;">'application/x-gzip'</span>
<span style="color: #ff7700;font-weight:bold;">with</span> <span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'path/to/large/uncompressed/file'</span>, <span style="color: #483d8b;">'rb'</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">as</span> f:
    compressed_f = CompressedFileReader<span style="color: black;">&#40;</span>f<span style="color: black;">&#41;</span>
    test_object.<span style="color: black;">write</span><span style="color: black;">&#40;</span>compressed_f<span style="color: black;">&#41;</span></pre></td></tr></table></div>
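<p>For a consumer that accepts an iterable of chunks rather than a file-like object, a different technique may be simpler. This is not what the class above does: the sketch below lets zlib write the gzip header and trailer itself by passing <code>16 + zlib.MAX_WBITS</code> as <code>wbits</code>. It is written for Python 3, and the <code>gzip_chunks</code> name and the in-memory sink are assumptions made for illustration.</p>

```python
import gzip
import io
import zlib


def gzip_chunks(file_obj, chunk_size=65536, compresslevel=9):
    """Yield gzip-framed chunks, reading chunk_size bytes at a time."""
    # wbits = 16 + MAX_WBITS tells zlib to emit a gzip header and trailer.
    compressor = zlib.compressobj(
        compresslevel, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    while True:
        data = file_obj.read(chunk_size)
        if not data:
            break
        out = compressor.compress(data)
        if out:  # zlib may buffer input and return nothing yet
            yield out
    # flush() emits whatever is still buffered plus the gzip trailer.
    yield compressor.flush()


source = io.BytesIO(b"0123456789" * 50000)
sink = io.BytesIO()  # stand-in for an upload target
for chunk in gzip_chunks(source):
    sink.write(chunk)

print(len(source.getvalue()), "bytes in,", len(sink.getvalue()), "bytes out")
```

<p>The trade-off is that a generator cannot be handed to code that expects a <code>read</code> method, which is why the wrapper class is still the better fit for the upload shown above.</p>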

]]></content:encoded>
			<wfw:commentRss>http://programmerthoughts.com/programming/compressed-file-reader/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Cloud Files CDN Stats</title>
		<link>http://programmerthoughts.com/programming/cloud-files-cdn-stats/</link>
		<comments>http://programmerthoughts.com/programming/cloud-files-cdn-stats/#comments</comments>
		<pubDate>Tue, 09 Feb 2010 22:57:14 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Cloud Files]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[cloud files]]></category>
		<category><![CDATA[logs]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[stats]]></category>

		<guid isPermaLink="false">http://programmerthoughts.com/?p=385</guid>
		<description><![CDATA[I wrote a small Python script that loads Cloud Files CDN log files and aggregates the data. The code is available in <a href="http://github.com/notmyname/python_scripts/tree/master/cf_stats/">my github account</a>.]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.rackspacecloud.com/cloud_hosting_products/files">Cloud Files</a> offers public content through Limelight&#8217;s CDN. On public containers, one can opt in to save the logs for all content requested from the CDN. These logs record raw usage in an Apache log format and are stored compressed in a container named &#8220;.CDN_ACCESS_LOGS&#8221;. One can then parse these logs with any commercial analytics tool or use a custom solution. Being a developer, I wrote a small Python script that loads these log files and aggregates the data.</p>
<p>The code can be found in <a href="http://github.com/notmyname/python_scripts/tree/master/cf_stats/">my github repository</a>.</p>
<p>After updating the code with your own Cloud Files credentials (or using your own cf_auth module), usage is similar to the following:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="shell" style="font-family:monospace;">$ ./cf_stats.py obj_name</pre></td></tr></table></div>

<p>&#8220;obj_name&#8221; is one of the keys the stats can be grouped on. Others include &#8220;date&#8221;, &#8220;container_name&#8221;, and &#8220;user_agent&#8221;. The default is &#8220;obj_name&#8221; and any incorrect parameter will generate a usage message.</p>
<p>Sample output:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="shell" style="font-family:monospace;">Object Name: my_file.pdf
Count: 11
User Agents: &quot;Yandex/1.01.001 (compatible; Win16; I)&quot;
Response: 200 304
Referrers: -
IPs: 1.2.3.4 1.2.3.5 1.2.3.6
Dates: 24/Jan/2010 25/Jan/2010 31/Jan/2010 01/Jan/2010 30/Dec/2009
Container Name: some_container</pre></td></tr></table></div>

<p>Any of the given fields can be used as a group. Even if the script&#8217;s output is not to your liking as-is, its parsing and grouping functions may be a good starting point for writing your own log parser.</p>
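<p>To give a feel for the parse-then-group approach, here is a minimal Python 3 sketch. It is not the code from cf_stats.py. It assumes the logs follow the Apache &#8220;combined&#8221; layout, which may not match the real CDN logs exactly, and the field names, function names, and sample lines are all made up for illustration.</p>

```python
import re
from collections import defaultdict

# Assumed layout: Apache "combined" format. Field names are illustrative.
FIELDS = ("ip", "date", "method", "path", "status", "referrer", "user_agent")
LINE_RE = re.compile(
    r'(\S+) \S+ \S+ \[([^:]+):[^\]]+\] '
    r'"(\S+) (\S+)[^"]*" (\d{3}) \S+ "([^"]*)" "([^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


def group_by(lines, key):
    """Count lines per value of key and collect the distinct other fields."""
    groups = defaultdict(lambda: {"count": 0, "values": defaultdict(set)})
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip lines that do not look like log entries
        group = groups[record[key]]
        group["count"] += 1
        for field, value in record.items():
            if field != key:
                group["values"][field].add(value)
    return groups


# Made-up sample lines, for illustration only.
sample = [
    '1.2.3.4 - - [24/Jan/2010:10:00:00 +0000] "GET /my_file.pdf HTTP/1.1" '
    '200 1024 "-" "ExampleAgent/1.0"',
    '1.2.3.5 - - [25/Jan/2010:11:30:00 +0000] "GET /my_file.pdf HTTP/1.1" '
    '304 0 "-" "ExampleAgent/1.0"',
    'not a log line',
]

stats = group_by(sample, "path")
for name, group in stats.items():
    print("Object Name:", name)
    print("Count:", group["count"])
    print("Response:", " ".join(sorted(group["values"]["status"])))
    print("IPs:", " ".join(sorted(group["values"]["ip"])))
```

<p>Grouping on a different key, such as <code>"date"</code> or <code>"user_agent"</code>, only changes the second argument to <code>group_by</code>.</p>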
]]></content:encoded>
			<wfw:commentRss>http://programmerthoughts.com/programming/cloud-files-cdn-stats/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quickly uploading data to Cloud Files</title>
		<link>http://programmerthoughts.com/programming/quickly-uploading-data-to-cloud-files/</link>
		<comments>http://programmerthoughts.com/programming/quickly-uploading-data-to-cloud-files/#comments</comments>
		<pubDate>Sat, 19 Dec 2009 22:24:24 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Cloud Files]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[CDN]]></category>
		<category><![CDATA[cloud files]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[upload]]></category>

		<guid isPermaLink="false">http://programmerthoughts.com/?p=335</guid>
		<description><![CDATA[A custom file uploader can be more efficient than the generic language bindings provided by Cloud Files. I show how to efficiently upload many files to Cloud Files. The code is available in <a href="http://github.com/notmyname/python_scripts/tree/master/cf_speed/">my github account</a>.]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.rackspacecloud.com/cloud_hosting_products/files">Cloud Files</a> is a great way to store information, either to take advantage of the CDN or to offload the infrastructure requirements of storing large amounts of data. However one uses Cloud Files, though, the data must be uploaded to the service before it can be used.</p>
<p>Uploading the data is not problematic if it can be done in small chunks or spread out over time (images on a blog, for example). The <a href="http://github.com/rackspace">Cloud Files language APIs</a> offer a good way to upload data in these cases. Unfortunately, the language bindings can be terribly slow for uploading large numbers of files. While they do make some optimizations (like reusing connections when available), the code is written to be very generic. For example, the bindings make HEAD requests to ensure all the proper data is set before allowing you to upload an object. Additionally, at least in the Python language bindings, HEAD requests are issued when an instance of an object is created. While this is good in a general sense, these HEAD requests become superfluous when doing a large batch upload. One can achieve much better results by using the Cloud Files ReST API directly.</p>
<p>As an example, let&#8217;s look at the following code which uses the Python API:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>
<span style="color: #ff7700;font-weight:bold;">import</span> cloudfiles
&nbsp;
username = <span style="color: #483d8b;">'xxxx'</span>
apikey = <span style="color: #483d8b;">'xxxx'</span>
&nbsp;
conn = cloudfiles.<span style="color: black;">get_connection</span><span style="color: black;">&#40;</span>username, apikey<span style="color: black;">&#41;</span>
&nbsp;
container = conn.<span style="color: black;">create_container</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'api_speed_test3'</span><span style="color: black;">&#41;</span>
data_list = <span style="color: black;">&#40;</span><span style="color: #483d8b;">'test_data/%s'</span><span style="color: #66cc66;">%</span>x <span style="color: #ff7700;font-weight:bold;">for</span> x <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">os</span>.<span style="color: black;">listdir</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'test_data'</span><span style="color: black;">&#41;</span> \
             <span style="color: #ff7700;font-weight:bold;">if</span> x.<span style="color: black;">endswith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'.dat'</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> filename <span style="color: #ff7700;font-weight:bold;">in</span> data_list:
    <span style="color: #ff7700;font-weight:bold;">try</span>:
        obj = container.<span style="color: black;">create_object</span><span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>
        obj.<span style="color: black;">load_from_filename</span><span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">except</span> cloudfiles.<span style="color: black;">errors</span>.<span style="color: black;">ResponseError</span>, err:
        <span style="color: #ff7700;font-weight:bold;">print</span> err
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>container.<span style="color: black;">list_objects</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>In my tests, using the above code takes about 5.5 minutes to upload 1000 16KB files to Cloud Files.</p>
<p>I wrote the same functionality using the ReST API directly:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/python</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">httplib</span>
&nbsp;
username = <span style="color: #483d8b;">'xxxx'</span>
apikey = <span style="color: #483d8b;">'xxxx'</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># auth</span>
conn = <span style="color: #dc143c;">httplib</span>.<span style="color: black;">HTTPSConnection</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'api.mosso.com'</span><span style="color: black;">&#41;</span>
headers = <span style="color: black;">&#123;</span><span style="color: #483d8b;">'x-auth-user'</span>: username, <span style="color: #483d8b;">'x-auth-key'</span>: apikey<span style="color: black;">&#125;</span>
conn.<span style="color: black;">request</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'GET'</span>, <span style="color: #483d8b;">'/auth'</span>, headers=headers<span style="color: black;">&#41;</span>
resp = conn.<span style="color: black;">getresponse</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
auth_token = resp.<span style="color: black;">getheader</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'x-auth-token'</span><span style="color: black;">&#41;</span>
url = resp.<span style="color: black;">getheader</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'x-storage-url'</span><span style="color: black;">&#41;</span>
conn.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #808080; font-style: italic;"># send data</span>
send_headers = <span style="color: black;">&#123;</span><span style="color: #483d8b;">'X-Auth-Token'</span>: auth_token, <span style="color: #483d8b;">'Content-Type'</span>: <span style="color: #483d8b;">'text/plain'</span><span style="color: black;">&#125;</span>
container_path = <span style="color: #483d8b;">'/'</span>+<span style="color: #483d8b;">'/'</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span>url.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'/'</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span>:<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>+<span style="color: #483d8b;">'/api_speed_test2'</span>
conn = <span style="color: #dc143c;">httplib</span>.<span style="color: black;">HTTPSConnection</span><span style="color: black;">&#40;</span>url.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'/'</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
conn.<span style="color: black;">request</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'PUT'</span>, container_path, headers=send_headers<span style="color: black;">&#41;</span>
conn.<span style="color: black;">getresponse</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
data_list = <span style="color: black;">&#40;</span><span style="color: #483d8b;">'test_data/%s'</span><span style="color: #66cc66;">%</span>x <span style="color: #ff7700;font-weight:bold;">for</span> x <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">os</span>.<span style="color: black;">listdir</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'test_data'</span><span style="color: black;">&#41;</span> \
             <span style="color: #ff7700;font-weight:bold;">if</span> x.<span style="color: black;">endswith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'.dat'</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> filename <span style="color: #ff7700;font-weight:bold;">in</span> data_list:
    f = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>filename, <span style="color: #483d8b;">'rb'</span><span style="color: black;">&#41;</span>
    conn.<span style="color: black;">request</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'PUT'</span>, container_path+<span style="color: #483d8b;">'/'</span>+filename, body=f,
                 headers=send_headers<span style="color: black;">&#41;</span>
    f.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    resp = conn.<span style="color: black;">getresponse</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    resp.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> resp.<span style="color: black;">status</span> <span style="color: #66cc66;">&gt;</span>= <span style="color: #ff4500;">300</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> resp.<span style="color: black;">status</span>, resp.<span style="color: black;">reason</span>, container_path+<span style="color: #483d8b;">'/'</span>+filename
conn.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Although this version is slightly longer, most of the extra code handles authentication. In my tests, uploading 1000 16KB files took about 4.5 minutes, down from about 5.5 (roughly 0.27 seconds per file instead of 0.33). Saving a full minute on only 1000 objects is a very good result. I would expect the difference to be even greater as the number of files increases.</p>
<p>All of the code above (plus code to generate the test data) can be found in <a href="http://github.com/notmyname/python_scripts/tree/master/cf_speed/">my github account</a>.</p>
<p>By using the ReST API directly, I can make certain assumptions about my data that are not possible in the generic language bindings. I do not need to do the HEAD requests because I know I have just created the container and I have not uploaded the files yet. I am explicitly setting all the data for each object upload. Further improvements would be to add some error handling and parallelization.</p>
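<p>As a rough idea of what the parallelization and error handling might look like, here is a Python 3 sketch built on a thread pool. The <code>upload_one</code> function is a hypothetical stand-in that performs no network I/O. A real version would issue the PUT request shown earlier, using its own connection per worker thread, since the standard HTTP connection objects are not safe to share between threads.</p>

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_one(filename):
    """Hypothetical stand-in for one PUT request; returns an HTTP status code."""
    if filename.endswith(".bad"):
        raise IOError("simulated failure for %s" % filename)
    return 201


def upload_all(filenames, workers=8):
    """Upload files concurrently; return (succeeded, failed) lists."""
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to its filename so failures can be reported.
        futures = {pool.submit(upload_one, name): name for name in filenames}
        for future in as_completed(futures):
            name = futures[future]
            try:
                status = future.result()
            except Exception as err:
                failed.append((name, str(err)))
            else:
                if status >= 300:
                    failed.append((name, "HTTP %d" % status))
                else:
                    succeeded.append(name)
    return succeeded, failed


names = ["test_data/%d.dat" % i for i in range(10)] + ["test_data/broken.bad"]
ok, bad = upload_all(names)
print(len(ok), "uploaded,", len(bad), "failed")
```

<p>Collecting failures instead of stopping at the first one makes it possible to retry only the files that did not make it.</p>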
]]></content:encoded>
			<wfw:commentRss>http://programmerthoughts.com/programming/quickly-uploading-data-to-cloud-files/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Cloud Files Object Copy</title>
		<link>http://programmerthoughts.com/programming/cloud-files-object-copy/</link>
		<comments>http://programmerthoughts.com/programming/cloud-files-object-copy/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 15:48:13 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Cloud Files]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[cloud files]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://programmerthoughts.com/?p=302</guid>
		<description><![CDATA[Cloud Files does not currently support object copying. However, a simple workaround is to re-upload the file with the new name.  I have added a copy feature to my fork of the python-cloudfiles API that handles all the details of preserving metadata and ensuring that the entire file is not buffered in memory. The code is available in <a href="http://github.com/notmyname/python-cloudfiles/tree/object_copy">my github account</a>.]]></description>
			<content:encoded><![CDATA[<p><strong>Update</strong>: This post is outdated and the referenced github branches no longer exist. The functionality described herein is now supported server-side in the latest version of Cloud Files (<a href="http://launchpad.net/swift">http://launchpad.net/swift</a>). See my <a href="http://programmerthoughts.com/programming/server-side-object-copy-in-openstack-storage/">newer post</a> for more details.</p>
<p>Cloud Files does not currently support object copying. However, a simple workaround is to re-upload the file with the new name. Implementing this workaround may be inconvenient, and one may miss some things like ensuring that metadata is updated. I have added a copy feature to my fork of the python-cloudfiles API that takes care of these details. This is a convenience function only and is not officially supported by Rackspace. Keep in mind that billable bandwidth will be used (unless the servicenet flag is set in the API). One option for renaming large files is to spin up a small <a href="http://www.rackspacecloud.com/cloud_hosting_products/servers">Cloud server</a>, use the API to copy over servicenet, and spin down the server. At $0.015 per hour, one could run a 256MB instance for 100 hours before equalling the transfer cost for copying one 5GB (Cloud Files max size) file over the billed network.</p>
<p>My python-cloudfiles fork on github: <a href="http://github.com/notmyname/python-cloudfiles/tree/object_copy">python-cloudfiles</a></p>
<p>Example script that copies the last file in a container to another container:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> cloudfiles
conn = cloudfiles.<span style="color: black;">get_connection</span><span style="color: black;">&#40;</span>username=<span style="color: #483d8b;">'myname'</span>, api_key=<span style="color: #483d8b;">'mykey'</span><span style="color: black;">&#41;</span>
container_name = <span style="color: #483d8b;">'example_container'</span>
another_container = <span style="color: #483d8b;">'example_container2'</span>
c = conn.<span style="color: black;">get_container</span><span style="color: black;">&#40;</span>container_name<span style="color: black;">&#41;</span>
l = c.<span style="color: black;">list_objects</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
o = c.<span style="color: black;">get_object</span><span style="color: black;">&#40;</span>l<span style="color: black;">&#91;</span>-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
new_path = <span style="color: #483d8b;">'%s/%s'</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>another_container, o.<span style="color: black;">name</span><span style="color: black;">&#41;</span>
o.<span style="color: black;">copy_to</span><span style="color: black;">&#40;</span>new_path<span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'copied'</span>, l<span style="color: black;">&#91;</span>-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>, <span style="color: #483d8b;">'to'</span>, new_path
new_list = conn.<span style="color: black;">get_container</span><span style="color: black;">&#40;</span>another_container<span style="color: black;">&#41;</span>.<span style="color: black;">list_objects</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">print</span> new_list
<span style="color: #ff7700;font-weight:bold;">assert</span> o.<span style="color: black;">name</span> <span style="color: #ff7700;font-weight:bold;">in</span> new_list</pre></td></tr></table></div>
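<p>The two details mentioned above, carrying the metadata across and never holding the whole file in memory, can be illustrated without the API. The Python 3 sketch below is not the code from my fork: it uses plain dictionaries and in-memory streams as stand-ins for Cloud Files objects, and the <code>copy_stream</code> and <code>copy_object</code> names are invented for the example.</p>

```python
import io


def copy_stream(src, dst, chunk_size=65536):
    """Copy src to dst in fixed-size chunks; return the number of bytes copied."""
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
    return copied


def copy_object(source, target):
    """Copy metadata, content type, and data between two stand-in objects."""
    # Copy the dict so later edits to one object do not affect the other.
    target["metadata"] = dict(source["metadata"])
    target["content_type"] = source["content_type"]
    source["body"].seek(0)
    return copy_stream(source["body"], target["body"])


source = {
    "metadata": {"author": "john"},
    "content_type": "application/pdf",
    "body": io.BytesIO(b"x" * 200000),
}
target = {"metadata": {}, "content_type": None, "body": io.BytesIO()}
copied = copy_object(source, target)
print("copied", copied, "bytes")
```

<p>At most <code>chunk_size</code> bytes are in flight at any moment, so the approach scales to objects far larger than available memory.</p>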

]]></content:encoded>
			<wfw:commentRss>http://programmerthoughts.com/programming/cloud-files-object-copy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PyGTK Chart widget beta release</title>
		<link>http://programmerthoughts.com/programming/pygtk-chart-widget-beta-release/</link>
		<comments>http://programmerthoughts.com/programming/pygtk-chart-widget-beta-release/#comments</comments>
		<pubDate>Sat, 11 Jul 2009 19:17:42 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[chart]]></category>
		<category><![CDATA[pygtk]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[widget]]></category>

		<guid isPermaLink="false">http://johnandkaren.com/blog/?p=271</guid>
		<description><![CDATA[We released a new version of <a href="http://notmyname.github.com/pygtkChart/">pygtkChart</a> today. This version is a beta release and allows for much more flexibility than the previous version. Some new features include the ability to independently address each part of a chart or graph and the ability to use GTK properties and signals. Mouse events are now supported, and hooks are available to click on individual areas of a chart.]]></description>
			<content:encoded><![CDATA[<p>We released a new version of <a href="http://notmyname.github.com/pygtkChart/">pygtkChart</a> today. This version is a beta release and allows for much more flexibility than the previous version. Some new features include the ability to independently address each part of a chart or graph and the ability to use GTK properties and signals. Mouse events are now supported, and hooks are available to click on individual areas of a chart.</p>
<p>The new version can be downloaded from <a href="http://github.com/notmyname/pygtkChart/downloads">http://github.com/notmyname/pygtkChart/downloads</a>. As always, the latest source can be cloned from git://github.com/notmyname/pygtkChart.git.</p>
]]></content:encoded>
			<wfw:commentRss>http://programmerthoughts.com/programming/pygtk-chart-widget-beta-release/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>PyGTK Chart Widget</title>
		<link>http://programmerthoughts.com/programming/pygtk-chart-widget/</link>
		<comments>http://programmerthoughts.com/programming/pygtk-chart-widget/#comments</comments>
		<pubDate>Sat, 06 Jun 2009 16:11:14 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[chart]]></category>
		<category><![CDATA[pygtk]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[widget]]></category>

		<guid isPermaLink="false">http://johnandkaren.com/blog/?p=250</guid>
		<description><![CDATA[pygtkChart is a chart widget for GTK that offers line graphs and pie charts. It&#8217;s simple to use, but it is lacking one feature that I really wanted: bar charts. I added a bar chart widget to the package, but I have not been able to get in touch with the original author to contribute [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://pygtkchart.sven-festersen.de/">pygtkChart</a> is a chart widget for GTK that offers line graphs and pie charts. It&#8217;s simple to use, but it is lacking one feature that I really wanted: bar charts. I added a bar chart widget to the package, but I have not been able to get in touch with the original author to contribute the code back. So, here it is.</p>
<p><strong>Download:</strong> Clone from <a href="git://github.com/notmyname/pygtkChart.git">git://github.com/notmyname/pygtkChart.git</a> or view the source at <a href="http://github.com/notmyname/pygtkChart/tree">http://github.com/notmyname/pygtkChart/tree</a></p>
<p><strong>Installation:</strong> $ python setup.py build &#038;&#038; sudo python setup.py install</p>
<p><strong>Description:</strong> I have added two new classes: BarChart and MultiBarChart. BarChart provides a simple bar chart. MultiBarChart allows for grouped bars. The code is fairly well commented and should be easy to follow.</p>
<div id="attachment_252" class="wp-caption aligncenter" style="width: 310px"><a href="http://c1912352.cdn.cloudfiles.rackspacecloud.com/2009/06/bar.png"><img src="http://c1912352.cdn.cloudfiles.rackspacecloud.com/2009/06/bar-300x150.png" alt="BarChart example" title="bar" width="300" height="150" class="size-medium wp-image-252" /></a><p class="wp-caption-text">BarChart example</p></div><br />
<div id="attachment_253" class="wp-caption aligncenter" style="width: 310px"><a href="http://c1912352.cdn.cloudfiles.rackspacecloud.com/2009/06/multibar.png"><img src="http://c1912352.cdn.cloudfiles.rackspacecloud.com/2009/06/multibar-300x150.png" alt="MultiBarChart example" title="multibar" width="300" height="150" class="size-medium wp-image-253" /></a><p class="wp-caption-text">MultiBarChart example</p></div>
<p>These images are screenshots of bar_chart_test.py and multi_bar_chart_test.py, both found in <a href='http://programmerthoughts.com/bar_chart_test.tgz'>bar_chart_test.tgz</a>.</p>
<p><strong>UPDATE:</strong> A <a href="http://programmerthoughts.com/programming/pygtk-chart-widget-beta-release/">new version</a> has been released.</p>
]]></content:encoded>
			<wfw:commentRss>http://programmerthoughts.com/programming/pygtk-chart-widget/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Three tier python imports</title>
		<link>http://programmerthoughts.com/programming/three-tier-python-imports/</link>
		<comments>http://programmerthoughts.com/programming/three-tier-python-imports/#comments</comments>
		<pubDate>Sat, 28 Feb 2009 18:32:16 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[import]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[three tier]]></category>

		<guid isPermaLink="false">http://johnandkaren.com/blog/?p=168</guid>
		<description><![CDATA[Suppose you want to write a python module, and, like a good software developer, you want to keep a separation between your development, model/testing, and production versions of your code. This is a simple task with a few tricks in __init__.py. First, our tester code: import module x = module.test.Klass&#40;&#41; print x.status&#40;&#41; [...]]]></description>
			<content:encoded><![CDATA[<p>Suppose you want to write a Python module and, like a good software developer, you want to keep your development, model/testing, and production versions of the code separate. This is a simple task with a few tricks in __init__.py.</p>
<p>First, our tester code:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
</pre></td><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> module
x = module.<span style="color: #dc143c;">test</span>.<span style="color: black;">Klass</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">print</span> x.<span style="color: black;">status</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>module/_dev/test.py is similarly simple (the model and prod versions respectively return &#8220;model&#8221; and &#8220;prod&#8221; instead of &#8220;dev&#8221;):</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
</pre></td><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> Klass<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> status<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">'%s in dev'</span> <span style="color: #66cc66;">%</span> <span style="color: #008000;">str</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>module/__init__.py:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
</pre></td><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>
&nbsp;
env = <span style="color: #dc143c;">os</span>.<span style="color: black;">environ</span>.<span style="color: black;">get</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'ENV'</span>,<span style="color: #483d8b;">'dev'</span><span style="color: black;">&#41;</span>.<span style="color: black;">lower</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> env == <span style="color: #483d8b;">'prod'</span>:
    <span style="color: #ff7700;font-weight:bold;">from</span> _prod <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #66cc66;">*</span>
<span style="color: #ff7700;font-weight:bold;">elif</span> env == <span style="color: #483d8b;">'model'</span>:
    <span style="color: #ff7700;font-weight:bold;">from</span> _model <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #66cc66;">*</span>
<span style="color: #ff7700;font-weight:bold;">else</span>:
    <span style="color: #ff7700;font-weight:bold;">from</span> _dev <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #66cc66;">*</span></pre></td></tr></table></div>
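
<p>The same selection could also be written as a lookup table. This is only a sketch under assumptions, not the code from the download: the names <code>TIERS</code> and <code>select_tier</code> are hypothetical, and the function only works out which sub-package name would be used; it does not perform the import.</p>

```python
import os

# Hypothetical lookup table mapping an ENV value to the sub-package
# names used in this post.
TIERS = {'prod': '_prod', 'model': '_model', 'dev': '_dev'}


def select_tier(environ=None):
    """Return the sub-package name for the current ENV setting.

    Missing or unrecognized values fall back to '_dev', mirroring the
    if/elif/else chain in module/__init__.py.
    """
    if environ is None:
        environ = os.environ
    env = environ.get('ENV', 'dev').lower()
    return TIERS.get(env, '_dev')
```

<p>With this, <code>select_tier({'ENV': 'PROD'})</code> gives <code>'_prod'</code>, and an unset or unrecognized ENV gives <code>'_dev'</code>.</p>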

<p>__init__.py in _dev, _model, and _prod:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
</pre></td><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> _load_code<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    parent_dir = <span style="color: #dc143c;">os</span>.<span style="color: black;">path</span>.<span style="color: black;">dirname</span><span style="color: black;">&#40;</span>__file__<span style="color: black;">&#41;</span>
    base = <span style="color: #dc143c;">os</span>.<span style="color: black;">path</span>.<span style="color: black;">basename</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">os</span>.<span style="color: black;">path</span>.<span style="color: black;">dirname</span><span style="color: black;">&#40;</span>parent_dir<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    modules = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> root, dirs, files <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">os</span>.<span style="color: black;">walk</span><span style="color: black;">&#40;</span>parent_dir<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">for</span> name <span style="color: #ff7700;font-weight:bold;">in</span> files:
            <span style="color: #ff7700;font-weight:bold;">if</span> name.<span style="color: black;">endswith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'.py'</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">and</span> \
              <span style="color: #ff7700;font-weight:bold;">not</span> name.<span style="color: black;">startswith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'__'</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">and</span> \
              <span style="color: #ff7700;font-weight:bold;">not</span> name.<span style="color: black;">startswith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'.'</span><span style="color: black;">&#41;</span>:
                module_name = <span style="color: #483d8b;">'%s.%s.%s'</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>base,
                            <span style="color: #dc143c;">os</span>.<span style="color: black;">path</span>.<span style="color: black;">basename</span><span style="color: black;">&#40;</span>parent_dir<span style="color: black;">&#41;</span>,
                            name.<span style="color: black;">rsplit</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'.'</span>, <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">os</span>.<span style="color: black;">sep</span>, <span style="color: #483d8b;">'.'</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                <span style="color: #008000;">__import__</span><span style="color: black;">&#40;</span>module_name<span style="color: black;">&#41;</span>
                modules.<span style="color: black;">append</span><span style="color: black;">&#40;</span>module_name<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> modules
&nbsp;
__all__ = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> update__all__<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">global</span> __all__
    __all__ = <span style="color: black;">&#91;</span>x.<span style="color: black;">rsplit</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'.'</span>,<span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span>-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span> <span style="color: #ff7700;font-weight:bold;">for</span> x <span style="color: #ff7700;font-weight:bold;">in</span> _load_code<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
&nbsp;
update__all__<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;"># initial setup</span></pre></td></tr></table></div>
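
<p>For comparison, the standard library can do a similar directory scan. The sketch below swaps in <code>pkgutil.iter_modules</code> in place of the <code>os.walk</code> loop; it is not what the download uses. The helper name <code>list_submodules</code> is hypothetical. Unlike the loop above, it does not descend into subdirectories, it only lists names without importing them, and it would still report a file such as <code>__main__.py</code>.</p>

```python
import pkgutil


def list_submodules(package_dir):
    """Return the sorted names of the modules found directly in package_dir."""
    # iter_modules yields (finder, name, ispkg) for each importable entry
    # and already leaves out __init__; sub-packages are filtered here so
    # that only plain module files are reported.
    return sorted(name
                  for _, name, ispkg in pkgutil.iter_modules([package_dir])
                  if not ispkg)
```

<p>The resulting list has the same shape as the names that end up in <code>__all__</code> above, so it could be assigned to <code>__all__</code> directly if importing each module eagerly is not needed.</p>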

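<p>When tester.py is run, its import of module executes module/__init__.py, which checks the environment variable ENV (defaulting to dev) and imports the matching version of the code. Because each tier lists its submodules in __all__, the star import binds them directly on module, which is why module.test resolves to the selected tier.</p>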

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ <span style="color: #007800;">ENV</span>=dev python .<span style="color: #000000; font-weight: bold;">/</span>tester.py
<span style="color: #000000; font-weight: bold;">&lt;</span>module._dev.test.Klass object at 0x9d66e2c<span style="color: #000000; font-weight: bold;">&gt;</span> <span style="color: #000000; font-weight: bold;">in</span> dev
$ <span style="color: #007800;">ENV</span>=model python .<span style="color: #000000; font-weight: bold;">/</span>tester.py
<span style="color: #000000; font-weight: bold;">&lt;</span>module._model.test.Klass object at 0x9435e2c<span style="color: #000000; font-weight: bold;">&gt;</span> <span style="color: #000000; font-weight: bold;">in</span> model
$ <span style="color: #007800;">ENV</span>=prod python .<span style="color: #000000; font-weight: bold;">/</span>tester.py
<span style="color: #000000; font-weight: bold;">&lt;</span>module._prod.test.Klass object at 0x9f12e2c<span style="color: #000000; font-weight: bold;">&gt;</span> <span style="color: #000000; font-weight: bold;">in</span> prod</pre></div></div>
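
<p>To try the idea without downloading anything, the following sketch writes a throwaway package to a temporary directory and imports it under different ENV values. It is a reduced illustration under several assumptions, not the code from the archive: the package name <code>tier_demo</code> and the helpers <code>build_demo_package</code> and <code>status_for</code> are hypothetical, only the dev and prod tiers are generated, each tier hard-codes its <code>__all__</code>, and it targets Python 3, where the relative imports need a leading dot.</p>

```python
import importlib
import os
import sys
import tempfile


def build_demo_package(root, name='tier_demo'):
    """Write a throwaway package with _dev and _prod tiers under root."""
    pkg = os.path.join(root, name)
    for tier in ('_dev', '_prod'):
        tier_dir = os.path.join(pkg, tier)
        os.makedirs(tier_dir)
        # Each tier exposes its single submodule through __all__.
        with open(os.path.join(tier_dir, '__init__.py'), 'w') as f:
            f.write("from . import test\n__all__ = ['test']\n")
        with open(os.path.join(tier_dir, 'test.py'), 'w') as f:
            f.write("def status():\n    return %r\n" % tier.lstrip('_'))
    init_source = (
        "import os\n"
        "if os.environ.get('ENV', 'dev').lower() == 'prod':\n"
        "    from ._prod import *\n"
        "else:\n"
        "    from ._dev import *\n"
    )
    with open(os.path.join(pkg, '__init__.py'), 'w') as f:
        f.write(init_source)
    return name


def status_for(env, root, name):
    """Import the demo package with ENV set to env; return test.status()."""
    previous = os.environ.get('ENV')
    os.environ['ENV'] = env
    sys.path.insert(0, root)
    try:
        # Drop cached copies so the package __init__ runs again.
        stale = [k for k in list(sys.modules)
                 if k == name or k.startswith(name + '.')]
        for key in stale:
            del sys.modules[key]
        importlib.invalidate_caches()
        package = importlib.import_module(name)
        return package.test.status()
    finally:
        sys.path.remove(root)
        if previous is None:
            del os.environ['ENV']
        else:
            os.environ['ENV'] = previous


if __name__ == '__main__':
    demo_root = tempfile.mkdtemp()
    demo_name = build_demo_package(demo_root)
    print(status_for('dev', demo_root, demo_name))
    print(status_for('prod', demo_root, demo_name))
```

<p>The cached entries in <code>sys.modules</code> are cleared before each import because the tier is chosen only once, when the package's __init__.py first runs; a process that has already imported the package keeps whichever tier it started with.</p>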

<p><a href="http://johnandkaren.com/blog/wp-content/uploads/2009/02/import_test.tgz">Full source code</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://programmerthoughts.com/programming/three-tier-python-imports/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
