Compressed File Reader

April 15, 2010 at 10:19 pm
filed under programming

Recently, I needed to stream compressed data from an uncompressed file without buffering the entire file in memory. Using Python’s gzip library would have required me to create a new compressed file on disk first. The zlib module supports streaming compression, but it does not produce the gzip header and trailer. I wanted something that would produce gzip-compatible output in a streaming fashion.
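
The two formats wrap the same deflate data in different framing. As a small illustration (written for Python 3, which is an assumption, since the code in this post predates it), the leading bytes show which wrapper is in use:

```python
import gzip
import zlib

data = b"hello world" * 100

zlib_bytes = zlib.compress(data)  # zlib framing: 2-byte header, Adler-32 trailer
gzip_bytes = gzip.compress(data)  # gzip framing: magic bytes 1f 8b, CRC-32 + size trailer

GZIP_MAGIC = b"\x1f\x8b"

print(zlib_bytes[:2] == GZIP_MAGIC)  # False
print(gzip_bytes[:2] == GZIP_MAGIC)  # True
```

A tool that expects a .gz file checks for those magic bytes, which is why plain zlib output is not accepted as gzip.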

To solve this, I wrote a class that wraps a file object and provides a read method to generate the compressed data.

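import struct
import zlib


class CompressedFileReader(object):
    """Wrap a file object; read() returns its contents as gzip data."""

    def __init__(self, file_obj, compresslevel=9):
        self._f = file_obj
        # Negative wbits produces a raw deflate stream; the gzip header
        # and trailer are added by hand in read().
        self._compressor = zlib.compressobj(compresslevel,
                                            zlib.DEFLATED,
                                            -zlib.MAX_WBITS,
                                            zlib.DEF_MEM_LEVEL,
                                            0)
        self.done = False
        self.first = True
        self.crc32 = 0
        self.total_size = 0

    def read(self, *a, **kw):
        if self.done:
            return b''
        x = self._f.read(*a, **kw)
        if x:
            # Track the CRC-32 and size of the uncompressed data for the trailer.
            self.crc32 = zlib.crc32(x, self.crc32) & 0xffffffff
            self.total_size += len(x)
            compressed = self._compressor.compress(x)
            if not compressed:
                # Never return an empty string before EOF; callers would
                # take it as the end of the stream.
                compressed = self._compressor.flush(zlib.Z_SYNC_FLUSH)
        else:
            # End of input: finish the deflate stream and append the trailer.
            compressed = self._compressor.flush(zlib.Z_FINISH)
            crc32 = struct.pack("<L", self.crc32 & 0xffffffff)
            size = struct.pack("<L", self.total_size & 0xffffffff)
            footer = crc32 + size
            compressed += footer
            self.done = True
        if self.first:
            self.first = False
            # gzip header: magic, deflate, no flags, no mtime,
            # maximum compression, unknown OS
            header = b'\037\213\010\000\000\000\000\000\002\377'
            compressed = header + compressed
        return compressed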
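
The 10-byte header literal in the class packs several fields defined by the gzip format (RFC 1952). As a sketch, the same bytes can be built field by field with struct, which makes each field visible. The variable names below are labels chosen for illustration only:

```python
import struct
import zlib

header = struct.pack(
    "<BBBBLBB",
    0x1f, 0x8b,  # magic bytes identifying gzip
    8,           # compression method: deflate
    0,           # flags: no filename, comment, or extra fields
    0,           # modification time: 0 means "not available"
    2,           # extra flags: 2 means maximum compression was used
    0xff,        # operating system: 255 means unknown
)
assert header == b'\037\213\010\000\000\000\000\000\002\377'

# The trailer holds the CRC-32 and the length (mod 2**32) of the
# *uncompressed* data, each as a little-endian unsigned 32-bit integer.
payload = b"example data"
trailer = struct.pack("<LL",
                      zlib.crc32(payload) & 0xffffffff,
                      len(payload) & 0xffffffff)
print(len(header), len(trailer))  # 10 8
```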

This code, with some simple tests and examples, is available on GitHub: http://github.com/notmyname/python_scripts/blob/master/compressed_file_reader_test.py
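
A quick way to check that output like this is gzip-compatible is to hand it to Python's own gzip module. The sketch below is not taken from the linked test file. It restates the same approach as a generator (the name gzip_chunks and the chunk sizes are made up for illustration) and assumes Python 3:

```python
import gzip
import io
import struct
import zlib

GZIP_HEADER = b'\037\213\010\000\000\000\000\000\002\377'


def gzip_chunks(file_obj, chunk_size=64 * 1024, compresslevel=9):
    """Yield gzip-formatted bytes for file_obj, one piece at a time."""
    # Negative wbits asks zlib for a raw deflate stream with no wrapper.
    compressor = zlib.compressobj(compresslevel, zlib.DEFLATED, -zlib.MAX_WBITS)
    crc32 = 0
    total_size = 0
    yield GZIP_HEADER
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break
        crc32 = zlib.crc32(chunk, crc32) & 0xffffffff
        total_size += len(chunk)
        compressed = compressor.compress(chunk)
        if compressed:  # the compressor may buffer input and return nothing yet
            yield compressed
    yield compressor.flush(zlib.Z_FINISH)
    yield struct.pack("<LL", crc32, total_size & 0xffffffff)


original = b"some repetitive data\n" * 10000
streamed = b"".join(gzip_chunks(io.BytesIO(original), chunk_size=4096))

# gzip.decompress checks the CRC-32 and size in the trailer while decoding.
assert gzip.decompress(streamed) == original
```

Because only one chunk of input is held at a time, memory use stays bounded regardless of the size of the source file.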

One potential use case is streaming compressed data to a web service. For example, one could use this class to compress data as it is streamed to Cloud Files.

import cloudfiles

# username and apikey are your Cloud Files credentials
conn = cloudfiles.get_connection(username, apikey)
container = conn.create_container('some_container')
test_object = container.create_object('file.gz')
test_object.content_type = 'application/x-gzip'
with open('path/to/large/uncompressed/file', 'rb') as f:
    compressed_f = CompressedFileReader(f)
    # hand the reader to write() in place of the raw file
    test_object.write(compressed_f)
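
Code that accepts a file-like object usually consumes it by calling read with a size until nothing comes back. How any particular client library does this is an assumption here; the sketch below is a generic loop of that kind (copy_stream is a hypothetical helper, not part of cloudfiles), run against in-memory streams:

```python
import io


def copy_stream(src, dst, chunk_size=4096):
    """Copy src to dst by calling src.read(chunk_size) until it returns nothing."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty result is treated as end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


source = io.BytesIO(b"x" * 10000)
sink = io.BytesIO()
print(copy_stream(source, sink))  # 10000
```

This is why read in CompressedFileReader falls back to a Z_SYNC_FLUSH when the compressor has nothing to emit yet: returning an empty string early would look like end of file to a loop like this one.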
