Recently, I needed to stream gzip-compressed data from an uncompressed file without buffering the entire file in memory. Using Python’s gzip module directly would have required me to write a new compressed file to disk first. The zlib module supports streaming compression, but by default it does not produce the gzip header and trailer. I wanted something that would produce gzip-compatible output in a streaming fashion.
To solve this, I wrote a class that wraps a file object and provides a read method to generate the compressed data.
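import struct
import zlib


class CompressedFileReader(object):
    """Wrap a file object; read() returns gzip-compressed data."""

    def __init__(self, file_obj, compresslevel=9):
        self._f = file_obj
        # Negative wbits selects a raw deflate stream (no zlib header or
        # checksum); the gzip header and footer are added by hand in read().
        self._compressor = zlib.compressobj(compresslevel,
                                            zlib.DEFLATED,
                                            -zlib.MAX_WBITS,
                                            zlib.DEF_MEM_LEVEL,
                                            0)
        self.done = False
        self.first = True
        self.crc32 = 0
        self.total_size = 0

    def read(self, *a, **kw):
        if self.done:
            return b''
        x = self._f.read(*a, **kw)
        if x:
            self.crc32 = zlib.crc32(x, self.crc32) & 0xffffffff
            self.total_size += len(x)
            compressed = self._compressor.compress(x)
            if not compressed:
                # Never return an empty string before EOF: force out
                # whatever the compressor has buffered so far.
                compressed = self._compressor.flush(zlib.Z_SYNC_FLUSH)
        else:
            # End of input: finish the deflate stream and append the gzip
            # footer (CRC32 and size mod 2**32, both little-endian).
            compressed = self._compressor.flush(zlib.Z_FINISH)
            crc32 = struct.pack("<L", self.crc32 & 0xffffffff)
            size = struct.pack("<L", self.total_size & 0xffffffff)
            footer = crc32 + size
            compressed += footer
            self.done = True
        if self.first:
            self.first = False
            # Fixed 10-byte gzip header: magic bytes, deflate method,
            # no flags, mtime 0, XFL=2 (max compression), OS=255 (unknown).
            header = b'\037\213\010\000\000\000\000\000\002\377'
            compressed = header + compressed
        return compressed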
This code, along with some simple tests and examples, is available on GitHub: http://github.com/notmyname/python_scripts/blob/master/compressed_file_reader_test.py.
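As a quick way to sanity-check the framing used by the class, the following sketch (Python 3, and not taken from the linked repository) expresses the same three pieces as a generator: a fixed header, a raw deflate body, and a CRC32-plus-length footer. It then confirms that the standard gzip module can decompress the result. The name gzip_chunks and its chunk_size parameter are made up for illustration.

```python
import gzip
import io
import struct
import zlib

# Same fixed 10-byte gzip header that the class prepends to its first chunk.
GZIP_HEADER = b'\037\213\010\000\000\000\000\000\002\377'


def gzip_chunks(file_obj, chunk_size=64 * 1024, compresslevel=9):
    """Yield gzip-framed bytes for file_obj piece by piece (hypothetical helper)."""
    # Negative wbits: raw deflate stream, no zlib header or checksum.
    compressor = zlib.compressobj(compresslevel, zlib.DEFLATED, -zlib.MAX_WBITS)
    crc = 0
    size = 0
    yield GZIP_HEADER
    while True:
        data = file_obj.read(chunk_size)
        if not data:
            break
        crc = zlib.crc32(data, crc) & 0xffffffff
        size += len(data)
        piece = compressor.compress(data)
        if piece:  # the compressor may buffer input and emit nothing yet
            yield piece
    yield compressor.flush(zlib.Z_FINISH)
    # Footer: CRC32 of the uncompressed data, then its length mod 2**32.
    yield struct.pack("<LL", crc, size & 0xffffffff)


original = b"some repetitive payload\n" * 10000
framed = b"".join(gzip_chunks(io.BytesIO(original), chunk_size=4096))
assert gzip.decompress(framed) == original
```

A generator suits callers that iterate over chunks; the class form is the better fit when the consumer expects a file-like object with a read method.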
One potential use case is streaming compressed data to a web service. For example, one could use this class to compress data as it is streamed to Cloud Files:
import cloudfiles

# username and apikey are your Cloud Files credentials
conn = cloudfiles.get_connection(username, apikey)
container = conn.create_container('some_container')
test_object = container.create_object('file.gz')
test_object.content_type = 'application/x-gzip'
with open('path/to/large/uncompressed/file', 'rb') as f:
    compressed_f = CompressedFileReader(f)
    test_object.write(compressed_f)
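I have not checked exactly how the cloudfiles library consumes the object it is given, but a typical consumer of a file-like object loops on read() until it gets an empty result. The sketch below shows that contract in Python 3, using a hypothetical drain function and an in-memory io.BytesIO in place of a network connection. This contract is also why CompressedFileReader never returns an empty string before its input is exhausted.

```python
import io


def drain(reader, sink, chunk_size=4096):
    """Copy reader to sink in chunks; stop at the first empty read (hypothetical helper)."""
    total = 0
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:  # an empty result is treated as end of stream
            break
        sink.write(chunk)
        total += len(chunk)
    return total


source = io.BytesIO(b"x" * 10000)  # stands in for a CompressedFileReader
sink = io.BytesIO()                # stands in for the upload connection
sent = drain(source, sink)
assert sent == 10000
```

With a CompressedFileReader as the source, the chunks handed to the sink would be compressed bytes, so their sizes would generally differ from chunk_size.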
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
The thoughts expressed here are my own and do not necessarily represent those of my employer.