April 15, 2010 at 10:19 pm
filed under programming
Tagged compress, python
I recently needed to stream compressed data from an uncompressed file without buffering the entire file in memory. Using Python’s gzip library would require me to create a new file on disk. The zlib module offers streaming, but by default it does not produce the gzip headers. I wanted something that would produce gzip-compatible output in a streaming fashion.
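To make the header difference concrete, here is a small sketch (Python 3, standard library only) that compresses a payload with a default `zlib.compressobj()` and checks the first bytes against the gzip magic number. The payload and variable names are illustrative.

```python
import gzip
import zlib

payload = b"hello world\n" * 100

# Default compressobj: streaming works, but the output uses the zlib wrapper.
compressor = zlib.compressobj()
zlib_stream = compressor.compress(payload) + compressor.flush()

# The gzip module wraps its DEFLATE data in the gzip header and trailer.
gzip_stream = gzip.compress(payload)

GZIP_MAGIC = b"\x1f\x8b"
zlib_has_gzip_magic = zlib_stream.startswith(GZIP_MAGIC)
gzip_has_gzip_magic = gzip_stream.startswith(GZIP_MAGIC)

print(zlib_has_gzip_magic, gzip_has_gzip_magic)
```

Tools that expect a gzip file check for that magic number, so the default zlib stream is not accepted as gzip even though both formats carry DEFLATE data.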
To solve this, I wrote a class that wraps a file object and provides a read method to generate the compressed data.
```python
import struct
import zlib


class CompressedFileReader(object):
    """Wrap a file object; read() returns gzip-format compressed data."""

    def __init__(self, file_obj, compresslevel=9):
        self._f = file_obj
        # Negative wbits selects raw DEFLATE; the gzip header and footer
        # are added by hand in read().
        self._compressor = zlib.compressobj(compresslevel,
                                            zlib.DEFLATED,
                                            -zlib.MAX_WBITS,
                                            zlib.DEF_MEM_LEVEL,
                                            0)
        self.done = False
        self.first = True
        self.crc32 = 0
        self.total_size = 0

    def read(self, *a, **kw):
        if self.done:
            return b''
        x = self._f.read(*a, **kw)
        if x:
            self.crc32 = zlib.crc32(x, self.crc32) & 0xffffffff
            self.total_size += len(x)
            compressed = self._compressor.compress(x)
            if not compressed:
                # Never return an empty result before EOF; force output.
                compressed = self._compressor.flush(zlib.Z_SYNC_FLUSH)
        else:
            # End of input: finish the stream and append the gzip footer
            # (CRC32 and uncompressed size, both little-endian).
            compressed = self._compressor.flush(zlib.Z_FINISH)
            crc32 = struct.pack("<L", self.crc32 & 0xffffffff)
            size = struct.pack("<L", self.total_size & 0xffffffff)
            footer = crc32 + size
            compressed += footer
            self.done = True
        if self.first:
            self.first = False
            # Fixed 10-byte gzip header: magic, deflate, no flags, no mtime.
            header = b'\037\213\010\000\000\000\000\000\002\377'
            compressed = header + compressed
        return compressed
```
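The header and footer in the class follow the gzip file format (RFC 1952). As a small aid to reading the octal escapes, this Python 3 sketch unpacks the same ten header bytes into named fields and builds a footer for a sample payload. The variable names and the payload are illustrative, not part of the class.

```python
import struct
import zlib

# The 10-byte header the class emits, written in hex instead of octal.
header = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\xff"

# RFC 1952 layout: ID1, ID2, CM, FLG, MTIME (4 bytes, little-endian), XFL, OS.
id1, id2, method, flags, mtime, xfl, os_code = struct.unpack("<BBBBIBB", header)

# The footer is the CRC32 of the uncompressed data followed by its length
# modulo 2**32, both as little-endian 32-bit integers.
data = b"example payload"
footer = struct.pack("<LL", zlib.crc32(data) & 0xffffffff, len(data) & 0xffffffff)
crc, size = struct.unpack("<LL", footer)

print(hex(id1), hex(id2), method, flags, mtime, xfl, os_code, size)
```

Here `method` is 8 (DEFLATE), `xfl` is 2, which the format uses to signal maximum compression, and `os_code` is 255, meaning the operating system is unknown.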
This code, with some simple tests and examples, is available on GitHub: http://github.com/notmyname/python_scripts/blob/master/compressed_file_reader_test.py
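The linked tests are not reproduced here. As a rough idea of what a round-trip check could look like, the following Python 3 sketch uses a hypothetical generator, `gzip_chunks`, that restates the same framing (raw DEFLATE plus a hand-written header and footer) so the block runs on its own. It then decompresses the result with the standard `gzip` module.

```python
import gzip
import io
import struct
import zlib

GZIP_HEADER = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\xff"


def gzip_chunks(file_obj, chunk_size=4096, level=9):
    """Yield gzip-framed chunks read from file_obj (hypothetical helper)."""
    # Negative wbits asks zlib for raw DEFLATE with no zlib wrapper.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -zlib.MAX_WBITS)
    crc = 0
    size = 0
    yield GZIP_HEADER
    while True:
        block = file_obj.read(chunk_size)
        if not block:
            break
        crc = zlib.crc32(block, crc) & 0xffffffff
        size += len(block)
        out = compressor.compress(block)
        if out:
            yield out
    # Finish the DEFLATE stream, then append CRC32 and length mod 2**32.
    yield compressor.flush() + struct.pack("<LL", crc, size & 0xffffffff)


# Round trip: compress from an in-memory file, then let gzip decode it.
original = bytes(range(256)) * 1000
compressed = b"".join(gzip_chunks(io.BytesIO(original), chunk_size=1000))
restored = gzip.decompress(compressed)
print(restored == original, len(original), len(compressed))
```

If the header, CRC32, or length were wrong, `gzip.decompress` would raise an error instead of returning the data, so a successful round trip checks the framing as well as the payload.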
One potential use case is streaming compressed data to a web service. For example, one could use this class to compress data as it is streamed to Cloud Files.
```python
import cloudfiles

# username and apikey are your Cloud Files credentials.
conn = cloudfiles.get_connection(username, apikey)
container = conn.create_container('some_container')
test_object = container.create_object('file.gz')
test_object.content_type = 'application/x-gzip'
with open('path/to/large/uncompressed/file', 'rb') as f:
    compressed_f = CompressedFileReader(f)
    test_object.write(compressed_f)
```
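How `test_object.write` consumes a file-like object is not shown in this post. Assuming it reads chunks until `read` returns an empty string, as many file-like consumers do, the reader must never return an empty result before the end of the data. The Python 3 sketch below illustrates why the class falls back to `Z_SYNC_FLUSH`: a raw DEFLATE compressor usually buffers small inputs and returns nothing from `compress`.

```python
import zlib

compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)

# A small input is usually buffered inside zlib, so compress() returns b"".
first = compressor.compress(b"a few bytes")

# A sync flush forces the pending data out, so the caller gets a non-empty chunk.
flushed = compressor.flush(zlib.Z_SYNC_FLUSH)

# The stream can still be continued and finished afterwards.
rest = compressor.compress(b" and some more") + compressor.flush(zlib.Z_FINISH)

# Decode the concatenated raw DEFLATE pieces to confirm nothing was lost.
decompressor = zlib.decompressobj(-zlib.MAX_WBITS)
restored = decompressor.decompress(first + flushed + rest)
print(len(first), len(flushed), restored)
```

The trade-off is a few extra bytes per sync flush, which slightly reduces the compression ratio when the caller reads in very small chunks.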