File Size: 7.77 KB | File ID: #25656

Compression Routines

by Jesse Hopkins

 

26 Oct 2009 (Updated 29 Oct 2009)

Code covered by BSD License  

Compress Matlab variables in the workspace (supports cells, structs, matrices, strings, and objects).


File Information
Description

This Matlab class contains only static methods.
These methods compress Matlab variables using Java's GZIP streams (java.util.zip).
Matrices, strings, structures, and cell arrays are supported. Matlab
objects are also supported, provided they implement a "toByteArray" method
and a constructor of the form:
         obj = constructor(byteArray,'PackedBytes')

Usage:
  x = {magic(4), 'some text', struct('a', 1:3)}; % any Matlab variable:
      % a struct, cell array, matrix, string, or
      % object (if the object conforms to certain standards)

  cx = CompressLib.compress(x); % cx is a uint8 byte array containing
                    % the compressed version of x

  x2 = CompressLib.decompress(cx); % x2 is now a copy of x
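Underneath, the compression step is plain java.util.zip GZIP streaming. The following is a minimal Java sketch of that round trip, not CompressLib's actual code: it assumes the Matlab variable has already been serialized to a byte array, and it replaces the MATLAB-internal InterruptibleStreamCopier helper used by CompressLib.decompress with an ordinary read loop. The class name GzipRoundTrip is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper mirroring the Java calls made by CompressLib.compress/decompress.
final class GzipRoundTrip {
    // Compress bytes with GZIP (ByteArrayOutputStream + GZIPOutputStream, as in compress).
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // close() above finished the stream and wrote the GZIP trailer (CRC-32 and size)
        return bos.toByteArray();
    }

    // Decompress GZIP bytes by copying the stream into memory 8 KiB at a time.
    static byte[] decompress(byte[] packed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = new byte[100_000]; // mostly zeros, so highly compressible
        original[42] = 7;
        byte[] packed = compress(original);
        System.out.println(packed.length < original.length);
        System.out.println(Arrays.equals(original, decompress(packed)));
    }
}
```

Running main prints true twice: the packed array is smaller, and decompressing it restores the original bytes exactly.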

The methods CompressLib.packBytes and CompressLib.unpackBytes are also
available. These methods may be used to make Matlab classes compliant
with the compression routines. See "packableObject.m" for an example
Matlab class that is compliant with CompressLib.
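As an illustration of the packing contract described above (a toByteArray method plus a constructor that rebuilds the object from those bytes), here is a hypothetical Java class following the same pattern. PackablePoint, its fields, and its byte layout are invented for this sketch; packableObject.m and CompressLib.packBytes use their own Matlab-side format, which is not reproduced here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical class illustrating the toByteArray / byte-array-constructor contract.
final class PackablePoint {
    final double x;
    final double y;
    final String label;

    PackablePoint(double x, double y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    // Counterpart of the Matlab form obj = constructor(byteArray,'PackedBytes'):
    // rebuilds the object from the bytes produced by toByteArray().
    PackablePoint(byte[] packed) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        this.x = buf.getDouble();
        this.y = buf.getDouble();
        byte[] text = new byte[buf.getInt()]; // length prefix written by toByteArray()
        buf.get(text);
        this.label = new String(text, StandardCharsets.UTF_8);
    }

    // Serialize every field into one flat byte array: x, y, label length, label bytes.
    byte[] toByteArray() {
        byte[] text = label.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Double.BYTES * 2 + Integer.BYTES + text.length);
        buf.putDouble(x).putDouble(y).putInt(text.length).put(text);
        return buf.array();
    }
}
```

The flat byte array returned by toByteArray is the kind of input a GZIP compressor then receives; the constructor only has to read the fields back in the same order they were written.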

Use CompressLib.test to run a series of functionality tests for this
class.

These methods run fast on numeric variables but are slower on structures, cell arrays, and objects. The biggest bottleneck is "typecast.m". You can vastly improve performance by replacing the calls to typecast.m with its MEX counterpart, typecastc.mexw32, located in "matlabroot\toolbox\matlab\datatypes\private". Because it lives in a private directory, you have to copy it to another folder on your path for it to be visible publicly. (Doing this, I measured a ~37% improvement when running the test suite, CompressLib.test.)
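typecast reinterprets the bits of a numeric array as another type without converting values; for example, eight uint8 values become one double. This is also why the code calls typecast(f.toByteArray,'uint8'): a Java byte[] arrives in Matlab as int8, and typecast relabels it as uint8 without changing any bits. The sketch below shows the same bulk reinterpretation in Java with java.nio.ByteBuffer. It is an analogy for what typecast does, not something CompressLib calls, and it assumes little-endian byte order, the native order on the x86 machines typecastc.mexw32 is built for. The class name Reinterpret is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper showing typecast-style reinterpretation of raw bytes.
final class Reinterpret {
    // Like typecast(bytes, 'double') on a little-endian machine.
    static double[] bytesToDoubles(byte[] bytes) {
        if (bytes.length % Double.BYTES != 0) {
            throw new IllegalArgumentException("byte count must be a multiple of 8");
        }
        double[] out = new double[bytes.length / Double.BYTES];
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asDoubleBuffer().get(out);
        return out;
    }

    // Inverse direction, like typecast(values, 'uint8').
    static byte[] doublesToBytes(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.asDoubleBuffer().put(values); // the view writes through to buf's backing array
        return buf.array();
    }
}
```

Both directions copy the data once in bulk; no per-element arithmetic is involved, which is why a compiled implementation of this operation is so much cheaper than an interpreted one.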

Acknowledgements

The author wishes to acknowledge the following in the creation of this submission:
  Rapid lossless data compression of numerical or string variables
  sizeof

This submission has inspired the following:
  SparsePack

MATLAB release: MATLAB 7.7 (R2008b)
Zip File Content: CompressLib.m, license.txt, packableObject.m
Tags: compression, java
Comments and Ratings (4)
05 Nov 2009 Sebastiaan

Nice utility, but it fails for large matrices. I have a single matrix of 514x435x217 consuming 190MB. Trying to compress it gives a heap space error:

??? Java exception occurred:
java.lang.OutOfMemoryError: Java heap space

Error in ==> CompressLib>CompressLib.compress at 101
g.write(byteData);
 

05 Nov 2009 Jesse Hopkins

Wow, I never tried it with a single matrix that large. You could probably still save a lot of memory by splitting up that large matrix, e.g. compressing each 514x435 2-D slice separately, so that you have 217 compressed variables.

CompressLib could probably get some smarts to compress the input in "chunks", but I probably won't be able to get around to that for a while.

06 Nov 2009 Sebastiaan

Well, I started thinking about that yesterday: chopping my matrix into smaller blocks by octree indexing, or maybe just a simple block approach, since I know the data is clustered together and large blocks are 0, and then compressing these smaller blocks.

I wonder which function is used to compress variables before writing them to a MAT-file. If the contents could be written directly to a variable instead of a file, this would get rid of the size limit of the Java function.

19 Nov 2009 Sebastiaan

Returning to the issue of compressing large matrices, I made the following patch. It chunks byteData into 5 MiB blocks and writes the output to a structure, which also stores the block size and the size of the uncompressed byte array.

The amount of heap space available for compression is rather unpredictable; sometimes even 10 MiB blocks were too large. The command 'java.lang.Runtime.getRuntime.freeMemory' does not return a usable value either.
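One likely reason freeMemory looked unusable: Runtime.freeMemory() only reports free space within the heap the JVM has currently reserved (totalMemory()), while the heap can still grow up to maxMemory(). A rough estimate of the remaining headroom combines all three, as in this Java sketch. The class name is illustrative, and the result is still only an estimate, since garbage collection and fragmentation decide whether one large array actually fits. From Matlab, the same three methods are reachable through java.lang.Runtime.getRuntime.

```java
// Hypothetical helper estimating remaining JVM heap headroom.
final class HeapHeadroom {
    // Rough estimate of how many more bytes the JVM heap could still allocate.
    static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // bytes occupied in the currently reserved heap
        // maxMemory() is the ceiling the heap may grow to (Long.MAX_VALUE if the JVM sets no limit);
        // freeMemory() alone ignores that remaining growth room.
        return rt.maxMemory() - used;
    }

    public static void main(String[] args) {
        System.out.printf("free (reserved heap only): %d bytes%n", Runtime.getRuntime().freeMemory());
        System.out.printf("estimated headroom:        %d bytes%n", availableBytes());
    }
}
```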

Using a structure adds 736 bytes of overhead compared to your current version. This could be reduced to 24 bytes if the block size and uncompressed size were stored in the byte array as well. However, I found the structure approach clearer, and the overhead is minimal for larger data, which currently cannot be compressed at all.

The CompressLib.test shows that all compressions were successful.

The only things that still cannot be compressed are sparse matrices. (I have something working, but I have no idea how to get it compiled on Windows, so I do not want to submit it to the FX yet.)

Many thanks for your work! It helped me solve a lot of memory issues.

Sebastiaan

Patch:
diff old/CompressionLib/CompressLib.m new/CompressionLib/CompressLib.m
43c43
< function out = decompress(byteArray)
---
> function out = decompress(compressedData)
46c46
< % out = CompressLib.decompress(byteArray)
---
> % out = CompressLib.decompress(compressedData)
48,49c48,49
< % Function will decompress "byteArray" (created by CompressLib.compress).
< % "byteArray" must be a 1-D array of bytes (uint8).
---
> % Function will decompress "compressedData" (created by CompressLib.compress).
> % "compressedData" must be a compression structure.
59,71c59,77
< if ~strcmpi(class(byteArray),'uint8') || ndims(byteArray) > 2 || min(size(byteArray) ~= 1)
< error('Input must be a 1-D array of uint8');
< end
<
< %------Decompress byte-array "byteArray" to "byteData" using java methods------
< a=java.io.ByteArrayInputStream(byteArray);
< b=java.util.zip.GZIPInputStream(a);
< isc = InterruptibleStreamCopier.getInterruptibleStreamCopier;
< c = java.io.ByteArrayOutputStream;
< isc.copyStream(b,c);
< byteData = typecast(c.toByteArray,'uint8');
< %----------------------------------------------------------------------
<
---
> if ~isstruct(compressedData) || ~isfield(compressedData, 'compressed') || ~isequal(compressedData.compressed, 'GZIP')
> error('Input must be a compression structure.');
> end
>
> % Reserve memory
> byteData = zeros(compressedData.UncompressedSize, 1, 'uint8');
>
> % Decompress data in chunks
> for Iter=1:length(compressedData.Data)
> %------Decompress chunk "Iter" into its slice of "byteData" using java methods------
> a=java.io.ByteArrayInputStream(compressedData.Data{Iter});
> b=java.util.zip.GZIPInputStream(a);
> isc = InterruptibleStreamCopier.getInterruptibleStreamCopier;
> c = java.io.ByteArrayOutputStream;
> isc.copyStream(b,c);
> byteData((Iter-1)*compressedData.BlockSize+1:min(Iter*compressedData.BlockSize, length(byteData))) = typecast(c.toByteArray,'uint8');
> %----------------------------------------------------------------------
> end
>
74d79
< end
76c81,83
< function byteArray = compress(in)
---
> end
>
> function compressedData = compress(in)
79c86
< % byteArray = CompressLib.compress(in)
---
> % compressedData = CompressLib.compress(in)
88c95
< % Outputs an array of type uint8. Use CompressLib.decomress to decompress
---
> % Outputs a compression structure. Use CompressLib.decompress to decompress
98,106c105,120
< %-------compress the array of bytes using java GZIP--------------------
< f=java.io.ByteArrayOutputStream();
< g=java.util.zip.GZIPOutputStream(f);
< g.write(byteData);
< g.close;
< byteArray=typecast(f.toByteArray,'uint8');
< f.close;
< %----------------------------------------------------------------------
< end
---
> % Compress data in chunks
> compressedData.compressed = 'GZIP';
> compressedData.BlockSize = 5*1024^2; % Make 5 MiB chunks
> compressedData.UncompressedSize = length(byteData);
> compressedData.Data = cell(ceil(compressedData.UncompressedSize/compressedData.BlockSize), 1);
> for Iter = 1:length(compressedData.Data)
> %-------compress the array of bytes using java GZIP--------------------
> f=java.io.ByteArrayOutputStream();
> g=java.util.zip.GZIPOutputStream(f);
> g.write(byteData((Iter-1)*compressedData.BlockSize+1:min(Iter*compressedData.BlockSize, compressedData.UncompressedSize)));
> g.close;
> compressedData.Data{Iter}=typecast(f.toByteArray,'uint8');
> f.close;
> %----------------------------------------------------------------------
> end
> end
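The patch stores BlockSize and UncompressedSize in a Matlab structure. For comparison, here is a Java sketch of the alternative mentioned in the comment, where that bookkeeping travels inside the byte array itself. The class name ChunkedGzip and the header layout (an 8-byte uncompressed size, a 4-byte block size, a 4-byte chunk count, then a 4-byte length before each chunk) are assumptions for illustration; this is not the 24-byte layout the comment alludes to, and it is not part of CompressLib. As in the patch, every block is compressed as an independent GZIP stream, so presumably the memory benefit comes from no single compress or decompress call ever handling more than one block.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical layout: [uncompressed size: long][block size: int][chunk count: int]
// followed, for each chunk, by [compressed length: int][GZIP bytes].
final class ChunkedGzip {
    static byte[] compress(byte[] data, int blockSize) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            int chunks = (int) ((data.length + (long) blockSize - 1) / blockSize); // ceil
            out.writeLong(data.length);
            out.writeInt(blockSize);
            out.writeInt(chunks);
            for (int i = 0; i < chunks; i++) {
                int from = i * blockSize;
                int len = Math.min(blockSize, data.length - from);
                // Each block becomes its own GZIP stream, like compressedData.Data{Iter}
                ByteArrayOutputStream chunk = new ByteArrayOutputStream();
                try (GZIPOutputStream gz = new GZIPOutputStream(chunk)) {
                    gz.write(data, from, len);
                }
                out.writeInt(chunk.size());
                chunk.writeTo(out);
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] decompress(byte[] packed) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
            long size = in.readLong();
            int blockSize = in.readInt();
            int chunks = in.readInt();
            // Preallocate the output, like zeros(UncompressedSize, 1, 'uint8') in the patch
            byte[] result = new byte[Math.toIntExact(size)];
            for (int i = 0; i < chunks; i++) {
                byte[] chunk = new byte[in.readInt()];
                in.readFully(chunk);
                try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(chunk))) {
                    int from = i * blockSize;
                    int len = Math.min(blockSize, result.length - from);
                    if (gz.readNBytes(result, from, len) != len) {
                        throw new IOException("truncated chunk " + i);
                    }
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the patch's block size, this would be called as ChunkedGzip.compress(data, 5 * 1024 * 1024). readNBytes requires Java 9 or later.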

Updates
27 Oct 2009

Minor updates to help comments in attached m-files

28 Oct 2009

Changed to a class with static methods. The biggest functional change is that packBytes now pre-allocates byteArray. Other changes include better handling of class names, enumerated values, and element sizes. Also added a comprehensive test suite.

29 Oct 2009

Last update for a while, I promise. General cleanup and improvements based on profiler results.

