Compression Routines

by Jesse Hopkins

26 Oct 2009 (Updated )

Compress MATLAB variables in the workspace. (Supports cells, structs, matrices, strings, and objects.)

Code covered by the BSD License
File Size: 7.77 KB | File ID: #25656 | 16 Downloads (last 30 days) | Rating: 3.0 (1 rating)

File Information
Description

This MATLAB class contains only static methods.
These methods compress MATLAB variables using Java GZIP functions.
Matrices, strings, structures, and cell arrays are supported. MATLAB
objects are also supported, provided they implement a "toByteArray" method
and a constructor of the form:
         obj = constructor(byteArray,'PackedBytes')

Usage:
  x = % some matlab variable,
      % can be a struct, cell-array,
      % matrix, or object (if object conforms to certain standards)

  cx = CompressLib.compress(x); % cx is a byte-array which contains
                    % compressed version of x

  x2 = CompressLib.decompress(cx); %x2 is now a copy of x

The methods CompressLib.packBytes and CompressLib.unpackBytes are also
available. These methods may be used to make MATLAB classes compliant
with the compression routines. See "packableObject.m" for an example
MATLAB class which is compliant with CompressLib.
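
A minimal sketch of such a class (the class and property names are hypothetical; packableObject.m in the package is the real example, and the exact packBytes/unpackBytes signatures are described in the class help):

  classdef myPackable
      % Hypothetical CompressLib-compliant class (sketch only).
      properties
          data
      end
      methods
          function obj = myPackable(arg, flag)
              % The constructor must also accept (byteArray,'PackedBytes').
              if nargin == 2 && strcmpi(flag, 'PackedBytes')
                  obj.data = CompressLib.unpackBytes(arg); % assumed signature
              elseif nargin >= 1
                  obj.data = arg;
              end
          end
          function byteArray = toByteArray(obj)
              % Serialize the object's contents to a uint8 byte array.
              byteArray = CompressLib.packBytes(obj.data); % assumed signature
          end
      end
  end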

Use CompressLib.test to run a series of tests for functionality of this
class.

These methods run quickly on numeric variables but slow down considerably on structures, cell arrays, and objects. The biggest bottleneck is "typecast.m". You can vastly improve performance by replacing these calls to typecast.m with its mex counterpart typecastc.mexw32, located in "matlabroot\toolbox\matlab\datatypes\private". Since it is in a private directory, you would have to copy it somewhere else on your path for it to be publicly visible. (I noticed a ~37% improvement when running the test suite (CompressLib.test) by doing this.)
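
As a rough sketch of that workaround (the destination folder below is only a placeholder, and the mex extension depends on your platform):

  % Make the private typecastc mex publicly visible (sketch only).
  src = fullfile(matlabroot, 'toolbox', 'matlab', 'datatypes', 'private', 'typecastc.mexw32');
  dst = 'C:\work\matlab\typecastc.mexw32';  % placeholder: any folder already on the MATLAB path
  copyfile(src, dst);
  % ...then edit CompressLib.m to call typecastc where it currently calls typecast.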

Acknowledgements

Rapid Lossless Data Compression Of Numerical Or String Variables and Sizeof inspired this file.

This file inspired Sparse Pack.

MATLAB release: MATLAB 7.7 (R2008b)
Comments and Ratings (11)
25 Oct 2012 Jesse Hopkins

Hamid, glad you found this useful. Java is used to implement the compression; you can see this pretty clearly in the "compress" method. I haven't tried it, but I think it will work if you change the line "g = java.util.zip.GZIPOutputStream(f)" to "g = java.util.zip.ZipOutputStream(f)".
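
Roughly, an untested sketch of that ZIP variant (note that a ZIP stream also needs a named entry before you write to it; the entry name below is arbitrary):

  % Untested sketch: ZIP-format compression of a byte array via Java.
  % byteData: uint8 vector, as produced inside CompressLib.compress.
  f = java.io.ByteArrayOutputStream();
  z = java.util.zip.ZipOutputStream(f);
  z.putNextEntry(java.util.zip.ZipEntry('data'));  % ZIP requires a named entry
  z.write(byteData);
  z.closeEntry();
  z.close();
  zipBytes = typecast(f.toByteArray, 'uint8');
  f.close();
  % Decompression would go through java.util.zip.ZipInputStream and getNextEntry.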

30 Aug 2012 Hamid

Thanks for sharing this.
Is there a way I can compress/decompress using ZIP format (as opposed to GZIP)?

08 Jun 2011 Jesse Hopkins

I saw similar results to Sebastiaan's. My usage is quite different: in my environment this is used to compress and decompress many small MATLAB structures stored within hundreds of Simulink blocks (as UserData), one at a time. There was no noticeable speed improvement, as the time was dominated by the GZIP method.

13 May 2011 Sebastiaan

Thanks for the suggestion. I have tried it with my version (which chunks the data into 5 MiB blocks to prevent running out of heap space), but the overall speed-up is barely measurable. Compressing ~120 MiB takes 0.035 seconds with the custom typecast and 0.161 with the built-in function. In contrast, the Java GZIP function takes 7 seconds (and uses only 1 thread).

However, sharing pointers is of course a much nicer solution.

11 May 2011 Jesse Hopkins

Thanks for the suggestion Jan, I'll take a look into that.

11 May 2011 Jan Simon

Using James Tursa's TYPECASTX will increase the speed, because it creates shared data copies instead of deep copies: http://www.mathworks.com/matlabcentral/fileexchange/17476-typecast-and-typecastx-c-mex-functions

31 May 2010 Zohar Bar-Yehuda

See this article about how to increase the Java heap space:
http://www.mathworks.com/support/solutions/en/data/1-18I2C/index.html
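
A quick way to check the current limits from MATLAB (the -Xmx value below is only an example; on releases of that era the limit is set via a java.opts file in the startup directory):

  rt = java.lang.Runtime.getRuntime;
  fprintf('max heap: %.0f MiB, free: %.0f MiB\n', ...
      double(rt.maxMemory)/2^20, double(rt.freeMemory)/2^20);
  % To raise the limit, put a line such as
  %   -Xmx512m
  % in a java.opts file in the MATLAB startup directory and restart MATLAB.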

19 Nov 2009 Sebastiaan

Returning to the issue of compressing large matrices, I made the following patch. It chunks byteData into blocks of 5 MiB and writes the output to a structure, which carries some information about the chunk size and the size of the uncompressed byte array.

The amount of heap space available for compression is rather unpredictable. Sometimes 10MiB blocks were too large. The command 'java.lang.Runtime.getRuntime.freeMemory' does not return a usable value either.

The use of a structure adds an extra overhead of 736 bytes compared to your current version. This could be reduced to 24 bytes if the block-size/uncompressed-size information were stored in the byte array as well. However, I found this method clearer, and the overhead is minimal for the larger data sets which cannot be compressed at all currently.

The CompressLib.test suite shows that all compressions were successful.

The only things which cannot be compressed now are sparse matrices. (I have got something working, but I have no idea how to get it compiled on Windows, so I do not want to submit it to the FX yet.)

Many thanks for your work! It helped me solve a lot of memory issues.

Sebastiaan

Patch:
diff old/CompressionLib/CompressLib.m new/CompressionLib/CompressLib.m
43c43
< function out = decompress(byteArray)
---
> function out = decompress(compressedData)
46c46
< % out = CompressLib.decompress(byteArray)
---
> % out = CompressLib.decompress(compressedData)
48,49c48,49
< % Function will decompress "byteArray" (created by CompressLib.compress).
< % "byteArray" must be a 1-D array of bytes (uint8).
---
> % Function will decompress "compressedData" (created by CompressLib.compress).
> % "compressedData" must be a compression structure.
59,71c59,77
< if ~strcmpi(class(byteArray),'uint8') || ndims(byteArray) > 2 || min(size(byteArray) ~= 1)
< error('Input must be a 1-D array of uint8');
< end
<
< %------Decompress byte-array "byteArray" to "byteData" using java methods------
< a=java.io.ByteArrayInputStream(byteArray);
< b=java.util.zip.GZIPInputStream(a);
< isc = InterruptibleStreamCopier.getInterruptibleStreamCopier;
< c = java.io.ByteArrayOutputStream;
< isc.copyStream(b,c);
< byteData = typecast(c.toByteArray,'uint8');
< %----------------------------------------------------------------------
<
---
> if ~isstruct(compressedData) || ~isfield(compressedData, 'compressed') || ~isequal(compressedData.compressed, 'GZIP')
> error('Input must be a compression structure.');
> end
>
> % Reserve memory
> byteData = zeros(compressedData.UncompressedSize, 1, 'uint8');
>
> % Decompress data in chunks
> for Iter=1:length(compressedData.Data)
> %------Decompress byte-array "byteArray" to "byteData" using java methods------
> a=java.io.ByteArrayInputStream(compressedData.Data{Iter});
> b=java.util.zip.GZIPInputStream(a);
> isc = InterruptibleStreamCopier.getInterruptibleStreamCopier;
> c = java.io.ByteArrayOutputStream;
> isc.copyStream(b,c);
> byteData((Iter-1)*compressedData.BlockSize+1:min(Iter*compressedData.BlockSize, length(byteData))) = typecast(c.toByteArray,'uint8');
> %----------------------------------------------------------------------
> end
>
74d79
< end
76c81,83
< function byteArray = compress(in)
---
> end
>
> function compressedData = compress(in)
79c86
< % byteArray = CompressLib.compress(in)
---
> % compressedData = CompressLib.compress(in)
88c95
< % Outputs an array of type uint8. Use CompressLib.decomress to decompress
---
> % Outputs a compression structure. Use CompressLib.decomress to decompress
98,106c105,120
< %-------compress the array of bytes using java GZIP--------------------
< f=java.io.ByteArrayOutputStream();
< g=java.util.zip.GZIPOutputStream(f);
< g.write(byteData);
< g.close;
< byteArray=typecast(f.toByteArray,'uint8');
< f.close;
< %----------------------------------------------------------------------
< end
---
> % Compress data in chunks
> compressedData.compressed = 'GZIP';
> compressedData.BlockSize = 5*1024^2; % Make 5 MiB chunks
> compressedData.UncompressedSize = length(byteData);
> compressedData.Data = cell(ceil(compressedData.UncompressedSize/compressedData.BlockSize), 1);
> for Iter = 1:length(compressedData.Data)
> %-------compress the array of bytes using java GZIP--------------------
> f=java.io.ByteArrayOutputStream();
> g=java.util.zip.GZIPOutputStream(f);
> g.write(byteData((Iter-1)*compressedData.BlockSize+1:min(Iter*compressedData.BlockSize, compressedData.UncompressedSize)));
> g.close;
> compressedData.Data{Iter}=typecast(f.toByteArray,'uint8');
> f.close;
> %----------------------------------------------------------------------
> end
> end

06 Nov 2009 Sebastiaan

Well, I started to think about that yesterday: chopping my matrix into smaller blocks by octree indexing, or maybe just a simple block approach, since I know that the data is clustered together and large blocks are 0 (and then compressing these smaller blocks).

I wonder which function is used to compress variables before writing them to a MAT file. If the contents could be written directly to a variable instead of a file, this would get rid of the size limit of the Java function.

05 Nov 2009 Jesse Hopkins

Wow, I never did try it with any single matrix that large. You could probably still save a lot of memory by splitting up that large matrix, perhaps compressing the 514x435 2-D matrices so that you have 217 compressed variables.

CompressLib could probably get some smarts to compress the input in "chunks", but I probably won't be able to get around to that for a while.
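
Roughly, an untested sketch of that slice-by-slice approach (M stands in for the 514x435x217 matrix):

  cx = cell(size(M, 3), 1);
  for k = 1:size(M, 3)
      cx{k} = CompressLib.compress(M(:, :, k));   % one compressed 2-D slice each
  end
  % To reconstruct the full matrix later:
  slices = cellfun(@CompressLib.decompress, cx, 'UniformOutput', false);
  M2 = cat(3, slices{:});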

05 Nov 2009 Sebastiaan

Nice utility, but it fails for large matrices. I have a single matrix of 514x435x217 consuming 190MB. Trying to compress it gives a heap space error:

??? Java exception occurred:
java.lang.OutOfMemoryError: Java heap space

Error in ==> CompressLib>CompressLib.compress at 101
g.write(byteData);

Updates
27 Oct 2009

Minor updates to help comments in attached m-files

28 Oct 2009

Changed to a class with static methods. The biggest functional change is that packBytes now pre-allocates byteArray. Other changes include better handling of class names, enumerated values, and element sizes. Also added a comprehensive test suite.

29 Oct 2009

Last update for a while... promise. Some general cleanup and improvements based on profiler results.

26 Jul 2010

Uploaded screenshot
