
Sebastiaan

Company/University
ErasmusMC

Comments and Ratings by Sebastiaan
19 Nov 2009 Compression Routines Compress Matlab variables in the workspace. (supports cells, structs, matrices, strings, objects) Author: Jesse Hopkins

Returning to the issue of compressing large matrices, I made the following patch. It splits byteData into blocks of 5 MiB and writes the output to a structure, which also records the block size and the size of the uncompressed byte array.
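The block boundaries used by the patch can be checked in isolation. The following is a minimal sketch, not part of the patch: it uses a tiny made-up byte vector and a block size of 5 elements (instead of 5 MiB), and it splits and reassembles the data with the same index arithmetic, without any compression.

```matlab
% Illustrative values only: 12 bytes and a block size of 5 elements.
byteData  = uint8(1:12)';
blockSize = 5;
numBytes  = length(byteData);

% Same block count as in the patch: ceil(total/blockSize).
numBlocks = ceil(numBytes/blockSize);
blocks    = cell(numBlocks, 1);

% Split: block k covers (k-1)*blockSize+1 up to k*blockSize,
% clipped to the end of the data for the last block.
for k = 1:numBlocks
    first = (k-1)*blockSize + 1;
    last  = min(k*blockSize, numBytes);
    blocks{k} = byteData(first:last);
end

% Reassemble into a preallocated vector, as decompress does in the patch.
restored = zeros(numBytes, 1, 'uint8');
for k = 1:numBlocks
    first = (k-1)*blockSize + 1;
    last  = min(k*blockSize, numBytes);
    restored(first:last) = blocks{k};
end

disp(cellfun(@numel, blocks)');      % block lengths
disp(isequal(restored, byteData));   % round-trip check
```

With these values the data falls into three blocks of 5, 5 and 2 bytes, and the reassembled vector equals the input.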

The amount of heap space available for compression is rather unpredictable. Sometimes 10MiB blocks were too large. The command 'java.lang.Runtime.getRuntime.freeMemory' does not return a usable value either.

The use of a structure adds an overhead of 736 bytes compared to your current version. This can be reduced to 24 bytes if the block size and uncompressed size are stored in the byte array as well. However, I found the structure clearer, and the overhead is negligible for the large data that the current version cannot compress at all.
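One way the size information could be stored in the byte array itself is as a fixed-length header in front of the compressed bytes. The sketch below is an assumption about how that might look, not the layout behind the 24-byte figure above: the two-field uint64 header, the variable names, and the stand-in payload are all hypothetical.

```matlab
% Hypothetical layout: [blockSize, uncompressedSize] as two uint64 values
% (16 bytes), followed by the payload bytes.
blockSize        = 5*1024^2;
uncompressedSize = 194076120;              % example value only
payload          = uint8([31 139 8 0])';   % stand-in for compressed bytes

% Pack: typecast reinterprets the two uint64 values as 16 raw bytes.
header = typecast(uint64([blockSize, uncompressedSize]), 'uint8');
packed = [header(:); payload];

% Unpack: the first 16 bytes are the header, the rest is the payload.
info       = typecast(packed(1:16)', 'uint64');
gotPayload = packed(17:end);

fprintf('header bytes: %d\n', numel(header));
```

With several blocks, the compressed length of each block would also have to be recorded so that the blocks can be separated again before decompression.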

CompressLib.test shows that all compressions were successful.

The only things that cannot be compressed now are sparse matrices. (I have something working, but I have no idea how to compile it on Windows, so I do not want to submit it to the File Exchange yet.)

Many thanks for your work! It helped me solve a lot of memory issues.

Sebastiaan

Patch:
diff old/CompressionLib/CompressLib.m new/CompressionLib/CompressLib.m
43c43
< function out = decompress(byteArray)
---
> function out = decompress(compressedData)
46c46
< % out = CompressLib.decompress(byteArray)
---
> % out = CompressLib.decompress(compressedData)
48,49c48,49
< % Function will decompress "byteArray" (created by CompressLib.compress).
< % "byteArray" must be a 1-D array of bytes (uint8).
---
> % Function will decompress "compressedData" (created by CompressLib.compress).
> % "compressedData" must be a compression structure.
59,71c59,77
< if ~strcmpi(class(byteArray),'uint8') || ndims(byteArray) > 2 || min(size(byteArray) ~= 1)
< error('Input must be a 1-D array of uint8');
< end
<
< %------Decompress byte-array "byteArray" to "byteData" using java methods------
< a=java.io.ByteArrayInputStream(byteArray);
< b=java.util.zip.GZIPInputStream(a);
< isc = InterruptibleStreamCopier.getInterruptibleStreamCopier;
< c = java.io.ByteArrayOutputStream;
< isc.copyStream(b,c);
< byteData = typecast(c.toByteArray,'uint8');
< %----------------------------------------------------------------------
<
---
> if ~isstruct(compressedData) || ~isfield(compressedData, 'compressed') || ~isequal(compressedData.compressed, 'GZIP')
> error('Input must be a compression structure.');
> end
>
> % Reserve memory
> byteData = zeros(compressedData.UncompressedSize, 1, 'uint8');
>
> % Decompress data in chunks
> for Iter=1:length(compressedData.Data)
> %------Decompress byte-array "byteArray" to "byteData" using java methods------
> a=java.io.ByteArrayInputStream(compressedData.Data{Iter});
> b=java.util.zip.GZIPInputStream(a);
> isc = InterruptibleStreamCopier.getInterruptibleStreamCopier;
> c = java.io.ByteArrayOutputStream;
> isc.copyStream(b,c);
> byteData((Iter-1)*compressedData.BlockSize+1:min(Iter*compressedData.BlockSize, length(byteData))) = typecast(c.toByteArray,'uint8');
> %----------------------------------------------------------------------
> end
>
74d79
< end
76c81,83
< function byteArray = compress(in)
---
> end
>
> function compressedData = compress(in)
79c86
< % byteArray = CompressLib.compress(in)
---
> % compressedData = CompressLib.compress(in)
88c95
< % Outputs an array of type uint8. Use CompressLib.decomress to decompress
---
> % Outputs a compression structure. Use CompressLib.decompress to decompress
98,106c105,120
< %-------compress the array of bytes using java GZIP--------------------
< f=java.io.ByteArrayOutputStream();
< g=java.util.zip.GZIPOutputStream(f);
< g.write(byteData);
< g.close;
< byteArray=typecast(f.toByteArray,'uint8');
< f.close;
< %----------------------------------------------------------------------
< end
---
> % Compress data in chunks
> compressedData.compressed = 'GZIP';
> compressedData.BlockSize = 5*1024^2; % Make 5 MiB chunks
> compressedData.UncompressedSize = length(byteData);
> compressedData.Data = cell(ceil(compressedData.UncompressedSize/compressedData.BlockSize), 1);
> for Iter = 1:length(compressedData.Data)
> %-------compress the array of bytes using java GZIP--------------------
> f=java.io.ByteArrayOutputStream();
> g=java.util.zip.GZIPOutputStream(f);
> g.write(byteData((Iter-1)*compressedData.BlockSize+1:min(Iter*compressedData.BlockSize, compressedData.UncompressedSize)));
> g.close;
> compressedData.Data{Iter}=typecast(f.toByteArray,'uint8');
> f.close;
> %----------------------------------------------------------------------
> end
> end

06 Nov 2009 Compression Routines Compress Matlab variables in the workspace. (supports cells, structs, matrices, strings, objects) Author: Jesse Hopkins

Well, I started thinking about that yesterday: chopping my matrix into smaller blocks, by octree indexing or maybe just a simple block approach, and then compressing these smaller blocks. I know that the data is clustered together and that large blocks are 0.
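A simple block approach could look like the sketch below. It is only an illustration with a small made-up array: it cuts a 3-D matrix into slabs along the third dimension and flags the slabs that are entirely zero, which would not need to be compressed at all. The slab thickness and all variable names are assumptions.

```matlab
% Made-up test data: mostly zeros, one non-zero element.
A = zeros(4, 4, 6, 'single');
A(2, 3, 5) = 1;                 % the only non-zero value sits in slab 3

slab     = 2;                   % slab thickness along dimension 3
numSlabs = ceil(size(A, 3)/slab);
isEmpty  = false(numSlabs, 1);

for k = 1:numSlabs
    % Index range of slab k, clipped at the end of the array.
    idx   = (k-1)*slab + 1 : min(k*slab, size(A, 3));
    block = A(:, :, idx);
    % A slab is "empty" when it contains no non-zero element.
    isEmpty(k) = ~any(block(:));
end

disp(isEmpty');
```

Only the slabs that are not flagged as empty would then be passed on to the compression step.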

I wonder which function is used to compress variables before writing them to a MAT file. If the contents could be written directly to a variable instead of a file, this would get rid of the size limit of the Java function.

05 Nov 2009 Compression Routines Compress Matlab variables in the workspace. (supports cells, structs, matrices, strings, objects) Author: Jesse Hopkins

Nice utility, but it fails for large matrices. I have a single matrix of 514x435x217 consuming 190MB. Trying to compress it gives a heap space error:

??? Java exception occurred:
java.lang.OutOfMemoryError: Java heap space

Error in ==> CompressLib>CompressLib.compress at 101
g.write(byteData);
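
For reference, the reported size is consistent with single-precision storage. The arithmetic below is a sketch that assumes "single" refers to the class single (4 bytes per element), which the comment does not state explicitly.

```matlab
dims     = [514 435 217];
numElems = prod(dims);      % number of elements in the matrix
numBytes = numElems * 4;    % assumes class single: 4 bytes per element

fprintf('%d elements, %.1f MB (%.1f MiB)\n', ...
        numElems, numBytes/1e6, numBytes/2^20);
```

Under that assumption the matrix holds 48519030 elements and occupies 194076120 bytes, close to the quoted 190MB.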
 

04 Nov 2009 xml_io_tools Read XML files into MATLAB struct and writes MATLAB data types to XML Author: Jaroslaw Tuszynski

Mark: have you tried using the Pref.Str2Num=false with xml_read?

23 Oct 2009 xml_io_tools Read XML files into MATLAB struct and writes MATLAB data types to XML Author: Jaroslaw Tuszynski

Hmm, having problems commenting. This is what the question used to be:

VERY nice tool indeed, especially the Pref settings which let you modify the output. However, I have a problem with empty tags or contents. My XML file has lines like this:

<?xml version="1.0" encoding="ISO-8859-1" ?><!DOCTYPE PD>
<PD version="4.1">
    <Misc>
        <property value="Comments"> </property>
    </Misc>
    <EmptyTag>
    </EmptyTag>
</PD>

Which is read in Matlab as:
tree =
         Misc: [1x1 struct]
     EmptyTag: []
    ATTRIBUTE: [1x1 struct]

tree.Misc.property =
      CONTENT: []
    ATTRIBUTE: [1x1 struct]

So far, so good. However, on writing, the XML file is malformed:
<?xml version="1.0" encoding="UTF-8"?>
<PD version="4.1">
    <Misc>
        <property value="Comments"/>
    </Misc>
    <EmptyTag/>
</PD>

I do not understand the DOM structure well enough to find a cure for this. Also, setting tree.EmptyTag=' ' (one or more spaces) results in the same output.

How can this be corrected?

 
