RUNLENGTH - Run-length coding
Run-length encoding splits a vector into two vectors: the first contains the
elements without neighboring repetitions, and the second contains the number
of repetitions of each element.
This can reduce the memory needed to store the data or help to analyze sequences.
Encoding: [B, N, BI] = RunLength(X)
Decoding: X = RunLength(B, N)
INPUT / OUTPUT:
X: Full input signal, row or column vector.
Types: (U)INT8/16/32/64, SINGLE, DOUBLE, LOGICAL, CHAR.
B: Compressed data, neighboring elements with the same value are removed.
B and X have the same types.
N: Number of repetitions of the elements of B in X as DOUBLE or UINT8 row vector.
BI: Indices of elements in B in X as DOUBLE row vector.
RunLength(X, 'byte') returns N as a UINT8 vector; runs longer than 255 elements are split.
Many RLE tools are available in the FileExchange already. This C-Mex is
about 5 times faster than good vectorized M-versions.
The M-file RunLength_M contains vectorized and loop M-code for education.
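For orientation, the vectorized idea can be sketched in a few lines of plain M-code. This is only a minimal illustration and not the code shipped in RunLength_M; the variable names (isStart, mark, y) are chosen for this example, and a non-empty row vector without NaNs is assumed.

```matlab
% Sketch only: vectorized run-length coding of a non-empty row vector.
x = [8, 9, 9, 10, 10, 10, 11];

% Encoding: a run starts at element 1 and wherever the value changes.
isStart = [true, x(2:end) ~= x(1:end-1)];
bi = find(isStart);                  % index of the first element of each run
b  = x(isStart);                     % values without neighboring repetitions
n  = diff([bi, numel(x) + 1]);       % run lengths

% Decoding: mark the first position of each run, then CUMSUM maps every
% output position to the run it belongs to (requires all n > 0).
mark = zeros(1, sum(n));
mark(cumsum([1, n(1:end-1)])) = 1;
y = b(cumsum(mark));
% b = [8,9,10,11], n = [1,2,3,1], bi = [1,2,4,7], y equals x
```

Comparing neighbors with ~= instead of DIFF keeps the sketch usable for integer, LOGICAL and CHAR inputs as well.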
Encode and decode:
[b, n] = RunLength([8, 9, 9, 10, 10, 10, 11])
x = RunLength(b, n)
% b = [8,9,10,11], n = [1,2,3,1], x = [8,9,9,10,10,10,11]
Limit counter to 255:
[b, n] = RunLength(ones(1, 257), 'byte')
% b = [1, 1], n = uint8([255, 2])
[b, n] = RunLength([true(257, 1); false])
% b = [true; false], n = [257, 1]
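The effect of the 'byte' limit can be imitated in plain M-code by splitting each run into pieces of at most 255 repetitions. The loop below is a sketch under that assumption, with hypothetical names (limit, bOut, nOut); it is not necessarily how the C-Mex implements the limit.

```matlab
% Sketch only: split run lengths so that no counter exceeds 255.
b = 1;                       % values of the runs (DOUBLE row vector)
n = 257;                     % run lengths (DOUBLE row vector)
limit = 255;                 % largest value a UINT8 counter can hold

bOut = [];
nOut = [];
for k = 1:numel(n)
   nFull = floor(n(k) / limit);          % number of full pieces of 255
   rest  = n(k) - nFull * limit;         % remaining repetitions
   part  = repmat(limit, 1, nFull);
   if rest > 0
      part = [part, rest];               %#ok<AGROW>
   end
   bOut = [bOut, repmat(b(k), 1, numel(part))];  %#ok<AGROW>
   nOut = [nOut, part];                          %#ok<AGROW>
end
nOut = uint8(nOut);
% bOut = [1, 1], nOut = uint8([255, 2])
```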
Find the longest sequence:
x = floor(rand(1, 1e6) * 2);
[b, n, bi] = RunLength(x);
[longestRun, index] = max(n);
longestPos = bi(index);
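The result can be checked by cutting the longest run out of the input again. The sketch below uses a short fixed vector and replaces the call to RunLength by a few lines of plain M-code, so it runs without the compiled Mex; this stand-in and the names longestValue and run are assumptions made for the example. Note that MAX returns the first of several equally long runs.

```matlab
x = [0, 1, 1, 1, 0, 0, 1];

% Stand-in for [b, n, bi] = RunLength(x), written in plain M-code:
bi = find([true, x(2:end) ~= x(1:end-1)]);   % start index of each run
b  = x(bi);                                  % value of each run
n  = diff([bi, numel(x) + 1]);               % length of each run

[longestRun, index] = max(n);
longestPos   = bi(index);
longestValue = b(index);

% Extract the run from the input and confirm that it is constant:
run = x(longestPos:longestPos + longestRun - 1);
% longestRun = 3, longestPos = 2, longestValue = 1, run = [1, 1, 1]
```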
The C-code is compiled automatically the first time RunLength is called.
See "RunLength_ReadMe.txt" for more details.
The unit-test uTest_RunLength tests validity and speed.
Tested: Matlab 6.5, 7.7, 7.8, 7.13, WinXP/32, Win7/64
Compiler: LCC3.8, BCC5.5, OWC1.8, MSVC2008/2010
Does not compile under LCC2.4 shipped with Matlab/32!
Assumed Compatibility: higher Matlab versions, Linux, MacOS.
@cyclist: This was a lazy choice without intention. I will adjust this in the next few days and implement a faster way to obtain the indices BI. Thanks for this suggestion.
Very useful utility that I use pretty often, and recommend regularly on the Answers forum.
Now that MATLAB has implemented implicit expansion, I have had unexpected behavior because b is Nx1 (for input x with N runs), while n and bi are 1xN. Is there a reason for that choice?
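A small sketch of the effect described in this comment, using made-up values rather than real RunLength output: with implicit expansion (R2016b and later), combining a column with a row silently produces a matrix, so bringing both vectors to the same orientation first avoids the surprise.

```matlab
b = [8; 9; 10];          % column vector, like the B in the comment
n = [1, 2, 3];           % row vector, like the N in the comment

p = b .* n;              % implicit expansion: 3x3 matrix, not a vector
q = b(:).' .* n(:).';    % same orientation first: 1x3 result
% size(p) = [3, 3], q = [8, 18, 30]
```

In releases without implicit expansion, the first product throws a dimension mismatch error instead.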
just what I was looking for!
very well documented and easy to use
Excellent. Well commented. Lots of input checking. Variable naming could be better, but people probably say that about my code.
Thanks for the comments, Oleg. I'm going to enhance InstallMex so that a dialog box lets the user decide whether this is worth a warning or an error.
The 2nd input N was intended to come from an earlier call to RunLength with 1 input argument. But there is actually no good reason to restrict the types of the 2nd input. It is only inconvenient to handle all the different input types inside the C-Mex: the only efficient approach is to create a subfunction for each type, because an internal conversion must consider e.g. overflows of UINT64 data when converted to DOUBLE. So I left it to the user to decide whether "RunLength(B, double(N))" is sufficient. But for (U)INT16/32 this problem cannot occur and three extra lines of code are sufficient. Therefore I will add this in the next version.
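Why the smaller integer types are harmless can be illustrated with a short sketch. It shows a related effect, not necessarily the exact case handled in the C-Mex: every UINT16 or UINT32 value is represented exactly as DOUBLE, while UINT64 values above 2^53 are not. The names n16, nd, big and back are made up for this example, and a Matlab version with UINT64 arithmetic is assumed.

```matlab
n16 = uint16([1, 2, 3, 1]);
nd  = double(n16);                    % exact: suitable for RunLength(B, double(N))

big  = bitshift(uint64(1), 53) + 1;   % 2^53 + 1 as UINT64
back = uint64(double(big));           % round trip through DOUBLE

sameSmall = isequal(uint16(nd), n16); % true: nothing is lost
sameBig   = isequal(back, big);       % false: DOUBLE cannot hold 2^53 + 1
```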
An additional comment: is there any reason why the second input cannot be uint16, uint32 or single?
I have been using Urs' milestone http://www.mathworks.co.uk/matlabcentral/fileexchange/6436-rude-a-pedestrian-run-length-decoder-encoder for years now and I am pleasantly making the switch to this superb contribution.
RunLength decoding/encoding as fast as it gets!
Also, thanks to InstallMex that is bundled with Jan's submissions, it's a no-brainer.
InstallMex errors if a binary is already somewhere on Matlab's path. However, sometimes I just want to compile for a local project. Maybe the error could be switched to a warning?
Sorry, the first submission was missing the file RunLength.inc. A new version has already been submitted. Please feel free to contact me if the compilation of a file does not work.
The 2nd output is now also a column vector when the input is one. Thanks, The Cyclist.
In the former submission the file RunLength.inc was missing. ReadMe added.