Get MD5 of files or arrays as fast C-Mex



MD5 hash of files, strings and arrays (RFC 1321)
Hash = GetMD5(Data, Mode, Format)
  Data:   File name or array.
  Mode:   String declaring the type of the 1st input:
            'File':   Data is a file name as string.
            '8Bit':   If Data is a CHAR array, only the 8 bit ASCII part is used.
            'Binary': The MD5 sum is obtained for the contents of Data.
                      This works for numerical, CHAR and LOGICAL arrays.
            'Array':  Include the class and dimensions of Data in the MD5
                      sum. This can also be applied to (nested) structs, cells
                      and sparse arrays.
  Format: String, format of the output: hex, HEX, double, uint8, base64

  Hash:   A 128 bit number is returned in the specified format.
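The text formats are different renderings of the same 16 digest bytes. A minimal sketch in C of the hexadecimal rendering, assuming that 'hex' means lowercase digits and 'HEX' uppercase digits. It is not taken from the GetMD5 source, `digest_to_hex` is a hypothetical helper name, and the other formats (double, uint8, base64) are not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: render a 16-byte MD5 digest as 32 hex characters.
   upper != 0 selects 'HEX' style (A-F), otherwise 'hex' style (a-f).
   out must have room for 33 bytes (32 digits plus the terminating NUL). */
void digest_to_hex(const uint8_t digest[16], int upper, char out[33])
{
    const char *digits = upper ? "0123456789ABCDEF" : "0123456789abcdef";
    for (size_t i = 0; i < 16; i++) {
        out[2 * i]     = digits[digest[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[digest[i] & 0x0F];  /* low nibble  */
    }
    out[32] = '\0';
}
```

Fed the 16 bytes that CertUtil prints in the comments below (7e 15 65 8c ...), the lowercase variant yields the same 32-character string that md5sum reports for the file.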
This function is at least 2 times faster than the corresponding Java method.
For shorter arrays this C-Mex implementation is much faster, see the output
of the included unit test function.
The function DataHash can also return SHA hashes, but it is considerably
slower due to the overhead of calling Java. For nested structs this C-Mex
can be 100 times faster. See:

The C-function must be compiled before using: Call the M-function without
inputs. If you do not have a C-compiler, pre-compiled files can be

Jan Simon

@Chris: GetMD5('test.mat') returns the hash of the string 'test.mat', which is a Matlab CHAR with 16 bits per character. The other tools compute the hash of the file named 'test.mat'. To get this, use GetMD5('test.mat', 'file'), as explained in the documentation.
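Why the two results disagree can be seen from the bytes alone. A rough sketch in C, assuming (as the reply above states) that a Matlab CHAR is stored with 16 bits per character and that the default mode hashes this raw data. `widen_to_16bit` is a hypothetical helper, not part of GetMD5.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy an ASCII string into 16-bit code units, mimicking how a Matlab CHAR
   array holds its characters. At most 'capacity' characters are copied.
   Returns the number of bytes a hash function would then have to process. */
size_t widen_to_16bit(const char *text, uint16_t *out, size_t capacity)
{
    size_t n = strlen(text);
    if (n > capacity) {
        n = capacity;
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint16_t)(unsigned char)text[i];
    }
    return n * sizeof(uint16_t);
}
```

For 'test.mat' the function reports 16 bytes, twice the 8 characters of the name, and none of these bytes come from the contents of the file.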


Chris

PS: I found a solution on SO that works for me, so the problem seems to be something other than my system being completely broken! (I never rule out that possibility...)


Chris

Hi Jan, thank you for this function. I am having trouble on Windows 7 64-bit, Matlab version (R2016a). The hash your program generates does not match what I get from CertUtil in Windows or md5sum under Linux. Am I misunderstanding the functionality?

>> GetMD5('test.mat')
ans =

>> system('CertUtil -hashfile test.mat MD5')
MD5 hash of file test.mat:
7e 15 65 8c b5 9c a6 cf fe 8f cf 25 e0 9a bf f7
CertUtil: -hashfile command completed successfully.

And the Windows system call agrees with md5sum on Ubuntu 16.04:

➜ ~ cd /media/sf_data
➜ sf_data md5sum test.mat
7e15658cb59ca6cffe8fcf25e09abff7 test.mat

Jan Simon

@Atiya: Please describe the problem in more detail: Did you have a C compiler installed beforehand? Do you get an error message during compiling or calling? Did you download the pre-compiled MEX function? You will find a contact address in the help section of the code for more detailed descriptions.

Atiya Usmani

I cannot install mex successfully

Jesse Ogle

on Linux I had to change line 146
# define _LITTLE_ENDIAN 1
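RFC 1321 defines MD5 over 32-bit words assembled from the input bytes with the low-order byte first, which is presumably why the source needs to know the byte order of the machine. As an illustration only (this is not how the submitted C file does it, and `load_le32` and `is_little_endian` are hypothetical names), the words can also be assembled without relying on such a macro:

```c
#include <stdint.h>
#include <string.h>

/* Assemble a 32-bit word from 4 bytes, lowest-order byte first.
   The result is the same on little- and big-endian machines. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Runtime check: returns 1 on a little-endian machine, 0 otherwise. */
int is_little_endian(void)
{
    const uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);  /* look at the byte stored first in memory */
    return first == 1;
}
```

A macro such as the one above lets the compiler skip the byte shuffling on little-endian hardware, so the portable form trades a little speed for not having to edit the source per platform.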

Compilation on MacOS X 10.11 with clang fails because of this line: #define __STDC_WANT_LIB_EXT1__
Removing it makes it compile.


Egor

Thanks, useful tool!

Is there any progress on the support for large (> 2^31 bytes) variables? This would be very useful to me.


Justin


Philip

Great job, thanks!


Glad to hear :)

To make it 'public': Linux users compile with this:
mex CFLAGS="\$CFLAGS -std=c99" CalcMD5.c

Jan Simon

Thanks Sebastiaan! The 64-bit bug is fixed (since 17-Dec-2009). Full 64-bit support is under development.
I like the C99 comment style with //, so your suggestion to add "-std=c99" for the GCC compiler works well.


Nice functionality, but I have problems using this under Linux, and there is a bug:

First off, it is a C file with C++ style commenting: don't do that - users have to manually disable the -ansi option from their CFLAGS, which is on by default.

The bug is on 64 bit systems, where you assume that unsigned long int is 4 bytes - which is actually 8.

For a correct implementation, replace line 104:
typedef unsigned long int UINT32; /* four byte word */
with:
typedef UINT32_T UINT32; /* four byte word */

Using UINT32_T from tmwtypes.h assures 4 bytes on each system.
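Rotations are one place where the word width matters, since MD5 (RFC 1321) rotates 32-bit words. A small sketch of the idea, using C99's `uint32_t` from `<stdint.h>` as a stand-in for `UINT32_T` from tmwtypes.h (an assumption made so the snippet compiles without the Matlab headers); `rotl32` is a hypothetical name:

```c
#include <stdint.h>

/* Compile-time check: the array size becomes negative, and compilation
   fails, if uint32_t is not exactly 4 bytes wide. */
typedef char uint32_is_four_bytes[(sizeof(uint32_t) == 4) ? 1 : -1];

/* Rotate a 32-bit word left by n bits, 1 <= n <= 31.
   With an 8-byte word type the bits shifted past bit 31 would not be
   discarded, and the result would differ. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}
```

rotl32(0x80000001, 1) gives 0x00000003 here; the same expression evaluated on an 8-byte word would give 0x100000003.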

If you want, I can send you the updated file with C style commenting.

I am looking forward to the version which handles large files and arrays. Maybe I will make one in the future, but I know that LCC is not too happy with long long ints.

Great work! Compiling the mex went seamlessly. I used this to get some version control independent from buggy datenum.



Fixed problem with compiling under MacOS X.


Improved speed, the class and size of arrays can be considered, cell and struct arrays accepted. Now arrays and files > 2.1GB are processed.


UINT32 has 4 bytes on 64-bit systems now. Thanks to Sebastiaan (34534)!

MATLAB 8.6 (R2015b)

Inspired by: MD5 in MATLAB

Inspired: JavaMD5, bimac/md5sum, DataHash

